The right data for the job – part II

Hello, again! Are you ready to learn more about the right data for the job? We are reviewing the qualifications of various data to answer different kinds of research questions, just like we would review job candidates’ qualifications for a job. Last week we talked about the importance of what data were collected and how they were collected. This week we’re going to consider the importance of definitions and what it means for data to be representative. I know you have been waiting anxiously to find out what we were going to do with those hats, so let’s jump back in!

1) Can you define that for me?

In research, definitions matter a lot. How researchers define important concepts impacts both what data are collected and how they are interpreted. For instance, last week we talked about collecting data by looking at something people created – hats in a knitting class – and whether these hats could be defined as a “success.” Those hats can be used as our data, but we need to specify how we are defining success.

So how does one go about measuring if a hat was “successful”? Is a hat successful simply if it is completed? Or, does it need to be round and fit on someone’s head? What if it’s too itchy for any human to wear, but a cat decides it’s an amazing toy? To come to a conclusion about the success of these particular hats, and use that to evaluate the success of the program, researchers need to make decisions about these types of questions and how they relate to the research question. 

As a reader of research, look for a clear connection between the research question, how the concepts being studied are defined, and the conclusions that were drawn. They should all align. When you’re researching a new topic, be aware that there can be wide variety in how a concept is defined in different fields and by different researchers.

2) Representative

Do the data actually represent the thing that is being studied? Let’s say you want to know how many people in your service area read a book last month. You could call every single person to ask, but this is unrealistic because of the resources it would require. An alternative approach is to collect data from a sample of the population. In this scenario, everyone in your service area is the population and your sample is the people you actually collect data from. 

Creating a truly representative sample is difficult because it must meet these two criteria:

  1. Your sample needs to be large enough for the size of your population. How large depends on the population size and on how precise you want your results to be. There are tools, like this one, to easily calculate what your sample size should be. In general, if your population is smaller than 100, you should be surveying everyone.
  2. Every member of the population needs to have an equal chance of being included in the study – meaning that the sample is randomly selected. This reduces bias and the potential for certain groups to be over-represented and their opinions magnified while others are under-represented. 

Results from a sample can be generalized to the population if it meets these criteria. 
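If you are curious what a sample size calculator might be doing behind the scenes, here is a minimal sketch in Python. It assumes one common approach (Cochran's formula with a finite population correction), a 95 percent confidence level, and a 5 percent margin of error. The function names and the service area of 10,000 people are made up for illustration, and a real calculator may use different settings.

```python
import math
import random


def sample_size(population, margin_of_error=0.05, z=1.96, proportion=0.5):
    """Estimate how many people to survey (an illustrative sketch).

    Uses Cochran's formula with a finite population correction.
    z=1.96 corresponds to a 95 percent confidence level, and
    proportion=0.5 is the most cautious guess about how answers will split.
    """
    n0 = (z ** 2) * proportion * (1 - proportion) / margin_of_error ** 2
    n = n0 / (1 + (n0 - 1) / population)
    # Never ask for more people than the population contains.
    return min(population, math.ceil(n))


def draw_random_sample(member_ids, n, seed=None):
    """Pick n members so that each one has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(member_ids, n)


# Hypothetical service area of 10,000 people, numbered 1 to 10,000.
service_area = list(range(1, 10_001))
needed = sample_size(len(service_area))
chosen = draw_random_sample(service_area, needed, seed=42)
print(needed, len(chosen))
```

The first function covers the sample size criterion and the second covers random selection. Note that drawing from a list of everyone in the service area only works if you actually have such a list, which is part of why truly random samples are hard to get.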

What if the sample doesn’t meet these criteria? Then, check for another criterion – whether the sample otherwise mirrors the characteristics of the population.

Let’s say your sample size is 250, so you ask the first 250 people who walk into the library if they read a book last month. These data are going to be skewed because not everyone in your service area visits the library, and those who don’t visit never had a chance to participate. Those who walk in also might not be representative of your population. For instance, if 50 percent of your population has a college education, 20 percent are African American, and 10 percent are above the age of 65, your sample should reflect roughly the same proportions.
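To make that comparison concrete, here is a rough sketch in Python of how you might check a sample against the population. The population percentages come from the example above, but the sample counts, the function name, and the 5 percentage point tolerance are all invented for illustration. There is no single official cutoff for "close enough."

```python
def representation_gaps(population_shares, sample_counts, sample_size, tolerance=0.05):
    """Return groups whose share of the sample differs from their share
    of the population by more than the tolerance (an assumed threshold).

    Positive numbers mean over-represented; negative mean under-represented.
    """
    gaps = {}
    for group, population_share in population_shares.items():
        sample_share = sample_counts.get(group, 0) / sample_size
        difference = sample_share - population_share
        if abs(difference) > tolerance:
            gaps[group] = round(difference, 3)
    return gaps


# Population shares from the example in the text.
population_shares = {
    "college education": 0.50,
    "African American": 0.20,
    "above age 65": 0.10,
}

# Hypothetical counts among the first 250 people who walked into the library.
sample_counts = {
    "college education": 170,
    "African American": 30,
    "above age 65": 25,
}

gaps = representation_gaps(population_shares, sample_counts, sample_size=250)
print(gaps)
```

With these made-up counts, people with a college education are over-represented and African American residents are under-represented, while the share of people above age 65 matches the population.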

When reading research, check to see whether the sample meets the three criteria above. If it doesn’t meet the first two, you can be more confident that the results are still somewhat representative of the population if the demographics of the sample are similar to the population’s.

Getting a representative sample can be challenging, and researchers may acknowledge that some groups were over- or under-represented in their study. That doesn’t mean that research can’t provide valuable information. It does mean that this particular research may not be able to draw accurate conclusions beyond those individuals who participated in the study, or about the groups that were under-represented in their study. Be cautious about research that does not acknowledge or discuss significant differences between the population and their sample. 


You made it! You are ready to go interview some data! Let’s review: research results are based on data, and the quality of those data matters. Do the data collected actually answer the question that was asked? What data were collected, how they were collected, what definitions were used, and whether the data are representative all impact the quality and interpretation of the data. You don’t need to be an expert to consider whether the data used are really answering the research question. Use your common sense and these tips to think critically about the right data for the job.