Average is one of those statistics that comes up a lot. What does it mean? How can we use it? What are its limitations? Today we’re going to talk about both the average, also known as the mean, and another statistic called the median. Means and medians are both ways to find out what’s typical and to compare multiple things.
It’s easier to understand what these two statistics tell us when you know how they are calculated. Don’t worry if you don’t think of yourself as a “math person.” We’re only going to use addition and division.
What is it?
Let’s do the basic math of how a mean is calculated using an example. I have a storytime at my library, and I want to know the mean age of the children attending today. With their caregivers’ help, I find out the ages of the five children who are there: 3, 3, 4, 2, 3.
So, to calculate the mean age:
3+3+4+2+3 = 15 ← add up all the ages to make a total
15/5 = 3 ← divide the total by the number of children
The mean age is: 3
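If you’d like to check the arithmetic on a computer, here is a minimal Python sketch of the same two steps. The variable names (`ages`, `total`, `mean_age`) are made up for illustration; the numbers are the five ages from the storytime example.

```python
# Ages of the five children at today's storytime (from the example above)
ages = [3, 3, 4, 2, 3]

# Step 1: add up all the ages to make a total
total = sum(ages)

# Step 2: divide the total by the number of children
mean_age = total / len(ages)

print(total)     # 15
print(mean_age)  # 3.0
```

Python shows the result as 3.0 rather than 3 because division always produces a decimal number, but the value is the same.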
Why is it used?
The mean tells us what is typical for a group of values. It’s useful to know what a typical value is because it can help you compare multiple groups of values. Let’s say you think that one of your regular storytimes has younger children than another. You find out participants’ ages at your storytimes on Tuesdays and on Saturdays. After doing this for many months, you see a pattern: the mean age of participants on Tuesdays is usually three, but the mean age on Saturdays is usually five. Because you have these data, you decide to start planning slightly different activities for the Tuesday and Saturday storytimes. Useful, right?
What can go wrong?
Means, like all statistics, have pros and cons. One drawback is that outliers, which are unusual pieces of data, can really change the mean. Let’s say someone’s older sibling comes to the same storytime where we already calculated the mean. So now our data are: 3, 3, 4, 2, 3, 9.
3+3+4+2+3+9 = 24 ← add up all the ages to make a total
24/6 = 4 ← divide the total by the number of children
The mean age is: 4
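To see the outlier’s effect side by side, here is a small Python sketch that calculates the mean before and after the nine-year-old arrives. The helper name `mean_of` and the variable names are hypothetical, chosen only for this illustration.

```python
def mean_of(values):
    """Add up the values, then divide by how many there are."""
    return sum(values) / len(values)

# The five original ages, then the same list with the older sibling added
ages_before = [3, 3, 4, 2, 3]
ages_after = ages_before + [9]

print(mean_of(ages_before))  # 3.0
print(mean_of(ages_after))   # 4.0
```

One extra value out of six was enough to move the mean by a full year.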
Four! Add one nine-year-old sibling, and the mean jumps all the way to four. Should you change the storytime to be more geared toward four-year-olds because this nine-year-old came once? No, probably not.
Enter the median. The median is another way of calculating a typical value, and it is less affected by an outlier. The median is the middle value in a dataset. Or, to put it another way, about half of the values in the dataset are at or above the median and about half are at or below it.
To calculate the median, put the data in order from lowest to highest, and identify the middle value. Here are our data in order: 2, 3, 3, 3, 4, 9.
In this case, we have an even number of data points, so three and three are both middle values. If we had an odd number of data points, the single middle value would be the median, and the calculation would end there. When you have two middle values, you get to bring in your old friend the mean to help figure out the median:
3+3 = 6 ← add up the two middle ages to make a total
6/2 = 3 ← divide the total by the number of middle ages
Surprise, surprise. Three is our median! This is why the median can be so helpful. When there are outliers that pull the mean up or down, the median is not affected as much and is often a better indicator of what’s typical.
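The median steps can also be sketched in Python: put the values in order, find the middle, and average the two middle values when the count is even. The function name `median_of` is hypothetical and written just for this example.

```python
def median_of(values):
    """Return the middle value of the data once it is put in order."""
    ordered = sorted(values)   # lowest to highest
    count = len(ordered)
    middle = count // 2

    if count % 2 == 1:
        # Odd number of data points: the single middle value is the median
        return ordered[middle]

    # Even number of data points: take the mean of the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2

ages = [3, 3, 4, 2, 3, 9]

print(sorted(ages))     # [2, 3, 3, 3, 4, 9]
print(median_of(ages))  # 3.0
```

In everyday use you would not need to write this yourself. Python’s built-in `statistics` module includes a `median` function that follows the same steps.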
Cool math lesson, now what?
Means and medians are both ways to find out what’s typical and to compare multiple things. Here are some examples of how this comes up in everyday life.
What’s typical? Why would we want to know?
- What is the mean temperature in Colorado in May?
  - Should I keep a sweater out?
- What’s the mean value of this car I might buy?
Are these two things similar or different? Why would we want to know?
- What is the mean salary for library staff in one state compared to another state?
  - Maybe there’s a place that pays similarly where it doesn’t snow in the spring?
- What was the mean ebook circulation in public libraries in 2019 compared to 2009?
  - Is ebook circulation increasing, decreasing, or staying the same?
The mean and median are a good place to start when you are investigating a question and want to orient yourself. The key to using means and medians well is to not stop with them. They both indicate what is typical, but they don’t show the whole picture. It is important to check, as we discussed before, that what is being compared is actually comparable. The mean doesn’t necessarily take other variables into consideration. For example, comparing the mean salary for library staff in two different states doesn’t take into account the cost of living. The same salary could result in very different qualities of life in two different places. We’ll talk more about the importance of other variables soon.
Any statistic is tied to the underlying data
Keep in mind that the accuracy of a statistic depends on the quality of the dataset it is calculated from. How the data were collected, how much data were collected, and to what extent the data represent the subject all affect that quality. For example, if you collected the data by guessing children’s ages instead of asking, you can’t know whether the mean is accurate because you don’t know whether the underlying data are correct.
Even with accurate data, there are limits on the conclusions you can draw. In our example, the data we collected about the ages of storytime participants would be helpful to your specific library, but you can’t conclude that the mean age of all participants in all Tuesday storytimes everywhere is three. We didn’t collect those data. We have no idea if your storytime on Tuesday is like other libraries’ Tuesday storytimes.
Numbers can’t do the thinking for you
Life is unpredictable and messy. You may have storytimes for months where the mean age is three, and then one week a bunch of two-year-olds come. Your mean will change then, and you have to decide what to do with that information. Do you want to adjust the storytime? Do you find that there’s a key developmental change between ages two and three and you’d like to market some storytimes for children two and younger? The statistics can help guide your decision, but they will never tell you what to do. You have to decide how to use the statistics and the other information you have to understand what’s happening and what you want to do.
The tip of the statistics iceberg
For the mean and median to be good measures of what’s typical, the dataset needs to meet some criteria. Those criteria get into probability and what the dataset looks like when it’s arranged a certain way (its distribution). For our purposes here, you don’t need to be deeply familiar with those concepts. If, however, you want to learn more, you could start here.
LRS’s Between a Graph and a Hard Place blog series provides strategies for looking at data with a critical eye. Every week we’ll cover a different topic. You can use these strategies with any kind of data, so while the series may be inspired by the many COVID-19 statistics being reported, the examples we’ll share will focus on other topics.