From the course: Complete Guide to Excel Statistics with Copilot
Means, medians, and modes
- [Instructor] Whether our dataset is big or small, people typically want to find the middle of the dataset. And even though you know the largest and smallest data points in your dataset, I wonder, is the middle of the dataset exactly in between those data points? Well, there are two functions often used to identify the middle of the dataset.

The first is the mean, also known as the average. The average is the sum of all the data points, in this case, it's 300, divided by the total number of data points. Since we already discovered the count function and the sum function, we have these values. So if we divide the sum by our count, we'll see that the average is 60. An easier way to do this is to use the AVERAGE function inside of Excel. So let's go over here and type in, "Average." And there it is. Click on this, and I want to take the average of these numbers right here. You see, we got the same number.

Now, the other way to figure out what the middle of our dataset is, is to take all of our data points and list them from the biggest data point down to the smallest data point, which is what we see right here. And we then find the middle data point among the five. With five, the middle data point is the third one, so that's what we call our median. Again, we can use the function in Excel, MEDIAN. There it is. Find the median of these five numbers right here. And just as we expected, it's 60.

By the way, one other function to help us understand our dataset is the mode. The mode is the most popular number in our dataset, the one that shows up most often. So let's use this function right here. Now, first thing we're going to do is type in, "Mode." And you'll notice there is MODE.MULT and MODE.SNGL. What we want to do is find out whether there is not just one popular number, but more than one. In this case, what you'll notice is our dataset doesn't have any number repeated: 80, 70, 60, 50, 40. So we're not expecting there to be any mode at all.
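For anyone following along outside of Excel, here is a minimal Python sketch of the same three calculations on the five values from this example (80, 70, 60, 50, 40). It uses Python's standard `statistics` module rather than Excel's functions, and `modes_like_excel` is a made-up helper name, not something from the course. It returns an empty list when no value repeats, which stands in for the #N/A that Excel's MODE.MULT gives in that case.

```python
from collections import Counter
from statistics import mean, median


def modes_like_excel(values):
    """Return every value tied for the highest count, or [] if nothing repeats.

    Hypothetical helper: the empty list stands in for Excel's #N/A result.
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top < 2:
        return []  # no value shows up more than once
    return [value for value, count in counts.items() if count == top]


scores = [80, 70, 60, 50, 40]

total = sum(scores)              # 300
count = len(scores)              # 5
print(total / count)             # mean by hand: 60.0
print(mean(scores))              # same value from the library
print(median(scores))            # middle of the sorted values: 60
print(modes_like_excel(scores))  # nothing repeats, so: []
```

Calling `modes_like_excel([80, 70, 70, 40, 40])` would return both 70 and 40, since each appears twice.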
Nonetheless, I'm going to type in, "MODE.MULT." I want to see if there is a mode within these numbers. Now, we know there's not, so we should get this right here, the #N/A.

Let's go ahead and do this for other datasets. And what I can do is, because I know I'm looking at a very similar setup here, I'm going to copy these formulas. So Control C to copy, and Control V to paste. And so what we just found is, the average of these numbers right here is 62. The median, the third data point from biggest to smallest, is 70. 70 also happens to be the mode because it's the most popular number among our five data points.

Once again, we can copy this, bring it down here. Our average in this case is 60. Our median, the middle data point when we list them from largest to smallest, is 70. So that looks good as well. And notice here we have two modes, because 70 shows up twice and 40 shows up twice.

Once again, I'm going to copy my formula, I'm going to paste it down here. And in this case here, our average is 70. Our median, the middle data point, is also 70. And 70 also happens to be our most popular data point, so the mode is 70.

So we just did that with a number of small datasets. Let's see what happens when we start working with a slightly larger dataset. And notice, with this dataset, we also have an even number of data points. So this is going to present us with another thing that we can discuss. So first of all, let's find the average. We take the average of these 10 data points, which is 26.5. Now, the median, what's the middle number here? Well, we actually have two middle numbers, data point 5 and data point 6. And what you do to find the median when you have an even number of data points is you take the average of these two middle data points. Here, that's the fifth and the sixth. And that's what we should get here. So we type in, "Median." We want the median of all of these numbers right here.
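The even-count rule can be sketched in Python as well. The ten values below are an assumption: the transcript doesn't list the actual data, so they were picked only to be consistent with the average of 26.5 mentioned above. `median_by_hand` is a hypothetical helper name used for illustration.

```python
from statistics import median


def median_by_hand(values):
    """Median via sorting: the middle value, or the average of the two middle values."""
    ordered = sorted(values, reverse=True)  # biggest to smallest, as in the video
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: one middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two


# Made-up values (assumption), not the course's exercise file.
data = [50, 40, 35, 35, 30, 20, 20, 15, 15, 5]

print(sum(data) / len(data))  # 26.5
print(median_by_hand(data))   # average of the 5th and 6th values: 25.0
print(median(data))           # the library agrees: 25.0
```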
And I'm expecting it to be 25, the average of 30 and 20. That's exactly what happens. So let's see what our mode is. Once again, I'll type in, "Mode." And I want to see if there are multiple modes or even just one. Press Return. And I find out that there are three modes. 35 shows up twice, 20 shows up twice and 15 shows up twice. So we know how this works.

Let's look at our hidden datasets. We've worked with some of this before, so I'm going to take the average of my test score column. I'm going to take the median of my test score column, and I'm going to see what the mode is. Again, MODE.MULT for this entire column. It doesn't look like there's a mode. So there are only five test scores. The median and the mean seem to be very, very close, and none of the numbers are repeated.

Now, what I'm going to do is, because we're always going to be taking our data from column C, I'll be able to copy these formulas from this spreadsheet to the larger class. Copy and paste. And so what we can see right here is the average of the 221 test scores that we have is about 76.7. The median, very similar, is 77, and there is a mode: 72 seems to be the most popular exam score for this particular class.

All right, so let's look at this with financial data. And once again, I'm going to copy and paste my formulas because we're looking at data in column C. Now, in this case, notice, there are over 1,000 accounts in this particular dataset. So in some cases, it could take a while for Excel to calculate all of these things. So if it takes a little bit, that's okay. Let's go ahead and paste our formulas in here. And notice, a little bit of time, and there we go. So what we can see here, and let me format this into dollars, is that the average account has about $108,000 in it, $108,525.56 to be exact. But notice the median. The median is only $28,790.97.
And that's interesting because what it's telling us is that of the 1,012 people that have an account, the person who's about number 506 or 507 on that list from top to bottom is at $28,790. Nonetheless, the mean is much higher. This usually tells us that there are some really big numbers at the top that are pulling up that average. And so the median, the middle person, sits far below the people with the biggest balances. Another interesting thing here is, even though we have 1,012 accounts and we have numbers that have decimals on them, we still ended up with not one mode, but two modes. So it looks like $89,091 and $119,268 both show up more than once in our dataset.

In this case here, we're working with column B, so I'm going to have to type in my formulas again. All right, so we have 74 salaries in our dataset. And it looks like the average salary is about $1,485. The median is about $1,418. And what this is telling us is we're really close. I mean, our mean and median are pretty close together. The mean is slightly higher, which means there are probably a few numbers at the top that sit farther above the median than the numbers at the bottom sit below it. Our mode, we actually have three modes. Those are employees who are getting paid the same amount of money: 9.32, 13.95 and 10.97.

All right, so hopefully now we're getting a good idea of what's in our datasets. We know how many data points we have. We know the sum of all those data points. We know the biggest, we know the smallest, we know the difference between those. And now we have an idea of our center as well, whether it be the mean, the average of all our data points; the median, the middle data point when we list them from top to bottom; or the mode, the most popular data points in our dataset.
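To see why a handful of big values pulls the mean well above the median, here is a small Python sketch. The balances are invented for illustration and are not taken from the course's financial dataset; only the pattern (one very large account among modest ones) is meant to mirror what the video describes.

```python
from statistics import mean, median

# Made-up account balances (assumption): six modest accounts and one very large one.
balances = [12_000, 18_000, 25_000, 29_000, 31_000, 40_000, 900_000]

print(mean(balances))    # dragged upward by the single large account
print(median(balances))  # the middle account: 29000

# Drop the large account and the two measures move close together again.
without_big = balances[:-1]
print(mean(without_big))
print(median(without_big))  # average of 25000 and 29000: 27000.0
```

The median barely moves when the large account is removed, while the mean changes a lot, which is why a big gap between the two hints at a few extreme values.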