Average Values, Exclude Extremes
July 14, 2008 – 4:00 pm by JP
Sometimes I have a list of values and want to average them, while excluding zeros or negative numbers, so I use the following popular array formula. “MyRange” is a named range of cells on a worksheet. I discourage the use of magic numbers, but they are included below for visual reference.
Feel free to use cell references instead.
{=AVERAGE(IF(MyRange>0,MyRange))}
I could also do this without an array, for example:
=SUMIF(MyRange,”>0″)/COUNTIF(MyRange,”>0″)
Which sums all the numbers in the range greater than zero, then divides by a count of those numbers. Another alternative to array formulas is SUMPRODUCT, which I won’t get into here.
All will give me the average of all values in “MyRange” that are greater than zero.
And of course I could specify two different ranges, for example if I had two columns of numbers and wanted to average all of the data elements in one column where the adjacent values are greater than zero. Make sure both ranges are the same size, or you will most likely get an error:
{=AVERAGE(IF(MyRange>0,AnotherRange))}
This all works great when the numbers are relatively close together. But sometimes a range of numbers contains an extreme so far off from the rest that it throws off the whole average, and consequently the average won’t accurately represent my data set. Or maybe I just want to blindly throw out the highest value, to get a better picture of what is going on with the data. For example suppose I have a named range “MyRange” that includes the following numbers.
1,2,3,4,5,6,7,8,9,10,100
(cut and paste this into Excel and delimit by comma if you want to follow along, make sure to select the range and use the Name box to name it “MyRange”)
The formula =AVERAGE(MyRange) produces 14.0909 as the result, which hides the fact that 90% of the numbers are below 10. It includes the number 100, which throws the whole average off by making it appear that most of the numbers are around 14. If I used this assumption in another calculation I would be technically correct, but at best the result might be functionally useless or at worst, cause harm, if it is used in something like a financial calculation.
Ideally I would also use the MEDIAN function to check that my average is correct, but I know I won’t always do that.
If I want to remove the 100, I use the MAX function in the original array formula to compare all the values against the maximum value in the range. If they don’t match (i.e. not 100), they are counted towards the average. In other words, all values EXCEPT the maximum in the range will be averaged.
So instead of
{=AVERAGE(IF(MyRange>0,MyRange))}
I use
{=AVERAGE(IF(MyRange<>MAX(MyRange),MyRange))}
I could also use
{=AVERAGE(IF(MyRange<MAX(MyRange),MyRange))}
Notice how the first one uses “<>” and the second one uses “<". Functionally they are going to do the same thing: exclude the maximum value from the average.
But now by changing the condition of the array from “all values greater than zero” to “all values lower than the maximum” I added zero and negative values back into the array. If you know your range won’t include values <= zero, you can stop here. But if it does, or you aren’t sure, read on.
I need to add another condition to the IF portion of the formula to exclude values <= zero. The way I do it with an array formula is to use the multiplication symbol “*” to add compounding conditions to the evaluation.
{=AVERAGE(IF((MyRange<>MAX(MyRange))*(MyRange>0),MyRange))}
Or
{=AVERAGE(IF((MyRange<MAX(MyRange))*(MyRange>0),MyRange))}
If you aren’t familiar with these types of array formulas, check out the link at the beginning of the post. I use truth tables to help me figure it out when to use ‘+’ and ‘*’ inside an array formula. An excellent explanation of this may be found at this site.
For example, if I wanted to average all numbers in MyRange that are greater than 5 AND less than (i.e. excluding) the highest value, I would use:
{=AVERAGE(IF((MyRange<MAX(MyRange))*(MyRange>5),MyRange))}
But if I wanted to average all numbers in MyRange that are greater than zero OR less than the highest value, I would use:
{=AVERAGE(IF((MyRange<MAX(MyRange))+(MyRange>0),MyRange))}
If I want to eliminate the lowest value instead, I simply substitute MIN for MAX in any of the formulas above! Oh, and I might have to reverse some of the < and > signs as well.
I can also eliminate the lowest or highest N values from our calculation, but it requires that I know what my usual data looks like. Yes it’s cheating, but if I know my data is usually around a certain number, I can simply specify what values to exclude from the average. For example, if my data is usually below 100, and any spikes over 100 would mess up my average and misrepresent my data:
{=AVERAGE(IF((MyRange<100)*(MyRange>0),MyRange))}
Enjoy,
JP
Print This Post
|
Email This Post
|
Permalink
|
Filed Under: Excel, Formulas
Tags: array formula, AVERAGE, exclude, MAX, MIN, SUM
This post has 657 views since July 14, 2008 – 4:00 pm.






