Bonus content: Confidence Intervals

need to frame the example here

A confidence interval tells you where the true value might be

Suppose I drew a random sample from the list of employees, got a high response rate, and ended up with, say, 100 responses from a representative sample – and calculated that their mean commute is 47 minutes. By the magic of the Central Limit Theorem, I know that the actual mean commute for the entire workforce is likely to be quite close to (you guessed it) 47 minutes.

Statisticians would call “actual mean for the entire workforce” the “population mean” as it saves having to type out the name of the defined group of people every time, so that’s what we’re going to do now.  

Suppose again that the true population mean is 15 minutes. It’s possible that my reasonably well-designed survey could come up with a mean commute of 47 minutes from my respondents by sheer fluke. If I only had 3 responses then of course I might easily have missed nearly every employee with a very short commute, and instead picked at least one of the extremely long ones. But with 100 responses, I’d have to be pretty unlucky to get such an unbalanced response overall.
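If you'd like to see that fluke argument in action, here is a rough simulation sketch. Everything in it is invented for illustration: a hypothetical workforce of 5,000 where most people have a very short commute and a few have an extremely long one, giving a population mean of roughly 15 minutes. It draws lots of random samples of 3 and of 100 and counts how often the sample mean comes out at 47 minutes or more.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical workforce (invented numbers): 95% very short commutes,
# 5% extremely long ones. The mean works out at roughly 15 minutes.
short_commutes = [rng.uniform(0, 12) for _ in range(4750)]
long_commutes = [rng.uniform(120, 240) for _ in range(250)]
population = short_commutes + long_commutes
population_mean = sum(population) / len(population)


def share_of_flukes(sample_size, trials=2000, threshold=47):
    """Fraction of random samples whose mean is at least `threshold`."""
    hits = 0
    for _ in range(trials):
        sample = rng.sample(population, sample_size)  # without replacement
        if sum(sample) / sample_size >= threshold:
            hits += 1
    return hits / trials


fluke_share_3 = share_of_flukes(3)
fluke_share_100 = share_of_flukes(100)

print(f"Population mean: {population_mean:.1f} minutes")
print(f"Samples of 3 with mean >= 47:   {fluke_share_3:.1%}")
print(f"Samples of 100 with mean >= 47: {fluke_share_100:.1%}")
```

With this made-up workforce, a sizeable share of the three-person samples hit 47 minutes or more, while essentially none of the 100-person samples do. A different shape of population would change the exact shares, but not the overall pattern.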

Think for a moment about a true population mean of 45 minutes, and compare that with my result of 47 minutes from my 100 responses. They are close, and it seems much more plausible that I could get a result of 47, a little bit out, if my belief that I’ve done a good job of my sampling is correct.  

So some values of the true population mean make it likely that I’d get this sample mean, while other values would make it very unlikely. This brings us to the confidence interval. 

A confidence interval is the range of values for the population mean that could plausibly result in the observed mean

We also have to know something about the defined group of people and the likely spread of their answers. For a commute, we might have these people: 

  • Work from home, zero commute 
  • A reasonable commute, maybe up to an hour 
  • A challenging commute, maybe an hour to three hours 
  • Something else, such as weekly commuting from a distant city or splitting their time between two countries. 

We can’t get a commute less than zero, and arguably if the commuting time is over 24 hours then that’s no longer commuting but something else. But what really matters is the approximate shape of the commutes. I sketched a couple of guesses.  

 image missing here

Figure 0.6: People with a small city commute compared to a Bay Area commute

In the organisation in a small city that doesn’t allow working from home, the commute time has a mean around 45 mins with standard deviation (spread) about 20 mins either side of that – quite narrow. In the Bay Area organisation, the mean commute is around 90 mins, but the standard deviation is much larger: around 75 mins.  

If only we knew the standard deviation of our population! Because if we had it, then the convenient mathematical properties of means and the Normal distribution would let us work out what the confidence interval might be, based on our sample size. 

But hang on a moment: we have got something that we can use as an acceptable estimate of the true standard deviation of the population. It’s the standard deviation of our sample – a representative sample, so it’s probably got a standard deviation that’s not massively different from the real one.   
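As a small illustration of what "the standard deviation of our sample" means in practice, here is a sketch using Python's built-in `statistics` module. The ten commute times are made up for the example (a real survey would have all 100 responses), and `stdev` is the sample version of the calculation, which divides by n - 1 rather than n.

```python
from statistics import mean, stdev

# Hypothetical commute times in minutes, invented for illustration only.
responses = [0, 25, 30, 45, 50, 55, 60, 75, 90, 40]

sample_mean = mean(responses)
sample_sd = stdev(responses)  # sample standard deviation (divides by n - 1)

print(f"Sample mean: {sample_mean} minutes")
print(f"Sample standard deviation: {sample_sd:.1f} minutes")
```

The sample standard deviation is the figure we carry forward as our stand-in for the unknown population value.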

And the final ingredient we need is our view of what “plausible” ought to be. There’s more on the topic of “likely” and “plausible” in the Spotlight on Statistical Significance, but for now let’s be boringly conventional and opt for 95% – or a 19 in 20 chance that the range of values in our confidence interval does indeed contain the true population mean.

Confidence level  95%
Standard deviation (of the sample)  25 minutes
Sample size  100
Result: Confidence interval  ±4.9 minutes around the sample mean
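For anyone who wants to check the arithmetic, here is a minimal sketch of the usual Normal-approximation calculation: the half-width of the interval is z × standard deviation ÷ √(sample size), where z is about 1.96 for 95% confidence. The variable names are my own, and I'm assuming the sample standard deviation is a good enough stand-in for the population one, as discussed above.

```python
from math import sqrt
from statistics import NormalDist

confidence_level = 0.95
sample_mean = 47.0   # minutes
sample_sd = 25.0     # minutes, standing in for the unknown population value
sample_size = 100

# z is about 1.96 for 95%: it leaves 2.5% in each tail of the Normal curve.
z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)

standard_error = sample_sd / sqrt(sample_size)  # 25 / 10 = 2.5
margin_of_error = z * standard_error            # about 4.9

lower = sample_mean - margin_of_error
upper = sample_mean + margin_of_error

print(f"{sample_mean:.0f} ± {margin_of_error:.1f} minutes "
      f"({lower:.1f} to {upper:.1f})")
```

This prints `47 ± 4.9 minutes (42.1 to 51.9)`. With a much smaller sample, statisticians would typically swap the Normal curve for a t-distribution, which widens the interval a little; at 100 responses the difference is small.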

You may see the ideas expressed like this: 

The mean commute is 47 minutes, plus or minus 4.9 minutes at 95% confidence.

Or you might say, rounding to the nearest minute to keep it simple: 

Based on our responses, we estimate that the average commute is between 42 and 52 minutes 

Which is the right choice? Either! It depends on your goals and your stakeholders.