Wednesday, June 27, 2018

The rule of five

I read a really good book last year by Douglas W. Hubbard, called "How to Measure Anything."  I can't remember exactly why I picked it (apparently I bought it on Amazon).  Someone could have recommended it, or maybe I saw it while searching for some other book.  Why I read it is probably not important.  What's important is that it is a really interesting book that is well worth the time and effort.  And, since process improvement is an important aspect of leadership (particularly in the health care industry), and measurement is crucial to understanding process improvement, I thought I would talk about a couple of the book's take-home points.

Hubbard talks about something he calls the rule of five.  The rule states that "there is a 93.75% chance that the median of a population is between the smallest and largest values in any random sample of five from that population."  It doesn't matter whether the population is 100 or 1,000 or even 10,000: the range of a random sample of five (from its smallest value to its largest) has that same 93.75% chance of containing the median of the entire population.  When I first read Hubbard's statement, I didn't believe it.  I have taken a number of statistics classes over the years, and I have been involved with a number of clinical trials as a clinical investigator.  I was always taught that the number of study subjects in a clinical trial was critically important, especially when extrapolating the results from the study to the general population.  I can assure you that I was never taught that 5 study subjects was anything close to being sufficient for a clinical trial!

So, what is going on here?  I think the important point to remember is that in many cases, a reasonable approximation of the population's median (of whatever is being measured) is probably close enough for making decisions about process improvement.  Say, for example (and I will use Hubbard's example here) that a company wants to determine whether it makes sense to consider more telecommuting options for its employees.  An important thing to know would be the time required for the company's employees to commute (i.e., how long does it take the employees to drive to work?).  A survey of all the employees (in this case, there are 10,000 employees in the company) could determine the median commute time, but such a survey could be costly and time-consuming.  Conversely, selecting a sample of five employees and measuring the commute time for each employee in the sample could provide a reasonable approximation of the median commute time for the entire company.  According to the rule of five, there is a 93.75% chance that the median commute time of the company is between the smallest and largest values of the sample of five employees.
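If you would rather see the commute example in action than take it on faith, here is a minimal Python sketch.  To be clear about what is assumed: the population of 10,000 commute times is made up (drawn from a log-normal distribution that I picked purely for illustration; it is not Hubbard's data), and the names `commute_times` and `estimate_coverage` are my own.  The sketch repeatedly draws five employees at random and counts how often the company-wide median falls between the smallest and largest commute in the sample.

```python
import random
import statistics


def estimate_coverage(population, sample_size=5, trials=20_000, seed=1):
    """Estimate how often the range of a random sample contains the population median."""
    rng = random.Random(seed)
    median = statistics.median(population)
    hits = 0
    for _ in range(trials):
        # Draw employees without replacement, as a real survey would.
        sample = rng.sample(population, sample_size)
        if min(sample) <= median <= max(sample):
            hits += 1
    return hits / trials


# Hypothetical company: 10,000 commute times in minutes (illustrative only).
population_rng = random.Random(42)
commute_times = [population_rng.lognormvariate(3.2, 0.5) for _ in range(10_000)]

coverage = estimate_coverage(commute_times)
print(f"Company-wide median commute: {statistics.median(commute_times):.1f} minutes")
print(f"Share of 5-person samples whose range contains it: {coverage:.2%}")
```

The estimated share should land close to 93.75%, give or take a little simulation noise.  Swapping in a different made-up distribution should not change that much, because the rule depends only on each value having a 50/50 chance of falling above or below the median.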

Still not convinced?  I wasn't either.  Here is the mathematical proof.  Recall that the median is the value (in this case, commute time) that is exactly in the middle of all of the values (i.e., commute times for every employee in the company).  Half of the values are above the median, while half of the values are below the median.  So, if we randomly select five values, what is the likelihood that the median for the entire population falls within the range of those five values?  Given that there is a 50% chance (by definition) for each value in the sample to be either above or below the population median, we can calculate the chance that the population median is within the range of the sample.  There is a 0.5 x 0.5 x 0.5 x 0.5 x 0.5 probability that all of the values in the sample are above the median, or 3.125%.  There is also a 0.5 x 0.5 x 0.5 x 0.5 x 0.5 probability that all of the values in the sample are below the median, or 3.125%.  These are the only two ways the sample's range can miss the population median, so there is a 3.125% + 3.125% = 6.25% chance that the five values are either all above or all below it.  In other words, the chance of at least one value out of the sample of five being above the population median and at least one being below is 100% - 6.25% = 93.75%.
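The same arithmetic works for any sample size, not just five.  Here is a short Python sketch of the calculation; the function name `range_contains_median_probability` is mine, and it assumes (as the argument above does) that each sampled value independently has a 50% chance of landing above or below the population median.

```python
def range_contains_median_probability(sample_size):
    """Chance that a random sample's min-to-max range contains the population median.

    Assumes each sampled value is independently above or below the median
    with probability 0.5, as in the argument above.
    """
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    all_above = 0.5 ** sample_size  # every value lands above the median
    all_below = 0.5 ** sample_size  # every value lands below the median
    return 1 - (all_above + all_below)


for n in (2, 3, 5, 8, 10):
    print(f"sample of {n:>2}: {range_contains_median_probability(n):.4%}")
```

For a sample of five this gives the 93.75% from the rule, and the output shows how quickly the chance climbs as a few more values are added.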

Make sense now?  I had to think about this for a little bit too.  It reminds me a lot of the famous "birthday problem" or "birthday paradox".  Check it out.
