Resampling (Bootstrapping)

The following is an excerpt from The Quality Engineering Handbook by Thomas Pyzdek, © QA Publishing, LLC.

A number of criticisms have been raised regarding the traditional methods used for estimation and hypothesis testing:

· They are not intuitive.

· They are based on strong assumptions (e.g., normality) that are often not met in practice.

· They are difficult to learn and to apply.

· They are error-prone.

In recent years a new approach to these analyses has been developed, known as resampling or bootstrapping. It is conceptually quite simple: using the data from a single sample, calculate the statistic of interest repeatedly and examine the resulting distribution of that statistic. For example, suppose you obtained a sample of n=25 measurements from a lot and wished to determine a confidence interval for the statistic Cpk. Using resampling, you would tell the computer to draw a resample of n=25 from the original sample results, compute Cpk, and repeat the process many times, say 10,000 times. You would then read whatever percentile you wished directly from the resulting values; for example, the 2.5th and 97.5th percentiles bound a 95% confidence interval. The resamples are drawn "with replacement," i.e., a particular value from the original sample might appear several times (or not at all) in a resample.
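The Cpk procedure described above can be sketched in a few lines of Python. This is a minimal illustration and not the handbook's own program: the measurements are simulated, the specification limits (LSL = 9.6, USL = 10.4) are made up for the example, and the function names cpk and bootstrap_cpk_interval are hypothetical. It assumes the usual definition Cpk = min(USL - mean, mean - LSL) / (3s) and uses the simple percentile method to read off the interval.

```python
import random
import statistics


def cpk(values, lsl, usl):
    """Distance from the mean to the nearer spec limit, in units of 3 sample std devs."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample (n - 1) standard deviation
    return min(usl - mean, mean - lsl) / (3 * sd)


def bootstrap_cpk_interval(sample, lsl, usl, n_resamples=10_000,
                           confidence=0.95, seed=0):
    """Percentile bootstrap interval for Cpk (hypothetical helper for illustration)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    results = []
    for _ in range(n_resamples):
        # Draw n values WITH replacement from the original sample.
        resample = rng.choices(sample, k=n)
        if len(set(resample)) < 2:
            continue  # all values identical: standard deviation is zero, Cpk undefined
        results.append(cpk(resample, lsl, usl))
    results.sort()
    # Read the interval endpoints directly off the sorted resampled statistics.
    tail = (1 - confidence) / 2
    lower = results[int(tail * len(results))]
    upper = results[int((1 - tail) * len(results)) - 1]
    return lower, upper


# Assumed example data: 25 simulated measurements and made-up spec limits.
demo_rng = random.Random(1)
measurements = [demo_rng.gauss(10.0, 0.1) for _ in range(25)]
LSL, USL = 9.6, 10.4

point_estimate = cpk(measurements, LSL, USL)
lower, upper = bootstrap_cpk_interval(measurements, LSL, USL)
print(f"Cpk point estimate: {point_estimate:.3f}")
print(f"Approximate 95% bootstrap interval: ({lower:.3f}, {upper:.3f})")
```

Changing the confidence argument (for example, to 0.90) reads different percentiles from the same kind of resampled distribution. The percentile method shown here is only the simplest way to form the interval; other bootstrap interval constructions exist and may behave better for small samples.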

Resampling has many advantages, especially in the era of easily available, low-cost computer power. Spreadsheets can be programmed to resample and calculate the statistics of interest. Compared with traditional statistical methods, resampling is easier for most people to understand. It works without strong assumptions, and it is simple. Resampling puts less statistical machinery between the engineering problem and the statistical result than conventional methods do. It can also be used for more advanced problems, such as modeling and design of experiments.

For a discussion of the theory behind resampling, see Efron (1982). For a presentation of numerous examples using a resampling computer program, see Simon (1992).
