Minimum Number of Subgroups for Capability Analysis

07/21/2011:

What is the minimum number of data points you would recommend for capability analysis?

Calvin E., Quality Engineer

There is a common misconception that 30 subgroups is sufficient to estimate process sigma. I suspect that number comes from the statistical properties of the “constants” used to estimate the process standard deviation from the average Range or average Sigma of a set of subgroups.

That is, the constants used to define the control limits (A2, D4, D3, etc.) are not really constants, but approach constants for a large number of subgroups. How many subgroups, you ask? For subgroups of size 5, you need about 35 subgroups before the second or third decimal place starts becoming constant. For smaller subgroup sizes, you need more. When you get down to a subgroup size of 1, you may also need to fit a distribution, especially if you want to analyze process capability. Most statisticians would say you need a few hundred data points to fit a distribution. My general rule of thumb is to use 150 to 200 observations minimum for all subgroup sizes (i.e., 30 to 40 subgroups of size 5), and of course that data must be from a stable process for it to be useful for these purposes. Do I ever make charts with less data? Of course, because it is interesting and easy, and I may learn something about the process. But I realize the limitations of the data and do not sell the farm based on the limited data.
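To get a feel for how the amount of data affects the sigma estimate, here is a minimal simulation sketch in Python. It is only an illustration, not a method from this article: it assumes a stable, normally distributed process with a true sigma of 1, subgroups of size 5, and the commonly tabulated value d2 = 2.326 for that subgroup size. The function names are hypothetical.

```python
import random
import statistics

D2_N5 = 2.326  # commonly tabulated d2 value for subgroups of size 5


def sigma_from_rbar(subgroups):
    """Estimate process sigma as (average subgroup range) / d2."""
    ranges = [max(s) - min(s) for s in subgroups]
    return statistics.mean(ranges) / D2_N5


def spread_of_estimate(num_subgroups, trials=2000, seed=1):
    """Standard deviation of the sigma estimate across simulated studies.

    Assumption: a stable, normally distributed process with true sigma = 1.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        # One simulated study: num_subgroups subgroups of 5 observations each.
        subgroups = [[rng.gauss(0.0, 1.0) for _ in range(5)]
                     for _ in range(num_subgroups)]
        estimates.append(sigma_from_rbar(subgroups))
    return statistics.stdev(estimates)


if __name__ == "__main__":
    # Compare how much the estimate varies for different study sizes.
    for k in (5, 10, 35, 100):
        print(k, "subgroups:", round(spread_of_estimate(k), 3))
```

Running it shows the spread of the estimate shrinking as subgroups are added, which is the statistical side of the argument. It says nothing about whether the data were collected over a long enough period.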

Bear in mind that these are only the statistical considerations. If we obtained 200 observations over the course of 2 minutes, which is possible from automated data collection equipment, would it be useful for estimating the common cause variation we are likely to experience from the process? Probably not. To properly estimate properties of the process, you need to consider the process dynamics: the system of common causes that underlie the process. Generally, it’s better to collect rational subgroups less frequently over a longer time period. Is one hour sufficient? One day? One week? Consider the causes of process variation (5M and E: Manpower, Materials, Methods, Machines, Measurement and Environment; or 4P: Policy, Procedure, Plant, People). Collect data more frequently at first to learn about the process, then adjust the sampling frequency in response to what you’ve learned about the sources of common and special cause variation. Once statistical control has been established, we can estimate the process standard deviation and the resulting process capability indices or sigma levels.
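As a rough illustration of that last step, the sketch below computes two widely used capability indices, Cp and Cpk, from an estimated process mean and standard deviation. The article does not say which indices to use, so the choice of Cp and Cpk, the function name, and the specification limits in the example are all assumptions made for illustration.

```python
def capability_indices(mean, sigma, lsl, usl):
    """Return (Cp, Cpk) for a stable process.

    mean, sigma: estimated process mean and standard deviation.
    lsl, usl: lower and upper specification limits (hypothetical values
    in the example below).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if usl <= lsl:
        raise ValueError("usl must be greater than lsl")
    # Cp compares the specification width to the 6-sigma process spread.
    cp = (usl - lsl) / (6 * sigma)
    # Cpk uses the distance from the mean to the nearer specification limit.
    cpk = min(usl - mean, mean - lsl) / (3 * sigma)
    return cp, cpk


if __name__ == "__main__":
    cp, cpk = capability_indices(mean=10.0, sigma=0.5, lsl=7.0, usl=12.0)
    print("Cp =", round(cp, 2), "Cpk =", round(cpk, 2))
```

These numbers are only as good as the sigma estimate behind them, so they are meaningful only for a process that is in statistical control and sampled over a representative period.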

Learn more about the SPC principles and tools for process improvement in Statistical Process Control Demystified (2011, McGraw-Hill) by Paul Keller, in his online SPC Concepts short course (only $39), or his online SPC certification course ($350) or online Green Belt certification course ($499).