Defining Control Limits

In order to define the control limits, we need:

an ample history of the process to define the level of common cause variation, and
a basis for determining how wide to set the control limits.

How many subgroups are necessary to define a process? There are two issues to be resolved. The first issue concerns the process. In order to distinguish between "special causes" and "common causes", you must have enough subgroups to define the "common cause" operating level of your process. This implies that all types of common causes must be included in the data. The second issue deals with statistics. The statistical constants used to define control chart limits (such as d2 or c4) are actually variables, and only approach constants when the number of subgroups is "large". For a subgroup size of five, for instance, the d2 value approaches a constant at about twenty-five subgroups (Duncan, 1986). When only a limited number of subgroups is available, use Short Run Techniques.
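The statistical side of this question can be illustrated with a small simulation. The sketch below is not Duncan's derivation. It assumes standard Normal observations, subgroups of size five, and the tabulated d2 value of 2.326, and it shows how much the sigma estimate (average range divided by d2) varies from one study to the next when the study uses 5, 25, or 100 subgroups. The function names, trial count, and seed are hypothetical choices made for illustration.

```python
import random
import statistics

D2_N5 = 2.326  # tabulated d2 for subgroups of size 5, Normal observations


def sigma_estimate(num_subgroups, rng, subgroup_size=5):
    """Estimate sigma as (average subgroup range) / d2 from simulated data."""
    ranges = []
    for _ in range(num_subgroups):
        subgroup = [rng.gauss(0.0, 1.0) for _ in range(subgroup_size)]
        ranges.append(max(subgroup) - min(subgroup))
    return statistics.mean(ranges) / D2_N5


def spread_of_estimates(num_subgroups, trials=500, seed=1):
    """Standard deviation of the sigma estimate across repeated studies."""
    rng = random.Random(seed)  # assumed seed, only for repeatability
    estimates = [sigma_estimate(num_subgroups, rng) for _ in range(trials)]
    return statistics.stdev(estimates)


if __name__ == "__main__":
    for k in (5, 25, 100):
        print(k, "subgroups: spread of sigma estimate =",
              round(spread_of_estimates(k), 3))
```

The true sigma in the simulation is 1, so the printed spread can be read as a relative error. It shrinks as subgroups are added, which is one reason a limited history gives unreliable limits.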

To define the expected limits for a given set of process data, we can either attempt to characterize the distribution, assume Normality, or assume that the distribution makes little difference. There are several techniques for fitting distributions to data, which are discussed in Curve Fitting. For the X-bar chart, there is sound statistical rationale for assuming Normality of the plotted subgroup averages. The Central Limit Theorem holds that, regardless of the underlying distribution of the observations, the distribution of the averages of large samples will be approximately Normal. Research using computer simulations has verified this, demonstrating that the Normal Distribution provides a good approximation to the distribution of subgroup averages, and that "large" subgroups may be as small as four or five observations, so long as the underlying distribution is not very skewed or bounded.
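A simulation along these lines is easy to sketch. The example below is an illustration, not the research cited above. It draws subgroups of size five from two assumed distributions (a uniform and a strongly skewed exponential) and counts how often the subgroup average falls more than three standard errors from the true mean. For exactly Normal averages that fraction is about 0.0027. The function name, trial count, and seed are hypothetical.

```python
import random


def fraction_beyond_3_sigma(draw, mean, sd, subgroup_size=5,
                            trials=50_000, seed=7):
    """Fraction of subgroup averages more than 3 standard errors from the mean.

    draw(rng) returns one observation; mean and sd are the known mean and
    standard deviation of the distribution being sampled.
    """
    rng = random.Random(seed)  # assumed seed, only for repeatability
    std_error = sd / subgroup_size ** 0.5
    outside = 0
    for _ in range(trials):
        avg = sum(draw(rng) for _ in range(subgroup_size)) / subgroup_size
        if abs(avg - mean) > 3 * std_error:
            outside += 1
    return outside / trials


def uniform_fraction():
    # Uniform on [0, 1): mean 1/2, standard deviation sqrt(1/12)
    return fraction_beyond_3_sigma(lambda rng: rng.random(), 0.5, (1 / 12) ** 0.5)


def exponential_fraction():
    # Exponential with rate 1: mean 1, standard deviation 1, strongly skewed
    return fraction_beyond_3_sigma(lambda rng: rng.expovariate(1.0), 1.0, 1.0)


if __name__ == "__main__":
    print("Normal reference :", 0.0027)
    print("uniform, n=5     :", uniform_fraction())
    print("exponential, n=5 :", exponential_fraction())
```

With a strongly skewed parent distribution, the averages of five observations still exceed the 3-sigma limits noticeably more often than the Normal model predicts, which is the caveat stated at the end of the paragraph above.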

There is some contention within the Quality community that the distribution of both the underlying process and the subgroup averages is irrelevant to the understanding and use of control charts. The debate itself might be viewed as rather esoteric, since both sides would draw similar broad conclusions: the control chart, particularly the X-bar chart, is a useful tool for detecting shifts in a process. The pertinence of the debate, however, is in the details, and has particular impact when applied to other control charts, including the Individual-X chart and the more recently developed CuSum and EWMA charts.

The argument against the use of probability models to define the control limits includes the following remarks:

1. Shewhart did not rely upon the Normal Distribution in his development of the control chart; instead, he used empirical (experimental) data, and generated limits that worked for his process.
2. Since the control chart is not based on a distinct probability model, it is not necessary to fit a distribution or make any assumptions about the process or its data. The control limits that are calculated using the Shewhart equations will always provide control limits that are robust to any differences in the underlying distribution of the process.
3. If you say that the X-bar chart relies upon the Normal Distribution, you rely upon the Central Limit Theorem. But the Central Limit Theorem would not apply to the subgroup range or sigma calculation anyway, so how do you define limits for the subgroup ranges (or sigma)?
4. The control limits are set in the "tail areas" of the distribution anyway, so that any attempt to fit a distribution will be subject to errors in these regions.

The argument for the use of probability models to define the control limits notes the following:

1. If control charts defined by Shewhart were based entirely on empirical data, and not based on any theory that would have broader implications for all processes, they would be useful for only Shewhart-type processes. This is not the case; the control charts are based upon mathematical (or more precisely, statistical) theory that transcends particular processes.

2. The control limits are determined mathematically, and the formula used for computation is a direct application of Normal probability theory. Although this mathematical model could be based on empirical evidence only, it is no coincidence that the model applies perfectly to Normally distributed statistics, and applies less well as the statistic looks less Normal. Consider how to estimate the control limits on an X-bar chart:

Two parameters are calculated: the overall average and the average within-subgroup standard deviation. Neither of these calculations demands that the observations be Normally distributed; however, the Normal Distribution is the only distribution perfectly described by only these two parameters.

One parameter is tabulated: the factor (either d2 or c4) used to convert the average within-subgroup variation to the expected variation of the process observations, based on the subgroup size. The estimates of the d2 or c4 factors are derived based upon the assumption of Normality of the observations.

One parameter is defined: the number of standard deviations at which to place the control limits (usually 3). The placement of the control limits at plus and minus 3 standard deviations from the center line is appropriate only for a Normal distribution, or distributions whose shape is similar to a Normal Distribution. Other distributions may respond to this signal significantly more frequently even though the process has not changed, or significantly less frequently when the process has changed. Given the intent of a control chart to minimize false alarms, this is not desirable. See Tampering.

The Western Electric Run Tests, in fact, make use of the probability models to determine when the pattern of groups in the control chart is non-random. Without knowing that the subgroup averages should be Normally distributed on the X-bar chart, you could not apply the Western Electric Run Tests; they would have no meaning without an understanding of the probability model that is their basis.

Similarly, the argument against using 2-sigma limits due to their impact on tampering would have little meaning without an understanding of the underlying distribution of the plotted subgroups. See Tampering.

3. It is true that the Central Limit Theorem does not apply to the subgroup range or sigma statistics. But what does that prove? Perhaps that the distribution of the Range or Sigma is not sensitive to the assumption of Normality of the observations? That's been shown to be the case in prior academic studies.

4. Curve fitting to define distributions, like any modeling technique, is subject to error, and statistical error is likely to be higher where there is less data, such as in the tail regions of distributions. But there are techniques for dealing with this situation. See also Curve Fitting.
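The X-bar limit calculation described under point 2 above (two calculated parameters, one tabulated factor, one defined multiplier) can be sketched as follows. This is a minimal illustration, assuming equal subgroup sizes and the sigma-based form of the chart. The c4 factor is computed from its standard closed form under Normality instead of being read from a table, and the sample measurements are made up for the example.

```python
import math
import statistics


def c4(n):
    """Bias-correction factor for the sample standard deviation of n
    Normal observations (the closed form behind the tabulated values)."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)


def xbar_limits(subgroups, num_sigmas=3.0):
    """Return (lower limit, center line, upper limit) for an X-bar chart.

    subgroups: list of equal-sized lists of observations.
    """
    n = len(subgroups[0])
    # Calculated parameter 1: the overall average
    grand_mean = statistics.mean([statistics.mean(g) for g in subgroups])
    # Calculated parameter 2: the average within-subgroup standard deviation
    s_bar = statistics.mean([statistics.stdev(g) for g in subgroups])
    # Tabulated parameter: c4 converts s-bar to an estimate of process sigma
    sigma_hat = s_bar / c4(n)
    # Defined parameter: number of standard errors for the limits (usually 3)
    half_width = num_sigmas * sigma_hat / math.sqrt(n)
    return grand_mean - half_width, grand_mean, grand_mean + half_width


if __name__ == "__main__":
    # Hypothetical measurements, two subgroups of five, for illustration only
    data = [[9.8, 10.1, 10.0, 10.2, 9.9],
            [10.3, 9.7, 10.0, 10.1, 9.9]]
    print(xbar_limits(data))
```

Nothing in the arithmetic checks that the data are Normal. The Normality assumption enters through the c4 factor and through the interpretation of the 3-sigma multiplier.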

What are the implications of this debate?

1. If we use the X-bar chart, little. Both sides agree that the X-bar chart is a very useful tool; they just disagree about why it is useful. As mentioned above, there would also be a question as to the validity of Run Tests in the absence of the probability model.

2. If we use the Individual-X chart, or try to estimate process capability, we must either assume that the distribution does not matter, or fit a distribution. We can easily compare a fitted curve to the Shewhart calculations to see which best describes the process behavior. Note that the Shewhart calculations exactly coincide with the calculations for the Normal distribution, as pointed out above. See Curve Fitting.

3. The Moving Average, EWMA and CuSum control charts may have a couple of interesting uses, depending on your point of view:

When we are forced to use subgroups of size one due to Rational Subgroup considerations, these charts do not require that we fit a distribution to the data. Instead, they plot averages (moving averages, exponentially-weighted moving averages, or cumulative sums), which allows the use of Normal control limits via the Central Limit Theorem.

A mathematical understanding of these statistics reveals that their control charts can be designed to be more sensitive to small process shifts. This knowledge is useful for detecting small process shifts (shifts of approximately 0.5 to 1.5 sigma units) that would otherwise be lumped into "common cause variation" using the standard control limits. Note that this sensitivity is gained without an increase in false alarms (see Tampering). Those who do not believe in the distribution as the basis for the control limits should not accept the argument that these charts are more sensitive, or even that these charts have any valid uses. Instead, they should contend that the charts promote tampering, since they respond to special causes not detected through the standard Shewhart calculations.
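The sensitivity to small shifts can be illustrated with a minimal EWMA sketch. It uses the standard EWMA recursion and its time-varying limits. The weight of 0.2, the limit width of 3, and the function names are assumptions chosen for illustration, and the input series is deliberately noise-free so that the result is easy to follow. It does not demonstrate false alarm rates.

```python
import math


def ewma_signals(values, target, sigma, lam=0.2, width=3.0):
    """Return 0-based indices where the EWMA falls outside its control limits.

    z_t = lam * x_t + (1 - lam) * z_(t-1), starting from z_0 = target.
    Limits: target +/- width * sigma * sqrt(lam / (2 - lam) * (1 - (1 - lam)^(2t))).
    """
    signals = []
    z = target
    for t, x in enumerate(values, start=1):
        z = lam * x + (1.0 - lam) * z
        half_width = width * sigma * math.sqrt(
            lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t)))
        if abs(z - target) > half_width:
            signals.append(t - 1)
    return signals


def individuals_signals(values, target, sigma, width=3.0):
    """Return 0-based indices where an individual value exceeds +/- width sigma."""
    return [i for i, x in enumerate(values) if abs(x - target) > width * sigma]


if __name__ == "__main__":
    # Idealized series: ten points on target, then a sustained 1.5-sigma shift
    series = [0.0] * 10 + [1.5] * 10
    print("EWMA signals at       :", ewma_signals(series, target=0.0, sigma=1.0))
    print("Individual-X signals at:", individuals_signals(series, target=0.0, sigma=1.0))
```

In this idealized series the individual values never reach the 3-sigma limits, while the EWMA accumulates the sustained shift and crosses its narrower limits a few observations after the change.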
