Regression Model Improvements

In many cases the Regression model can be improved by adding or removing factors and interactions from the Analysis Array. The decision to retain or remove a parameter from the model is usually guided by the level of significance of that factor in the regression, or its position on a Normal or Half Normal Probability Chart.

There are two levels on which an analysis may be reviewed within SPC software. One level is to review the overall statistics that apply to the results. If there are many parameters in the regression, the coefficients of determination and the F-ratio probabilities should be high. If the values are not above 0.8, there may be a problem with the model or there may be too few runs to adequately support the estimate of the active parameters. Higher values of determination and probability are often found for processes that reflect simple physical relationships; lower values may result from complex interactions. Lower values also may result from a linear analysis done on a nonlinear situation or may be an indication that the response or the factors need to be transformed.
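The overall statistics described above can be computed directly from the observed and fitted responses. The following is a minimal sketch, not the software's own calculation: the function name overall_fit_statistics and the sample numbers are hypothetical, and the fitted values are assumed to come from a least-squares fit that includes a constant term. Converting the F-ratio into a probability requires the F distribution and is left out here.

```python
def overall_fit_statistics(y, y_hat, n_terms):
    """Return (R^2, adjusted R^2, F-ratio) for a fitted regression.

    y       observed responses
    y_hat   fitted responses from a least-squares model with a constant
    n_terms number of fitted coefficients, excluding the constant
    """
    n = len(y)
    mean_y = sum(y) / n
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    # For a least-squares fit with a constant, total = model + residual.
    ss_model = ss_total - ss_resid
    df_resid = n - n_terms - 1
    if df_resid <= 0:
        raise ValueError("too few runs to estimate the residual variance")
    r2 = ss_model / ss_total
    adj_r2 = 1 - (ss_resid / df_resid) / (ss_total / (n - 1))
    f_ratio = (ss_model / n_terms) / (ss_resid / df_resid)
    return r2, adj_r2, f_ratio


# Hypothetical data: four runs, straight-line fit y_hat = 0.5 + 0.8 * x.
y = [1.0, 3.0, 2.0, 4.0]
y_hat = [1.3, 2.1, 2.9, 3.7]
r2, adj_r2, f_ratio = overall_fit_statistics(y, y_hat, n_terms=1)
print(round(r2, 3), round(adj_r2, 3), round(f_ratio, 3))
```

In this made-up case the coefficient of determination falls below the 0.8 guideline, which would prompt a closer look at the model or the number of runs.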

The second level applies once the overall coefficients and probabilities are acceptable: review the level of significance of the individual fitted coefficients (β-values of the model). Those that are clearly low (less than 0.5) should be removed one or a few at a time, starting with interactions and then moving on to factors. When the remaining factors and interactions all have levels of significance greater than 0.75, review the overall statistics again. Another way of evaluating the importance of a parameter is to examine its position on a half-normal probability plot; terms near the origin are less significant than those far away.
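The half-normal plot positions can be sketched in a few lines. This is an illustration under stated assumptions, not the software's charting routine: the effect values are invented, the function name half_normal_points is hypothetical, and the plotting position (i - 0.5)/m is one common convention among several.

```python
from statistics import NormalDist


def half_normal_points(effects):
    """Return (term, |effect|, half-normal quantile), smallest effect first."""
    ranked = sorted(effects.items(), key=lambda item: abs(item[1]))
    m = len(ranked)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, (term, value) in enumerate(ranked, start=1):
        p = (i - 0.5) / m  # plotting position, strictly between 0 and 1
        quantile = std_normal.inv_cdf(0.5 + 0.5 * p)  # half-normal quantile
        points.append((term, abs(value), quantile))
    return points


# Hypothetical estimated effects from a two-level experiment.
effects = {
    "A": 11.2, "B": -0.6, "C": 4.9,
    "A*B": 0.3, "A*C": -0.9, "B*C": 0.4, "A*B*C": -0.2,
}
for term, size, quantile in half_normal_points(effects):
    print(f"{term:6s} {size:6.2f} {quantile:6.3f}")
```

Terms listed first sit nearest the origin and are the natural candidates for removal; terms listed last, such as A and C in this made-up data, stand apart from the rest.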

When there are few data points, as is common in a grouped data analysis, it is often difficult to judge whether or not to retain a parameter. The number of groups is fewer than the number of individual observations, so there are fewer degrees of freedom available for parameter estimation. Factors which were important when a regression was calculated from individual runs are probably still important to the grouped data regression. However, there may be factors which contribute to the variation of the response more strongly than they contribute to the response itself. Generally far fewer interactions are estimable in a grouped data analysis.
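As a rough illustration of grouped data, the sketch below collapses individual runs into one mean and one standard deviation per group, so that each can serve as a separate response. The data, the tuple layout, and the function name summarize_groups are hypothetical and are not taken from the software.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_groups(runs):
    """Group runs by factor settings; return {settings: (count, mean, std dev)}.

    runs is a list of (settings, response) pairs, where settings is a tuple
    of factor levels. Groups with a single run get None for the std dev.
    """
    groups = defaultdict(list)
    for settings, response in runs:
        groups[settings].append(response)
    summary = {}
    for settings, values in groups.items():
        spread = stdev(values) if len(values) > 1 else None
        summary[settings] = (len(values), mean(values), spread)
    return summary


# Hypothetical data: two factors (A, B), three replicate runs per setting.
runs = [
    ((-1, -1), 10.0), ((-1, -1), 11.0), ((-1, -1), 12.0),
    ((1, -1), 14.0), ((1, -1), 15.0), ((1, -1), 16.0),
    ((-1, 1), 9.0), ((-1, 1), 11.0), ((-1, 1), 13.0),
    ((1, 1), 12.0), ((1, 1), 15.0), ((1, 1), 18.0),
]
for settings, (count, avg, spread) in summarize_groups(runs).items():
    print(settings, count, avg, spread)
```

Twelve individual observations become four groups here, which leaves only three degrees of freedom after the constant. In this made-up data, factor A mainly shifts the mean while factor B mainly changes the spread.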

Identified and unwanted outliers should be deleted from the Array prior to regression. Factors and interactions may be removed from the model by deleting the applicable term from the interaction list. Interactions may also be added if they are estimable with the experimental data. The current interaction list and experimental array are audited for confounding before the regression calculations start.
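A confounding audit of the kind mentioned above can be approximated with a simple column comparison. The sketch below is an assumption about how such a check might work, not a description of the software's audit: it handles only two-level designs coded as -1 and +1, it detects only pairs of terms whose columns are identical or exact negatives, and the names term_column and find_confounded are hypothetical.

```python
from itertools import combinations
from math import prod


def term_column(design, term):
    """Build the model column for a term such as "A" or "A*B"."""
    names = term.split("*")
    return [prod(run[name] for name in names) for run in design]


def find_confounded(design, terms):
    """Return pairs of terms whose columns are identical or exact negatives."""
    columns = {term: term_column(design, term) for term in terms}
    pairs = []
    for first, second in combinations(terms, 2):
        c1, c2 = columns[first], columns[second]
        if c1 == c2 or c1 == [-v for v in c2]:
            pairs.append((first, second))
    return pairs


# Half fraction of a three-factor, two-level design generated with C = A*B.
design = [
    {"A": -1, "B": -1, "C": 1},
    {"A": 1, "B": -1, "C": -1},
    {"A": -1, "B": 1, "C": -1},
    {"A": 1, "B": 1, "C": 1},
]
terms = ["A", "B", "C", "A*B", "A*C", "B*C"]
print(find_confounded(design, terms))
```

In this four-run example each main effect is confounded with a two-factor interaction, so none of the interactions could be added to the model alongside the three factors.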

Many practitioners believe that if quadratic or interaction terms that include a factor are significant and are retained, then the linear term of that factor should also be retained (effect heredity). Other practitioners who have prior knowledge of the underlying physical model might retain only the significant terms and those justified by the physical model.
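The effect heredity rule is easy to express as a check on the list of retained terms. The sketch below assumes a hypothetical text notation in which "A*B" is an interaction and "C^2" is a quadratic term; the notation and the function name enforce_heredity are illustrative only.

```python
def enforce_heredity(retained_terms):
    """Add the linear term of every factor that appears in a retained
    quadratic or interaction term, keeping the original order first."""
    model = list(retained_terms)
    for term in retained_terms:
        for part in term.split("*"):
            factor = part.split("^")[0]  # "C^2" -> "C", "A" -> "A"
            if factor not in model:
                model.append(factor)
    return model


# B and C were dropped as individually insignificant, but A*B and C^2 remain.
print(enforce_heredity(["A", "A*B", "C^2"]))
```

A practitioner relying on a known physical model would skip this step and keep only the terms that the model justifies.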

An essential point is that the order of qualitative factor-levels is arbitrary and that a regression incorporating qualitative factors has no meaning except at the discrete qualitative levels.
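One common way to respect this point is to code a qualitative factor as 0/1 indicator columns instead of as a numeric scale, so that no ordering of the levels is implied. The text does not say how the software codes qualitative factors; the sketch below is a generic illustration, and the function name indicator_columns and the supplier labels are hypothetical.

```python
def indicator_columns(levels, reference=None):
    """Code a qualitative factor as 0/1 indicator columns.

    Returns (column_names, rows). The reference level gets no column; its
    runs are the rows that contain only zeros.
    """
    names = sorted(set(levels))
    if reference is None:
        reference = names[0]
    others = [name for name in names if name != reference]
    rows = [[1 if level == name else 0 for name in others] for level in levels]
    return others, rows


# Hypothetical qualitative factor: supplier used for each run.
suppliers = ["X", "Y", "Z", "Y", "X"]
names, rows = indicator_columns(suppliers)
print(names)
for row in rows:
    print(row)
```

Each coefficient fitted to an indicator column is the shift from the reference level to that level, and it has no interpretation between levels.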

There are at least three schools of thought regarding a regression analysis. One school regards the model as absolute, embodying a preconception of the physical laws governing the experiment. Another school regards the least-squares fit as the best that can be done for the data (transformed or not). They may add terms or replace one term with another to improve the fit, but without much consideration of reducing the number of terms. The third school seeks the essential few. Their objective is to eliminate as many parameters as possible whose coefficients are not significantly different from zero. The latter two schools regard iteration on the regression to be essential to a good analysis. See also: Regression by Backwards Elimination
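The third school's approach can be sketched as a loop that drops the least significant term and refits. The code below is a simplified illustration and not the software's procedure. It assumes a balanced two-level design coded as -1 and +1 with mutually orthogonal columns, so each coefficient is simply sum(x*y)/n; it takes "level of significance" to mean one minus the two-sided p-value of the coefficient's t-ratio; it uses a single hypothetical threshold, keep_above; and it removes one term at a time regardless of whether the term is a factor or an interaction. All names and data are invented.

```python
import math


def t_central_area(t, df, steps=2000):
    """Area under the Student t density between -|t| and +|t|.

    Computed with Simpson's rule; equals 1 minus the two-sided p-value.
    """
    t = abs(t)
    if t == 0:
        return 0.0
    const = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return min(1.0, 2 * total * h / 3)


def backward_eliminate(columns, y, keep_above=0.75):
    """Drop the least significant term until all reach keep_above.

    columns maps a term name to its coded column (ASSUMED orthogonal).
    Returns the retained terms with their coefficients.
    """
    n = len(y)
    mean_y = sum(y) / n
    coef = {term: sum(x * v for x, v in zip(col, y)) / n
            for term, col in columns.items()}
    active = list(columns)
    while active:
        df = n - len(active) - 1
        if df <= 0:
            raise ValueError("no residual degrees of freedom")
        fitted = [mean_y + sum(coef[t] * columns[t][i] for t in active)
                  for i in range(n)]
        mse = sum((v - f) ** 2 for v, f in zip(y, fitted)) / df
        if mse == 0:
            break  # perfect fit: nothing left to test against
        std_err = math.sqrt(mse / n)
        signif = {t: t_central_area(coef[t] / std_err, df) for t in active}
        weakest = min(active, key=signif.get)
        if signif[weakest] >= keep_above:
            break
        active.remove(weakest)
    return {t: coef[t] for t in active}


# Hypothetical eight-run, three-factor experiment.
A = [-1, 1, -1, 1, -1, 1, -1, 1]
B = [-1, -1, 1, 1, -1, -1, 1, 1]
C = [-1, -1, -1, -1, 1, 1, 1, 1]
columns = {"A": A, "B": B, "C": C, "A*B": [a * b for a, b in zip(A, B)]}
y = [13.1, 22.4, 13.7, 23.0, 16.2, 27.2, 17.3, 27.1]
print(backward_eliminate(columns, y))
```

With these made-up numbers the A*B interaction is dropped and the three factors are retained. Because the residual degrees of freedom change after each removal, the significance levels of the remaining terms are recomputed on every pass.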

The task is to decide on the essential parameters for the regression. Some practitioners believe that all factors should be normalized when making this evaluation. Normalization is less important when qualitative factors are involved. After the model is expanded or reduced to meet the criteria, normalizations are dropped and transforms may be considered. Other practitioners, more sure of their model, will do transforms first or may normalize the transformed data.
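The text does not define normalization. Assuming it means the common convention of coding each quantitative factor so that its low setting maps to -1 and its high setting to +1, a minimal sketch looks like this; the function name normalize_factor and the temperature values are hypothetical.

```python
def normalize_factor(values, low=None, high=None):
    """Scale factor settings so that low maps to -1 and high maps to +1.

    If low or high is omitted, the smallest or largest observed setting
    is used instead.
    """
    low = min(values) if low is None else low
    high = max(values) if high is None else high
    if high == low:
        raise ValueError("factor has a single level and cannot be normalized")
    center = (high + low) / 2
    half_range = (high - low) / 2
    return [(v - center) / half_range for v in values]


# Hypothetical temperature settings in degrees.
print(normalize_factor([150, 175, 200]))  # [-1.0, 0.0, 1.0]
```

Coded this way, coefficients for different factors are on a comparable scale, which makes it easier to judge which parameters are essential before the normalization is dropped again.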

Learn more about the Regression tools in Six Sigma Demystified (2011, McGraw-Hill) by Paul Keller.