
Interpreting Scatter Diagrams

In a scatter diagram, the x-axis measures one characteristic (called the independent variable) and the y-axis measures the second (called the dependent variable). If the two characteristics are related, the plotted points cluster along a particular direction, and how tightly they cluster about a line reflects the strength of the relationship. The more closely the cluster resembles a straight line, the more likely it is that the two characteristics are linearly correlated.
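The page refers to "the line" without saying how it is fitted. A common choice is ordinary least squares, and the sketch below assumes that method; the SPC software's actual fitting procedure is not stated here. The function name `least_squares_line` and the sample data are hypothetical, for illustration only.

```python
def least_squares_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares (assumed method).

    Requires equal-length sequences with at least two distinct x values.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sxx: spread of x about its mean; Sxy: joint variation of x and y.
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = s_xy / s_xx
    # The least-squares line always passes through the point of means (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: x is the independent variable, y the dependent variable.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = least_squares_line(xs, ys)
print(f"y = {intercept:.3f} + {slope:.3f} x")
```

Because these points lie close to a straight line, the fitted slope (about 1.99) summarizes the direction of the cluster well; for a loose, round cluster the same calculation still returns a line, but one that describes the data poorly.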

Figure: Scatter diagram with regression results, as displayed by the SPC software

The strength of the correlation between two characteristics can be judged both from how closely the points cluster about the line and from the correlation coefficient shown in the Statistics window. The correlation coefficient ranges from -1 to +1. Values near +1 or -1 indicate very strong linear correlation, meaning that a change in one characteristic is consistently accompanied by a change in the other; values near zero indicate little or no linear relationship. Positive correlation means that as one characteristic increases, so does the other; on the scatter diagram this appears as a line with positive slope. Negative correlation means that as one characteristic increases, the other decreases; this appears as a line with negative slope.
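To show what the coefficient measures, the sketch below computes the Pearson product-moment correlation coefficient r. It assumes this is the coefficient reported in the Statistics window, which the page does not state explicitly. The function name `pearson_r` and the example data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r, which always lies between -1 and +1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when either variable is constant")
    # The sign of r gives the direction of the slope; its magnitude
    # measures how tightly the points cluster about a straight line.
    return s_xy / math.sqrt(s_xx * s_yy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # points on a rising line
print(pearson_r(x, [10, 8, 6, 4, 2]))  # points on a falling line
print(pearson_r(x, [3, 1, 4, 1, 5]))   # loose cluster, weak upward trend
```

The three calls print 1.0, -1.0 and roughly 0.354: perfect positive correlation, perfect negative correlation, and a weak positive relationship. On Python 3.10 and later, `statistics.correlation` from the standard library computes the same quantity.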

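The F statistic is used to test the significance of the regression and the lack of fit. A large regression F indicates that the slope differs significantly from zero; a large lack-of-fit F suggests that a straight line does not adequately describe the data.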
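The two F statistics can be sketched as follows, assuming the standard textbook analysis of variance for a straight-line least-squares fit; the page does not document how the SPC software computes them. The regression F is (SSR / 1) / (SSE / (n - 2)). The lack-of-fit test splits the residual sum of squares SSE into pure error (scatter among repeated observations at the same x) and lack of fit, so it exists only when some x values are repeated and there are at least three distinct x values. The sketch stops at the statistics; a p-value would come from comparing them with the F distribution on (1, n - 2) and (m - 2, n - m) degrees of freedom, where m is the number of distinct x values. The function name `regression_f_tests` and the data are hypothetical.

```python
from collections import defaultdict
import math


def regression_f_tests(xs, ys):
    """Return (f_regression, f_lack_of_fit) for a straight-line least-squares fit.

    f_lack_of_fit is None when it cannot be computed: no repeated x values,
    fewer than three distinct x values, or zero pure error.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length sequences of at least 3 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical")
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar

    # Partition the total variation: SST = SSR (explained) + SSE (residual).
    sst = sum((y - y_bar) ** 2 for y in ys)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ssr = sst - sse
    # Regression F on (1, n - 2) degrees of freedom; infinite for a perfect fit.
    f_reg = math.inf if sse == 0 else (ssr / 1) / (sse / (n - 2))

    # Pure error: scatter of replicate y values about their own group mean.
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    m = len(groups)  # number of distinct x levels
    ss_pe = sum(
        sum((y - sum(g) / len(g)) ** 2 for y in g) for g in groups.values()
    )
    df_pe, df_lof = n - m, m - 2
    if df_pe <= 0 or df_lof <= 0 or ss_pe == 0:
        return f_reg, None
    # Lack of fit: the part of SSE not explained by pure error, on (m - 2, n - m) df.
    ss_lof = sse - ss_pe
    f_lof = (ss_lof / df_lof) / (ss_pe / df_pe)
    return f_reg, f_lof


xs = [1, 1, 2, 2, 3, 3]
ys = [1, 3, 4, 6, 5, 5]
f_reg, f_lof = regression_f_tests(xs, ys)
print(f"F(regression) = {f_reg:.3f}, F(lack of fit) = {f_lof:.3f}")
```

For this hypothetical data the fitted line is y = 1 + 1.5x, giving a regression F of 36/7 (about 5.143) on (1, 4) degrees of freedom and a lack-of-fit F of 2.25 on (1, 3) degrees of freedom.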

Remember that correlation does not necessarily mean that a cause-and-effect relationship exists. Both characteristics may be effects of one or more other causes. A correlation means only that there appears to be a relationship between the two over the range of the data. Be careful not to extrapolate beyond that range, since there is no data to show whether the relationship still holds there.

The confidence interval lines indicate the bounds within which the fitted regression function can be expected to vary. The width of the confidence interval indicates the quality of the fitted regression function: the narrower the band, the more precisely the function has been estimated. The fact that the confidence lines diverge toward the ends and converge in the middle can be explained in two ways:

1. The regression function, in this case a straight line, requires estimating two parameters: the slope and the y-intercept. The error in estimating the slope can be visualized by imagining the fitted line pivoting about its middle. This pivoting sweeps out the hourglass-shaped region bounded by the confidence interval lines.

2. The center of the data lies near the middle of the fitted line. The regression function can be estimated more precisely where there is more data, so the confidence limits are narrowest near the middle.
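Both explanations can be seen in the usual formula for the standard error of a least-squares line at a point x0, se(x0) = s * sqrt(1/n + (x0 - x_bar)^2 / Sxx), where s is the residual standard deviation. The (x0 - x_bar)^2 / Sxx term comes from slope uncertainty (explanation 1): it is zero at the mean of x and grows quadratically toward the ends. The band is therefore narrowest at x_bar, the center of the data (explanation 2). The sketch below assumes an ordinary least-squares fit and a confidence band of the form fitted value plus or minus t_crit * se(x0); the function names and data are hypothetical, and the critical t value must be supplied by the caller (for example from a t table).

```python
import math


def _fit(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept, x_bar, s_xx, s)."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length sequences of at least 3 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical")
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))  # residual standard deviation
    return slope, intercept, x_bar, s_xx, s


def mean_response_se(xs, ys, x0):
    """Standard error of the fitted line's value at x0."""
    _, _, x_bar, s_xx, s = _fit(xs, ys)
    n = len(xs)
    # 1/n: uncertainty in the height of the line at x_bar (shrinks as n grows).
    # (x0 - x_bar)**2 / s_xx: slope uncertainty, zero at x_bar, growing toward the ends.
    return s * math.sqrt(1.0 / n + (x0 - x_bar) ** 2 / s_xx)


def confidence_band(xs, ys, x0, t_crit):
    """Lower and upper confidence limits for the fitted line at x0."""
    slope, intercept, _, _, _ = _fit(xs, ys)
    y_hat = intercept + slope * x0
    half_width = t_crit * mean_response_se(xs, ys, x0)
    return y_hat - half_width, y_hat + half_width


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
T_CRIT = 3.182  # two-sided 95% Student t value for n - 2 = 3 degrees of freedom
for x0 in xs:
    low, high = confidence_band(xs, ys, x0, T_CRIT)
    print(f"x = {x0:.1f}: {low:.3f} to {high:.3f} (width {high - low:.3f})")
```

Running it prints the narrowest band at x = 3.0 (the mean of x) and equal, wider bands at x = 1.0 and x = 5.0, which is the hourglass shape described above.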