K-S statistic

The Kolmogorov-Smirnov (K-S) statistic should be used as a relative indicator of curve fit. While some users may be more familiar with the chi-square goodness-of-fit test or with general tests for normality, the K-S test has been shown to provide superior estimates of error in curve-fitting models (Massey, 1951).

The K-S statistic reported is alpha, the significance level at which the hypothesis that the fitted curve is the same as the empirical curve would be rejected. K-S is high (maximum 1.0) when the fit is good and low (minimum 0.0) when the fit is poor. When the K-S value falls below 0.05, you will be informed that the lack of fit is significant.

As an example, if the K-S statistic is 0.4 for a Normal fit and 0.7 for a Johnson fit, the Normal fit is rejected at a confidence level of 0.6 (1 - 0.4) and the Johnson fit at 0.3 (1 - 0.7). The Johnson fit is better because it is rejected only at a lower confidence level and is therefore more likely to be the same as the data. The Normal fit has a maximum deviation that is expected to occur by chance only 40% of the time, and the Johnson fit has a deviation that is expected 70% of the time. A deviation that would be expected to occur 99% of the time would indicate an excellent fit.
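The arithmetic in this example can be sketched in a few lines of Python. The function names and the dictionary of fits are illustrative assumptions, not part of the software; only the alpha values 0.4 and 0.7 and the 0.05 threshold come from the text above.

```python
def reject_level(alpha):
    """Confidence level at which a fit is rejected, given its reported K-S alpha."""
    return 1.0 - alpha

def lack_of_fit_is_significant(alpha, threshold=0.05):
    """True when the reported alpha falls below the 0.05 threshold."""
    return alpha < threshold

# Reported K-S values from the example above (higher is better).
fits = {"Normal": 0.4, "Johnson": 0.7}

for name, alpha in fits.items():
    print(f"{name}: alpha={alpha:.2f}, rejected at {reject_level(alpha):.2f}, "
          f"significant lack of fit: {lack_of_fit_is_significant(alpha)}")

# The preferred fit is the one with the highest alpha.
best = max(fits, key=fits.get)
print("Preferred fit:", best)
```

Because alpha is used here as a relative indicator, the comparison only ranks candidate fits against each other for the same data set.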

The K-S criterion is based on the expectation that there is likely to be a difference between a discrete distribution generated from data and the continuous distribution from which it was drawn, caused by the step size of the discrete distribution and by random error. As n increases, the size of the difference is expected to decrease. If the measured maximum difference is smaller than expected, then the probability that the distributions are the same is high.
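As a rough illustration of the maximum difference described above, the following Python sketch compares the step function built from a sample with a continuous CDF. The sample values, the function names, and the choice of a standard Normal as the fitted curve are assumptions made for the example; the software's own calculation may differ in detail.

```python
import math

def normal_cdf(x, mean=0.0, sd=1.0):
    """CDF of a Normal distribution, used here as the fitted continuous curve."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def ks_distance(sample, cdf):
    """Largest vertical gap between the empirical step function and cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical curve jumps from (i - 1)/n to i/n at x,
        # so the gap is checked on both sides of the step.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]  # hypothetical data
d = ks_distance(sample, normal_cdf)
print(f"Maximum deviation D = {d:.4f} for n = {len(sample)}")
```

Each step of the empirical curve has height 1/n, which is why the step contribution to the difference shrinks as n grows.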

Note that the K-S criterion becomes very demanding as n becomes large because it is scaled by the square root of n, reflecting an expected decrease in the step size error. The random error and outliers then dominate, with outliers having a strong effect on the reported value for alpha (because K-S is a measure of maximum deviation).
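The effect of the square-root scaling can be seen with a small Python sketch. It assumes the scaled quantity is the square root of n times the maximum deviation D, which is the usual asymptotic form; the exact scaling used by the software is not stated here, and the function name and the value of D are hypothetical.

```python
import math

def scaled_deviation(d, n):
    """Maximum deviation d scaled by the square root of the sample size n."""
    return math.sqrt(n) * d

# The same hypothetical maximum deviation gives a larger scaled value as n grows,
# so a deviation that is acceptable for a small sample can be rejected for a large one.
d = 0.05
for n in (25, 100, 400, 1600):
    print(f"n={n:5d}  sqrt(n)*D={scaled_deviation(d, n):.2f}")
```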

Note: An asymptotic value for the K-S critical value is taken from Dudewicz and Mishra, who state that "the exact values differ little from the asymptotic values unless n (the number of samples) is very small." The calculation sums a series whose terms alternate in sign and decrease monotonically in magnitude. The sum is carried to a precision of approximately 1E-8, so any error in the approximation comes primarily from the assumption that n is sufficiently large.
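One widely used asymptotic form with these properties is the Kolmogorov series, alpha = 2 * sum over j >= 1 of (-1)^(j-1) * exp(-2 * j^2 * lam^2), where lam is the maximum deviation scaled by the square root of n. The Python sketch below sums that series until a term drops below 1E-8. It is offered as an assumption about the kind of series meant here, not as the exact formula from Dudewicz and Mishra or the software's implementation; the function name, the tolerance argument, and the small-lam shortcut are illustrative.

```python
import math

def asymptotic_alpha(lam, tol=1e-8):
    """Asymptotic K-S alpha for a scaled deviation lam = sqrt(n) * D (assumed form)."""
    # For small lam the series converges slowly, while the true value
    # equals 1 to well within the tolerance, so it is returned directly.
    if lam < 0.2:
        return 1.0
    total = 0.0
    sign = 1.0
    for j in range(1, 101):  # for lam >= 0.2 the terms fall below tol long before j = 100
        term = math.exp(-2.0 * (j * lam) ** 2)
        total += sign * term
        if term < tol:
            break
        sign = -sign
    # Clamp to guard against tiny round-off outside [0, 1].
    return min(1.0, max(0.0, 2.0 * total))

for lam in (0.5, 1.0, 1.36, 2.0):
    print(f"lam={lam:.2f}  alpha={asymptotic_alpha(lam):.4f}")
```

Under this form, a scaled deviation of about 1.36 gives an alpha close to 0.05, which is the threshold below which the lack of fit is reported as significant.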
