Kruskal-Wallis one-way analysis of variance

From Wikipedia, the free encyclopedia

In statistics, the Kruskal-Wallis one-way analysis of variance by ranks (named after William Kruskal and W. Allen Wallis) is a non-parametric method for testing equality of population medians among groups. Intuitively, it is a one-way analysis of variance with the data replaced by their ranks. It extends the Mann-Whitney U test to three or more groups.

Because it is a non-parametric method, the Kruskal-Wallis test does not assume a normally distributed population, unlike the analogous one-way analysis of variance. However, the test does assume that the distribution in each group has the same shape, apart from any difference in medians.

Method

  1. Rank all data from all groups together; i.e., rank the data from 1 to N ignoring group membership. Assign any tied values the average of the ranks they would have received had they not been tied.
  2. The test statistic is given by: K = (N-1)\frac{\sum_{i=1}^g n_i(\bar{r}_{i\cdot} - \bar{r})^2}{\sum_{i=1}^g\sum_{j=1}^{n_i}(r_{ij} - \bar{r})^2}, where:
    • g is the number of groups
    • n_i is the number of observations in group i
    • r_{ij} is the rank (among all observations) of observation j from group i
    • N is the total number of observations across all groups
    • \bar{r}_{i\cdot} = \frac{\sum_{j=1}^{n_i} r_{ij}}{n_i} is the average rank of the observations in group i
    • \bar{r} = (N+1)/2 is the average of all the r_{ij}
      Notice that, when there are no ties, the denominator of the expression for K is exactly (N − 1)N(N + 1) / 12. Thus K = \frac{12}{N(N+1)}\sum_{i=1}^g n_i(\bar{r}_{i\cdot} - \bar{r})^2.
  3. A correction for ties can be made by dividing K by 1 - \frac{\sum_{i=1}^G (t_i^3 - t_i)}{N^3-N}, where G is the number of distinct sets of tied ranks and t_i is the number of observations tied at the i-th such value (here i indexes the sets of ties, not the groups being compared). This correction usually makes little difference to the value of K unless there are many ties.
  4. Finally, the p-value is approximated by \Pr(\chi^2_{g-1} \ge K). If some of the n_i are small (i.e., less than 5), the probability distribution of K can be quite different from this chi-square distribution. If a table of the chi-square probability distribution is available, the critical value \chi^2_{\alpha: g-1} can be found by entering the table at g − 1 degrees of freedom and looking under the desired significance level \alpha. The null hypothesis of equal population medians is rejected if K \ge \chi^2_{\alpha: g-1}. In that case, appropriate multiple comparisons would then be performed on the group medians.
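The four steps above can be sketched in Python using only the standard library. This is an illustrative sketch, not a reference implementation: the function names average_ranks, chi2_sf and kruskal_wallis are hypothetical, every group is assumed to be non-empty, and the chi-square tail probability is computed with the standard recurrence for the regularized upper incomplete gamma function, which is a choice made here and not part of the method as described.

```python
import math
from collections import Counter


def average_ranks(values):
    """Step 1: rank values from 1 to N; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0  # average of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def chi2_sf(x, df):
    """Pr(chi-square with integer df >= x), via Q(a+1, h) = Q(a, h) + h^a e^-h / Gamma(a+1)."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)              # Q(1, h) = e^-h
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))   # Q(1/2, h) = erfc(sqrt(h))
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q


def kruskal_wallis(groups):
    """Return (K, p) for a list of non-empty groups of numeric observations."""
    if len(groups) < 2:
        raise ValueError("need at least two groups")
    pooled = [v for grp in groups for v in grp]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    r_bar = (n_total + 1) / 2.0

    # Step 2: K = 12 / (N (N + 1)) * sum_i n_i (mean rank of group i - r_bar)^2
    between = 0.0
    pos = 0
    for grp in groups:
        n_i = len(grp)
        mean_rank_i = sum(ranks[pos:pos + n_i]) / n_i
        between += n_i * (mean_rank_i - r_bar) ** 2
        pos += n_i
    k_stat = 12.0 / (n_total * (n_total + 1)) * between

    # Step 3: tie correction; untied values have t = 1 and contribute 0.
    tie_sizes = Counter(pooled).values()
    correction = 1.0 - sum(t ** 3 - t for t in tie_sizes) / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are tied; K is undefined")
    k_stat /= correction

    # Step 4: chi-square approximation with g - 1 degrees of freedom.
    return k_stat, chi2_sf(k_stat, len(groups) - 1)


if __name__ == "__main__":
    k, p = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    print(f"K = {k:.3f}, p = {p:.4f}")
```

For the three fully separated groups in the usage example, the group mean ranks are 2, 5 and 8, so K = 12/(9·10) · 54 = 7.2, and with 2 degrees of freedom the approximate p-value is e^{-3.6}, about 0.027. With only three observations per group, the caveat in step 4 about small n_i applies, so this p-value is only a rough approximation.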

References

  • William H. Kruskal and W. Allen Wallis. Use of ranks in one-criterion variance analysis. Journal of the American Statistical Association 47 (260): 583–621, December 1952.
  • Sidney Siegel and N. John Castellan, Jr. (1988). Nonparametric Statistics for the Behavioral Sciences (second edition). New York: McGraw-Hill.