Multiple Comparisons

Chapter: Biostatistics for the Health Sciences: One-Way Analysis of Variance

General Discussion

The result of rejecting the null hypothesis in the analysis of variance is to conclude that there is a difference among the means. However, if we have three or more populations, how exactly do these means differ? Sometimes researchers consider the precise nature of the differences among these means to be an important scientific issue. Alternatives to the analysis of variance, called ranking and selection procedures, address this issue directly. Because these alternative methods are beyond the scope of the present text, we refer the interested reader to Gibbons, Olkin, and Sobel (1977) for an explanation of the ranking and selection methodology.

In the framework of the analysis of variance, the traditional approach is to do the F test first. If the null hypothesis is rejected, we can then look at several hypotheses that compare the pairwise differences of the means or other linear combinations of the means that might be of interest. For example, we may be interested in μ₁ – μ₂ and μ₃ – μ₄. A less obvious contrast might be μ₁ – 2μ₂ + μ₃. Any such linear combination of means can be considered, although in most practical situations mean differences are examined and tested against the null hypothesis that they are zero. Because many hypotheses are being tested simultaneously, the methodology must take this fact into account. Such methodology is sometimes called simultaneous inference (for example, see Miller, 1981) or multiple comparisons [see Hochberg and Tamhane (1987) or Hsu (1996)]. Resampling approaches, including bootstrapping, have also been successfully employed to accomplish this task [see Westfall and Young (1993)].
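
As a small illustration of a linear combination of means, the sketch below estimates Σ cᵢȳᵢ and its standard error in a balanced design, using the standard result that Var(Σ cᵢȳᵢ) = σ² Σ cᵢ² / n for independent groups of size n, with MSw estimating σ². The function name and the group means are hypothetical, chosen only for illustration; both examples from the text (μ₁ – μ₂ and μ₁ – 2μ₂ + μ₃) have coefficients that sum to zero, which is what makes them contrasts.

```python
import math


def contrast_estimate(means, coefs, msw, n):
    """Estimate sum(c_i * ybar_i) and its standard error (balanced design).

    Hypothetical helper: assumes every group has n observations and that
    MSw (the within-group mean square) estimates the common variance.
    """
    if len(means) != len(coefs):
        raise ValueError("need one coefficient per group mean")
    estimate = sum(c * m for c, m in zip(coefs, means))
    # Var(sum c_i * ybar_i) = sigma^2 * sum(c_i^2) / n for independent groups
    se = math.sqrt(msw * sum(c * c for c in coefs) / n)
    return estimate, se


# Hypothetical sample means for three groups, MSw = 2.0, n = 6 per group
est, se = contrast_estimate([12.0, 15.0, 12.0], [1, -2, 1], msw=2.0, n=6)
print(est, round(se, 4))  # -6.0 1.4142
```

A simple difference μ₁ – μ₂ is just the special case with coefficients (1, –1).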

Tukey’s Honest Significant Difference (HSD) Test

To find out which means are significantly different from one another, we may at first be tempted to run separate t tests comparing each pair of group means. For k groups there are k(k – 1)/2 such comparisons. Even for k = 4, there are six comparisons.
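
The count k(k – 1)/2 is the number of unordered pairs of groups. A minimal sketch that lists them with Python's standard `itertools.combinations` (the group labels are hypothetical):

```python
from itertools import combinations


def pairwise_comparisons(groups):
    """List every unordered pair of groups that a pairwise test would compare."""
    return list(combinations(groups, 2))


pairs = pairwise_comparisons(["A", "B", "C", "D"])
print(len(pairs))  # 6, which equals k(k - 1)/2 for k = 4
print(pairs)
```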

Each of these t tests might have been constructed to test its hypothesis at the 5% significance level. The threshold C for such a test is determined by the t distribution so that, if T is the test statistic, P(|T| > C) = 0.05 when the null hypothesis is true. The constant C is found from the table of the t distribution and depends on the degrees of freedom. But this condition is set for just one such test.

If we do six such tests and set the thresholds to satisfy P(|T| > C) = 0.05 for each test statistic, the probability that at least one of the test statistics will exceed its threshold, when all the null hypotheses are true, is much higher than 0.05. The methods of Scheffé, Tukey, and Dunnett, among others, are designed to guard against this; Miller (1981) covers all of these methods. For these methods, we choose a threshold or thresholds so that the probability that at least one threshold is exceeded is no greater than 0.05. Hsu (1996), Chapter 5, pp. 119–174, describes all such procedures.
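
To see how quickly the error probability grows, the sketch below computes two reference numbers for m tests, each run at level α. The first assumes the tests are independent, which the six pairwise tests are not (they share group means), so it is only a rough guide. The second is the upper bound m·α from Boole's inequality (the basis of the Bonferroni correction), which holds regardless of dependence. The function names are hypothetical.

```python
def fwer_if_independent(alpha, m):
    """P(at least one rejection) for m independent tests, all nulls true."""
    return 1 - (1 - alpha) ** m


def bonferroni_bound(alpha, m):
    """Upper bound on P(at least one rejection); needs no independence."""
    return min(1.0, m * alpha)


m = 6  # pairwise comparisons among k = 4 means
print(round(fwer_if_independent(0.05, m), 4))  # 0.2649
print(round(bonferroni_bound(0.05, m), 4))  # 0.3
```

Either way, the chance of at least one false declaration is several times the nominal 0.05.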

In our example, when the test statistic exceeds the threshold, the result amounts to declaring a significant difference between a particular pair of group means. The family-wise error rate is, by definition, the probability that at least one such declaration is incorrect. In doing multiple comparisons, we usually want to control this family-wise error rate at a level of 0.05 (or 0.10).
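
The family-wise error rate of unadjusted pairwise tests can also be estimated by simulation. The sketch below draws k groups from the same normal population (so every null hypothesis is true), runs all pairwise tests at the per-test 5% threshold, and records how often at least one is falsely significant. To stay within Python's standard library, it assumes σ = 1 is known and therefore uses z tests in place of t tests; the function name and default settings are hypothetical.

```python
import random
from itertools import combinations
from statistics import NormalDist, fmean


def simulate_fwer(k=4, n=6, alpha=0.05, reps=20000, seed=1):
    """Monte Carlo estimate of the family-wise error rate of unadjusted
    pairwise z tests when all k population means are equal.

    Simplifying assumption: sigma = 1 is known, so z tests stand in for t tests.
    """
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # per-test two-sided threshold C
    se = (2 / n) ** 0.5  # SE of a difference of two group means when sigma = 1
    any_false_rejection = 0
    for _ in range(reps):
        # Every group comes from N(0, 1), so every pairwise null is true
        means = [fmean(rng.gauss(0, 1) for _ in range(n)) for _ in range(k)]
        if any(abs(a - b) / se > crit for a, b in combinations(means, 2)):
            any_false_rejection += 1
    return any_false_rejection / reps


print(simulate_fwer())
```

With k = 2 there is only one comparison and the estimate stays near 0.05; with k = 4 it comes out far above 0.05, which is the inflation that Tukey's method is designed to prevent.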

When we use Tukey’s honest significant difference test, our test statistic has exactly the same form as that of a t test, and our confidence interval for the mean difference has the same form as a confidence interval based on the t distribution. The only difference between the HSD interval and the t interval is that the constant C is larger than the one we would choose for a single t test.
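
The sketch below makes the "same form, larger C" point concrete. Both intervals are written as difference ± C·√(2·MSw/n); Tukey's interval uses C = q/√2, which reduces to the form q·√(MSw/n) used in the HSD. The two constants are assumptions taken from standard tables for N – k = 20 degrees of freedom: the two-sided 5% t value 2.086 and q(0.05, 4, 20) ≈ 3.958. The function names and the example difference are hypothetical.

```python
import math


def t_form_interval(diff, msw, n, c):
    """diff +/- C * SE, with SE = sqrt(2 * MSw / n) for two groups of size n."""
    half = c * math.sqrt(2 * msw / n)
    return diff - half, diff + half


def tukey_interval(diff, msw, n, q):
    """diff +/- q * sqrt(MSw / n), i.e. the t form with C = q / sqrt(2)."""
    return t_form_interval(diff, msw, n, q / math.sqrt(2))


# Constants assumed from standard tables, df = N - k = 20:
T_CRIT = 2.086  # two-sided 5% critical value of t (single test)
Q_CRIT = 3.958  # q(0.05, 4, 20), studentized range for k = 4 groups

print(round(Q_CRIT / math.sqrt(2), 3))  # 2.799, Tukey's C, versus 2.086
print(t_form_interval(3.0, 2.0, 6, T_CRIT))
print(tukey_interval(3.0, 2.0, 6, Q_CRIT))
```

The Tukey interval is wider because it must hold simultaneously for all k(k – 1)/2 differences.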

In this application, we assume that the k groups all have the same sample size, n. This is called a balanced design. To calculate the confidence interval, we need a table of constants derived by Tukey (reprinted in Appendix B). We simply compare the absolute difference between two sample means to the Tukey HSD for one-way ANOVA, which is given by Equation 13.2:

$$\mathrm{HSD} = q(\alpha, k, N - k)\,\sqrt{\frac{MS_w}{n}} \tag{13.2}$$

where k is the number of groups, n is the number of observations per group, N = nk is the total number of observations, MSw is the within-group mean square, and α is the significance level (the family-wise error rate). The constant q(α, k, N – k) is found in Tukey’s tables.
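
A minimal sketch of Equation 13.2 in use: compute MSw from balanced data, form the HSD, and flag each pair whose absolute mean difference exceeds it. The data set is hypothetical (k = 4 groups of n = 6, so N – k = 20), and q = 3.958 is assumed to be the tabled value of q(0.05, 4, 20); in practice q is read from the studentized range table for the actual α, k, and N – k. The function name `tukey_hsd` is also hypothetical.

```python
import math
from itertools import combinations
from statistics import fmean


def tukey_hsd(groups, q):
    """Compare every pair of group means to HSD = q * sqrt(MSw / n).

    groups: dict mapping group name -> list of observations, all of
            equal length n (balanced design).
    q:      q(alpha, k, N - k) read from a studentized range table.
    """
    sizes = {len(obs) for obs in groups.values()}
    if len(sizes) != 1:
        raise ValueError("Equation 13.2 assumes equal sample sizes")
    n = sizes.pop()
    k = len(groups)
    N = n * k
    means = {name: fmean(obs) for name, obs in groups.items()}
    # Within-group sum of squares, pooled over all groups
    ssw = sum((x - means[name]) ** 2 for name, obs in groups.items() for x in obs)
    msw = ssw / (N - k)
    hsd = q * math.sqrt(msw / n)
    results = []
    for a, b in combinations(groups, 2):
        diff = means[a] - means[b]
        results.append((a, b, diff, abs(diff) > hsd))
    return msw, hsd, results


# Hypothetical balanced data: k = 4 groups, n = 6, so N - k = 20
data = {
    "A": [10, 12, 11, 13, 12, 14],
    "B": [14, 15, 13, 16, 15, 17],
    "C": [11, 13, 12, 12, 14, 10],
    "D": [18, 17, 19, 16, 18, 20],
}
msw, hsd, results = tukey_hsd(data, q=3.958)  # assumed q(0.05, 4, 20)
print(msw, round(hsd, 3))  # 2.0 2.285
for a, b, diff, significant in results:
    print(a, b, diff, "significant" if significant else "not significant")
```

For these made-up data the group means are 12, 15, 12, and 18, so every pair except A and C differs by more than the HSD of about 2.29.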

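Note the use of the symbol q in the equation. The quantity q is sometimes called the studentized range; more precisely, q(α, k, N – k) is the upper α critical value of the studentized range distribution for k means and N – k degrees of freedom. A table of the studentized range for α = 0.01, 0.05, and 0.10 is given in Appendix B.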
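
The name comes from the statistic itself: the range of the k sample means divided by the estimated standard error √(MSw/n). The largest pairwise difference exceeds the HSD exactly when this statistic exceeds q, which is why Tukey's constants are critical values of the studentized range distribution. A short sketch, with hypothetical means and q = 3.958 assumed as the tabled q(0.05, 4, 20):

```python
import math


def studentized_range(means, msw, n):
    """Range of the k sample means divided by the estimated SE sqrt(MSw / n)."""
    return (max(means) - min(means)) / math.sqrt(msw / n)


# Hypothetical group means with MSw = 2.0 and n = 6 per group
stat = studentized_range([12.0, 15.0, 12.0, 18.0], msw=2.0, n=6)
print(round(stat, 3))  # 10.392
print(stat > 3.958)  # True; 3.958 is the assumed table value q(0.05, 4, 20)
```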