MULTIPLE COMPARISONS
The result of rejecting the null hypothesis in the analysis of variance is to conclude that there is a difference among the means. However, if we have three or more populations, how exactly do these means differ? Researchers sometimes consider the precise nature of the differences among these means to be an important scientific issue. Alternatives to the analysis of variance, called ranking and selection procedures, address this issue directly. As these alternative methods are beyond the scope of the present text, we refer the interested reader to Gibbons, Olkin, and Sobel (1977) for an explanation of the ranking and selection methodology.
In the framework of the analysis of variance, the traditional approach is to do the F test first. If the null hypothesis is rejected, we can then look at several hypotheses that compare the pairwise differences of the means or other linear combinations of the means that might be of interest. For example, we may be interested in μ1 – μ2 and μ3 – μ4. A less obvious contrast might be μ1 – 2μ2 + μ3. Any such linear combination of means can be considered, although in most practical situations mean differences are considered and are tested against the null hypothesis that they are zero. Since many hypotheses are being tested simultaneously, the methodology must take this fact into account. Such methodology is sometimes called simultaneous inference [see, for example, Miller (1981)] or multiple comparisons [see Hochberg and Tamhane (1987) or Hsu (1996)]. Resampling approaches, including bootstrapping, have also been successfully employed to accomplish this task [see Westfall and Young (1993)].
In order to find out which means are significantly different from one another, we are at first tempted to look at the various t tests that compare the differences of the individual means. For k groups there are k(k – 1)/2 such comparisons. Even for k = 4, there are six comparisons. The original t tests might have been constructed to test the hypotheses at the 5% significance level. The threshold C for such a test is determined by the t distribution so that if T is the test statistic, then P(|T| > C) = 0.05. The constant C is found from the table of the t distribution and depends on the degrees of freedom. But this condition is set for just one such test.
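As an illustration, the critical value C for a single two-sided t test can be computed directly rather than read from a table. The following minimal sketch uses scipy; the 18 degrees of freedom are an assumed example value (for instance, two groups of 10 observations), not a figure taken from the text.

```python
from scipy import stats

df = 18                            # assumed example: two groups of 10 observations
C = stats.t.ppf(1 - 0.05 / 2, df)  # two-sided critical value: P(|T| > C) = 0.05
print(f"C = {C:.3f}")              # about 2.101 for df = 18
```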
If we do six such tests and set the thresholds to satisfy P(|T| > C) = 0.05 for each test statistic, the probability that at least one of the test statistics will exceed its threshold is much higher than 0.05. The methods of Scheffé, Tukey, and Dunnett, among others, are designed to guard against this. See Miller (1981) for coverage of all these methods. For these methods, we choose a threshold or thresholds so that the probability that any one of the thresholds is exceeded is no greater than 0.05. See Hsu (1996, Chapter 5, pp. 119–174) for a comprehensive treatment of such procedures.
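To see how quickly the error probability inflates, suppose for illustration that the six tests were independent (pairwise t tests on the same data are not, but the idealization conveys the point). The chance of at least one false rejection is then 1 – (1 – 0.05)^6 ≈ 0.265:

```python
alpha, m = 0.05, 6           # six pairwise comparisons for k = 4 groups
fwer = 1 - (1 - alpha) ** m  # assumes independent tests (an idealization)
print(f"P(at least one false rejection) = {fwer:.3f}")  # about 0.265
```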
In our example, when the test statistic exceeds the
threshold, the result amounts to declaring a significant difference between a
particular pair of group means. The family-wise error rate is (by definition)
the probability that any such declaration would be incorrect. In doing multiple
comparisons, we usually want to control this family-wise error rate at a level
of 0.05 (or 0.10).
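A small simulation makes the family-wise error rate concrete. The sketch below, with assumed values k = 4 and n = 10, draws every group from the same normal population (so the null hypothesis holds) and records how often at least one of the six unadjusted pairwise t tests rejects at the 5% level:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
k, n, alpha, n_sims = 4, 10, 0.05, 10_000  # assumed example values
any_rejection = 0
for _ in range(n_sims):
    groups = rng.standard_normal((k, n))   # all true means are equal
    # run every pairwise two-sample t test with no multiplicity adjustment
    pvals = [stats.ttest_ind(groups[i], groups[j]).pvalue
             for i in range(k) for j in range(i + 1, k)]
    any_rejection += min(pvals) < alpha
print(f"estimated family-wise error rate: {any_rejection / n_sims:.3f}")
# typically near 0.20, well above the nominal 0.05 per-test level
```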
When we use Tukey’s honest significant difference test, our test statistic has exactly the same form as that of a t test. Our confidence interval for the mean difference has the same form as a confidence interval using the t distribution. The only difference in the confidence interval between the HSD test and the t test is that the constant C is larger than what we would choose for a single t test.
In the application, we assume that the k groups each have equal sample sizes, n. This is called a balanced design. To calculate the confidence interval we need a table of constants derived by Tukey (reprinted in Appendix B). We simply compare the difference between the two sample means to the Tukey HSD for one-way ANOVA, which is given by Equation 13.2:

HSD = q(α, k, N – k) √(MSw / n)    (13.2)
where k = the number of groups, n = the number of observations per group, N = the total number of observations, MSw = the within-group mean square, and α = the significance level or family-wise error rate. The constant q(α, k, N – k) is found in Tukey’s tables.
Note the quantity q in the equation; q is sometimes called the studentized range. A table of the studentized range for α = 0.01, 0.05, and 0.10 is given in Appendix B.
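Modern software can replace the table lookup: scipy exposes the studentized range distribution, so q(α, k, N – k) and the HSD of Equation 13.2 can be computed directly. The group sizes and MSw below are assumed illustrative values, not figures from the text:

```python
import numpy as np
from scipy import stats

alpha, k, n = 0.05, 4, 10  # assumed: 4 groups, 10 observations each (balanced design)
N = k * n
ms_within = 2.5            # assumed within-group mean square (MSw) from the ANOVA table

q_crit = stats.studentized_range.ppf(1 - alpha, k, N - k)  # q(alpha, k, N - k)
hsd = q_crit * np.sqrt(ms_within / n)                      # Equation 13.2
print(f"q = {q_crit:.3f}, HSD = {hsd:.3f}")
# any two sample means differing by more than HSD are declared significantly different
```

For routine work, packages such as statsmodels offer pairwise_tukeyhsd (in statsmodels.stats.multicomp), which carries out the full set of pairwise HSD comparisons directly from the raw data.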