NEYMAN–PEARSON TEST FORMULATION
In the previous section, we introduced the notion
of hypothesis testing and defined the terms null hypothesis and alternative
hypothesis, and type I error and type II error. These terms are attributed to
Jerzy Neyman and Egon Pearson, who were the developers of formal statistical
hypothesis testing in the 1930s. Earlier, R. A. Fisher developed what he called
significance testing, but his description was vague and followed a theory of
inference called fiducial inference that now appears to have been discredited.
The Neyman and Pearson approach has endured but is also challenged by the
Bayesian approach to inference (covered in Section 9.16).
In the Neyman and Pearson approach, we construct
the null and alternative hypotheses and choose a test statistic. We need to
keep in mind the test statistic, the sample size, and the resulting sampling
distribution for the test statistic under the null hypothesis (i.e., the
distribution when the null hypothesis is assumed to be true). Based on these
three factors, we determine a critical value or critical values such that the
probability of a type I error never exceeds a specified value when the null
hypothesis is true.
Sometimes, the null hypothesis specifies a unique
sampling distribution for a test statistic. A unique sampling distribution for
the null hypothesis occurs when the following criteria are met: (1) we
hypothesize a single value for the population mean; (2) the variance is assumed
to be known; and (3) the normal distribution is assumed for the population
distribution. Under these circumstances, the sampling distribution of the test
statistic is unique. The critical values can be determined based on this unique
sampling distribution; e.g., at a significance level of 0.10, the 5th
percentile and the 95th percentile of the sampling distribution would be used
for the critical values of a two-tailed (two-sided) test; the 10th percentile
or the 90th percentile would be used for a one-tailed (one-sided) test,
depending on which side of the test is the alternative. In Section 9.4,
one-sided tests will be discussed and contrasted with two-sided tests.
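The percentile rule above can be sketched in a few lines of Python. This is only an illustration, assuming the test statistic has a standard normal sampling distribution under the null hypothesis; the function name critical_values and its tail argument are hypothetical and do not come from the text.

```python
from statistics import NormalDist

def critical_values(alpha, tail="two-sided", null_dist=NormalDist(0.0, 1.0)):
    """Critical value(s) of a level-alpha test, read off as percentiles
    of the (assumed unique) null sampling distribution."""
    if tail == "two-sided":
        # alpha is split equally between the two tails
        return (null_dist.inv_cdf(alpha / 2), null_dist.inv_cdf(1 - alpha / 2))
    if tail == "upper":
        # the alternative lies above the null value
        return (null_dist.inv_cdf(1 - alpha),)
    if tail == "lower":
        # the alternative lies below the null value
        return (null_dist.inv_cdf(alpha),)
    raise ValueError("tail must be 'two-sided', 'upper', or 'lower'")

# At alpha = 0.10: 5th and 95th percentiles for a two-sided test,
# 90th percentile for an upper one-sided test.
print(critical_values(0.10))
print(critical_values(0.10, tail="upper"))
```

With alpha = 0.05 and a two-sided test, the same function returns the familiar values of about –1.96 and 1.96.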
However, in two important situations the sampling
distribution of the test statistic is not unique. The first situation occurs
when the population variance (σ²) is unknown; in this instance, σ² is called a
nuisance parameter because it affects the sampling distribution but otherwise
is not used in the hypothesis test. Nevertheless, even when the population
variance is unknown, σ² may influence the sampling distribution of the test
statistic. For example, σ² is relevant to the Behrens–Fisher problem, in
which the distribution of the mean difference depends on the ratio of two
population variances. (See the article by Robinson on the Behrens–Fisher
problem in Johnson and Kotz, 1982.) An exception that would not require σ²
is the use of the t statistic in a
one-sample hypothesis test, because the t
distribution does not depend on σ².
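One way to see why the one-sample t statistic escapes the nuisance parameter is that it is unchanged when the data are rescaled. The sketch below assumes a null value of 0 and uses made-up observations; the helper one_sample_t is a hypothetical name introduced only for this illustration.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """One-sample t statistic: (sample mean - mu0) / (s / sqrt(n))."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n))

# Made-up observations, for illustration only.
data = [0.8, -0.3, 1.4, 0.2, 0.9]

# Multiplying every observation by 10 multiplies the variance by 100,
# yet the t statistic for the null value 0 stays the same.
t_original = one_sample_t(data, 0.0)
t_rescaled = one_sample_t([10 * x for x in data], 0.0)
print(t_original, t_rescaled)
```

Because the scale of the data cancels between numerator and denominator, the null distribution of t depends only on the sample size (through the degrees of freedom), not on σ².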
A second situation in which the sampling
distribution of the test statistic is not unique occurs during the use of a
composite null hypothesis. A composite null hypothesis is one that includes
more than one value of the parameter of interest for the null hypothesis. For
example, in the case of a population mean, instead of considering only the
value 0 for the null hypothesis, we might consider a range of small values;
all values of μ such that |μ| < 0.5 would be uninteresting
and, hence, included in the null hypothesis.
To review, we have indicated two scenarios: (1)
when the sampling distribution depends on a nuisance parameter, and (2) when
the hypothesized parameter can take on more than one value under the null
hypothesis. For either situation, we consider the distribution that is
“closest” to the alternative in a set of distributions for parameter values in
the interval for the null hypothesis. The critical values determined for that
“closest” distribution would have a significance level higher than those for
any other parameter values under the null hypothesis. That significance level
is defined to be the level of the overall test of significance. However, this
issue is beyond the scope of this text and, hence, will not be elaborated
further.
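Although the text does not pursue the point, a small numerical sketch may help fix the idea of the "closest" distribution. It assumes, purely for illustration, that the sample mean is normal with standard error 1, that the null hypothesis is |μ| ≤ 0.5 (the boundary is included so that a closest value exists), and that we reject when the absolute sample mean exceeds a cutoff c. The function names are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def rejection_prob(mu, c):
    """P(|sample mean| > c) when the sample mean is N(mu, 1)."""
    return Z.cdf(-c - mu) + (1.0 - Z.cdf(c - mu))

def cutoff_at_boundary(boundary, alpha, lo=0.0, hi=10.0, tol=1e-10):
    """Bisection for the cutoff c whose rejection probability at the
    boundary of the null region equals alpha (it decreases as c grows)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if rejection_prob(boundary, mid) > alpha:
            lo = mid  # still rejecting too often: raise the cutoff
        else:
            hi = mid
    return (lo + hi) / 2

c = cutoff_at_boundary(0.5, 0.05)
# The rejection probability is largest at the boundary value of mu,
# so calibrating there keeps every other null value below alpha.
for mu in (0.0, 0.25, 0.5):
    print(mu, rejection_prob(mu, c))
```

Under these assumptions the boundary value μ = 0.5 (or –0.5) plays the role of the distribution closest to the alternative, and the level attained there is the level of the overall test.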
In summary, the Neyman–Pearson approach controls
the type I error. Regardless of the sample size, the type I error is controlled
so that it is less than or equal to α for any value of the parameters under the
null hypothesis. Consequently, if we use the Neyman–Pearson approach, as we
will in Sections 9.3, 9.4, 9.9, and 9.10, we can be assured that the type I
error is constrained so as to be as small or smaller than the specified α. If
the test statistic falls in the rejection region, we can reject the
null hypothesis safely, knowing that when the null hypothesis is true, the
probability of wrongly rejecting it is no greater than α.
However, the type II error is not controlled by the Neyman–Pearson approach. Three factors determine the probability of a type II error (β): (1) the sample size, (2) the choice of the test statistic, and (3) the value of the parameter under the alternative hypothesis. When the values for the alternative hypothesis are close to those for the null hypothesis, the type II error can be close to 1 – α, the probability of the nonrejection region under the null hypothesis. Thus, the probability of a type II error increases as the difference between the mean for the null hypothesis and the mean at the alternative decreases. When the difference between these means becomes large, β becomes small, i.e., closer to 0, and the power 1 – β approaches 1.
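The dependence of β on the sample size and on the alternative can be sketched numerically. The example below assumes a two-sided test of a normal mean with known population standard deviation; the function type_ii_error and its parameter names are hypothetical, chosen only for this illustration.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal

def type_ii_error(mu_alt, n, sigma, alpha=0.05, mu_null=0.0):
    """Probability of not rejecting the null when the true mean is mu_alt,
    for a two-sided z test with known sigma (an assumption of this sketch)."""
    se = sigma / sqrt(n)                 # standard error of the sample mean
    z_crit = Z.inv_cdf(1 - alpha / 2)    # about 1.96 for alpha = 0.05
    shift = (mu_alt - mu_null) / se      # alternative in standard-error units
    return Z.cdf(z_crit - shift) - Z.cdf(-z_crit - shift)

sigma = sqrt(5)
# Near the null value, beta is close to 1 - alpha = 0.95;
# far from it, beta is close to 0.
for mu_alt in (0.1, 1.0, 3.0, 5.0):
    print(mu_alt, type_ii_error(mu_alt, n=5, sigma=sigma))
# A larger sample lowers beta at the same alternative.
print(type_ii_error(1.0, n=50, sigma=sigma))
```

The choice of test statistic, the third factor named above, is fixed here by using the sample mean throughout.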
For example, suppose that under the null hypothesis the sample mean has
a normal sampling distribution with mean μ = 0 and variance
σ_x̄² = 1 for a sample size n = 5. By algebra, we can determine that the
population has a variance of σ² = 5 (i.e., σ_x̄² = σ²/n = σ²/5 = 1). We choose
a two-sided test with significance level 0.05 for
which the critical values are –1.96 and 1.96. Under the alternative hypothesis,
if the mean μ = 0.1 and the variance of the sample mean is again σ_x̄² = 1,
then the power of the test (defined to be 1 – the probability of a type II
error) is the probability that the sample mean is greater than 1.96 or less
than –1.96. But this probability is the same as the probability that the Z
value for the standard normal
distribution is greater than 1.86 or less than –2.06. Note that we find the
values 1.86 and –2.06 by subtracting 0.1 (μ under
the alternative hypothesis) from +1.96 and –1.96.
From the table of the standard normal distribution
(Appendix E), we see that P[Z < –2.06] = 0.5 – 0.4803 = 0.0197
and P[Z > 1.86] = 0.5 – 0.4686 = 0.0314. The power of the test at this
alternative is 0.0197 + 0.0314 = 0.0511.
This mean is close to zero and the power is not much higher than the
significance level 0.05. On the other hand, if μ = 2.96
under the alternative, with the variance of the sample mean σ_x̄² = 1, then the
power of the test at this alternative is P[Z < –4.92] + P[Z > –1]. Since
P[Z < –4.92] is almost zero, the power
is nearly equal to P[Z > –1] = 0.5 + P[0 > Z > –1] = 0.5
+ P[0 < Z < 1] = 0.5 + 0.3413 = 0.8413. So as the alternative moves
relatively far from zero, the power becomes large. The relationship between the
alternative hypothesis and the power of a test will be illustrated in Figures
9.1 and 9.2 later in the chapter.
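The two table lookups above can be checked with a short script. This is a minimal sketch that assumes, as in the example, a sample mean with unit variance and critical values of –1.96 and 1.96; the function name power_two_sided is hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def power_two_sided(mu_alt, crit=1.96):
    """Power of the two-sided test when the sample mean is N(mu_alt, 1):
    P(sample mean < -crit) + P(sample mean > crit)."""
    lower_tail = Z.cdf(-crit - mu_alt)        # e.g. P[Z < -2.06] for mu_alt = 0.1
    upper_tail = 1.0 - Z.cdf(crit - mu_alt)   # e.g. P[Z > 1.86] for mu_alt = 0.1
    return lower_tail + upper_tail

print(round(power_two_sided(0.1), 4))   # alternative close to the null
print(round(power_two_sided(2.96), 4))  # alternative far from the null
```

The results agree with the table-based values of 0.0511 and 0.8413 to four decimal places, and the corresponding type II error probabilities are 1 minus these values.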
Consequently, when we test hypotheses using the
Neyman–Pearson approach, we do not say that we accept the null hypothesis when
the test statistic falls in the nonrejection region; there may be reasonable
values for the alternative hypothesis for which the type II error is high.
In fact, since we select α to be small so that we
have a small type I error, 1 – α is large. Some values under the alternative
hypothesis have a high type II error, indicating that the test has low power
at those alternatives.
In Section 9.12, we will see that the way to
control the type II error is to be interested only in alternatives at least a
specified distance (such as d) from
the null value(s). In addition, we will require that the sample size is large
enough so that the power at those alternatives is reasonably high. By
alternatives we mean the alternative distribution closest to the null
distribution, which is called the least favorable distribution. By reasonably
high we mean at least a specified value, such as 1 – β. The symbol β (β error)
refers to the probability of committing a type II error.
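As a rough preview of how a sample size might be chosen, the sketch below uses a common normal-approximation formula for a two-sided test with known σ. It is not taken from this section and may differ from the treatment in Section 9.12; it also ignores the negligible rejection probability in the far tail. The function name required_sample_size and its parameters are hypothetical.

```python
from math import ceil
from statistics import NormalDist

Z = NormalDist()  # standard normal

def required_sample_size(d, sigma, alpha=0.05, power=0.80):
    """Smallest n giving roughly the requested power at an alternative
    a distance d from the null mean (two-sided z test, known sigma)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = Z.inv_cdf(power)          # quantile matching the target power
    return ceil(((z_alpha + z_power) * sigma / d) ** 2)

# Example with assumed values: detect a shift of half a standard deviation
# with 80% power at the 0.05 level.
print(required_sample_size(d=0.5, sigma=1.0))
```

Halving the distance d roughly quadruples the required sample size, which is why only alternatives at least a specified distance from the null value(s) can be given high power.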