Neyman-Pearson Test Formulation

Chapter: Biostatistics for the Health Sciences: Tests of Hypotheses

In the previous section, we introduced the notion of hypothesis testing and defined the terms null hypothesis and alternative hypothesis, and type I error and type II error. These terms are attributed to Jerzy Neyman and Egon Pearson, who developed formal statistical hypothesis testing in the 1930s. Earlier, R. A. Fisher developed what he called significance testing, but his description was vague and followed a theory of inference called fiducial inference that now appears to have been discredited. The Neyman and Pearson approach has endured but is also challenged by the Bayesian approach to inference (covered in Section 9.16).

In the Neyman and Pearson approach, we construct the null and alternative hypotheses and choose a test statistic. We need to keep in mind the test statistic, the sample size, and the resulting sampling distribution for the test statistic under the null hypothesis (i.e., the distribution when the null hypothesis is assumed to be true). Based on these three factors, we determine a critical value or critical values such that the probability of a type I error never exceeds a specified value α when the null hypothesis is true.

Sometimes, the null hypothesis specifies a unique sampling distribution for a test statistic. A unique sampling distribution for the null hypothesis occurs when the following criteria are met: (1) we hypothesize a single value for the population mean; (2) the variance is assumed to be known; and (3) the normal distribution is assumed for the population distribution. Under these circumstances, the sampling distribution of the test statistic is unique. The critical values can be determined based on this unique sampling distribution. For example, at a significance level of α = 0.10, the 5th percentile and the 95th percentile of the sampling distribution would be used as the critical values of the test statistic for a two-tailed (two-sided) test; the 10th percentile or the 90th percentile would be used for a one-tailed (one-sided) test, depending on which side of the test is the alternative. In Section 9.4, one-sided tests will be discussed and contrasted with two-sided tests.
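
The percentiles quoted above correspond to a significance level of 0.10. As a minimal sketch, assuming the test statistic is standard normal under the null hypothesis (an illustrative choice, not something the text fixes), the critical values can be read off the inverse cumulative distribution function. The variable names are hypothetical.

```python
from statistics import NormalDist

# Sampling distribution of the test statistic under the null hypothesis,
# ASSUMED here to be standard normal for illustration.
null_dist = NormalDist(mu=0.0, sigma=1.0)

alpha = 0.10  # significance level implied by the 5th/95th percentiles

# Two-sided test: split alpha equally between the two tails.
lower_two_sided = null_dist.inv_cdf(alpha / 2)      # 5th percentile
upper_two_sided = null_dist.inv_cdf(1 - alpha / 2)  # 95th percentile

# One-sided tests: place all of alpha in a single tail.
lower_one_sided = null_dist.inv_cdf(alpha)      # 10th percentile
upper_one_sided = null_dist.inv_cdf(1 - alpha)  # 90th percentile

print("two-sided:", round(lower_two_sided, 3), round(upper_two_sided, 3))
print("one-sided:", round(lower_one_sided, 3), "or", round(upper_one_sided, 3))
```

Changing alpha to 0.05 gives the familiar two-sided critical values of about –1.96 and 1.96.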

However, in two important situations the sampling distribution of the test statistic is not unique. The first situation occurs when the population variance (σ²) is unknown; in this instance, σ² is called a nuisance parameter because it affects the sampling distribution but otherwise is not used in the hypothesis test. Nevertheless, even when the population variance is unknown, σ² may influence the sampling distribution of the test statistic. For example, σ² is relevant to the Behrens–Fisher problem, in which the distribution of the mean difference depends on the ratio of two population variances. (See the article by Robinson on the Behrens–Fisher problem in Johnson and Kotz, 1982.) An exception that would not require σ² is the use of the t statistic in a one-sample hypothesis test, because the t distribution does not depend on σ².

A second situation in which the sampling distribution of the test statistic is not unique occurs during the use of a composite null hypothesis. A composite null hypothesis is one that includes more than one value of the parameter of interest for the null hypothesis. For example, in the case of a population mean, instead of considering only the value 0 for the null hypothesis, we might consider a range of small values; all values of μ such that |μ| < 0.5 would be uninteresting and, hence, included in the null hypothesis.

To review, we have indicated two scenarios: (1) when the sampling distribution depends on a nuisance parameter, and (2) when the hypothesized parameter can take on more than one value under the null hypothesis. For either situation, we consider the distribution that is “closest” to the alternative in a set of distributions for parameter values in the interval for the null hypothesis. The critical values determined for that “closest” distribution would have a significance level higher than those for any other parameter values under the null hypothesis. That significance level is defined to be the level of the overall test of significance. However, this issue is beyond the scope of this text and, hence, will not be elaborated further.
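
The idea of the “closest” distribution can be illustrated with a rough sketch. It assumes the sample mean is normal with a known standard error of 1, uses the composite null |μ| < 0.5 from the earlier example, and rejects when the absolute sample mean exceeds a cutoff c. The boundary value 0.5 is treated as the limiting case of the null interval. The function names and the bisection routine are illustrative assumptions, not the text's method.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def rejection_prob(mu, c, se=1.0):
    """P(|sample mean| > c) when the sample mean is N(mu, se^2)."""
    return (1 - Z.cdf((c - mu) / se)) + Z.cdf((-c - mu) / se)

def critical_value(boundary, alpha, se=1.0):
    """Bisection for the cutoff c whose rejection probability at the
    boundary of the null interval equals alpha."""
    lo, hi = 0.0, boundary + 10 * se
    for _ in range(100):
        mid = (lo + hi) / 2
        if rejection_prob(boundary, mid, se) > alpha:
            lo = mid  # cutoff too small: rejects too often
        else:
            hi = mid
    return (lo + hi) / 2

alpha = 0.05
c = critical_value(boundary=0.5, alpha=alpha)

# Rejection probability across the null interval: it is largest at the
# boundary, so calibrating there keeps every null value at or below alpha.
null_grid = [i / 10 for i in range(-5, 6)]
for mu in null_grid:
    print(mu, round(rejection_prob(mu, c), 4))
```

Under these assumptions the rejection probability rises as μ moves from 0 toward ±0.5, which is why the cutoff is calibrated at the edge of the null interval.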

In summary, the Neyman–Pearson approach controls the type I error. Regardless of the sample size, the type I error is controlled so that it is less than or equal to α for any value of the parameters under the null hypothesis. Consequently, if we use the Neyman–Pearson approach, as we will in Sections 9.3, 9.4, 9.9, and 9.10, we can be assured that the type I error is constrained so as to be as small or smaller than the specified α. If the test statistic falls in the rejection region, we can reject the null hypothesis safely, knowing that the probability that we have made the wrong decision is no greater than α.
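
A small simulation can make this control of the type I error concrete. The sketch below assumes a normal population with known variance, a simple null hypothesis μ = 0, and a two-sided z test at the 0.05 level; the sample sizes, number of trials, and seed are arbitrary illustrative choices. With a simple null hypothesis the rejection rate should sit near 0.05 itself for every sample size.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def simulated_type_i_error(n, sigma=1.0, mu_null=0.0, alpha=0.05,
                           trials=20000, seed=0):
    """Fraction of samples drawn under the null that the z test rejects."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma / sqrt(n)  # standard error of the sample mean
    rejections = 0
    for _ in range(trials):
        # Data are generated with the null hypothesis true.
        sample = [rng.gauss(mu_null, sigma) for _ in range(n)]
        z = (fmean(sample) - mu_null) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

# Estimated rejection rate under the null for several sample sizes.
rates = {n: simulated_type_i_error(n) for n in (5, 20, 50)}
for n, rate in rates.items():
    print(n, rate)
```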

However, the type II error is not controlled by the Neyman–Pearson approach. Three factors determine the probability of a type II error (β): (1) the sample size, (2) the choice of the test statistic, and (3) the value of the parameter under the alternative hypothesis. When the values for the alternative hypothesis are close to those for the null hypothesis, β can be close to 1 – α, the probability of the nonrejection region under the null hypothesis. Thus, the probability of a type II error increases as the difference between the mean for the null hypothesis and the mean at the alternative decreases. When the difference between these means becomes large, β becomes small, approaching 0, so that the power 1 – β approaches 1.
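
The dependence of β on the sample size and on the alternative can be sketched numerically. The example assumes a two-sided z test of μ = 0 with known population standard deviation σ = 1 and α = 0.05; the function name, sample sizes, and alternative means are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def type_ii_error(mu_alt, n, sigma=1.0, mu_null=0.0, alpha=0.05):
    """beta for a two-sided z test of H0: mu = mu_null with known sigma."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    # Shift of the standardized test statistic under the alternative.
    shift = (mu_alt - mu_null) / (sigma / sqrt(n))
    # beta = P(statistic falls in the nonrejection region | alternative).
    return Z.cdf(z_crit - shift) - Z.cdf(-z_crit - shift)

# Rows: sample size. Columns: alternative means 0.0, 0.1, 0.5, 1.0.
for n in (5, 20, 80):
    row = [round(type_ii_error(mu, n), 3) for mu in (0.0, 0.1, 0.5, 1.0)]
    print(n, row)
```

At an alternative equal to the null value, β equals 1 – α = 0.95; it shrinks as the alternative moves away from the null value or as n grows.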

For example, suppose that under the null hypothesis the sample mean has a normal sampling distribution with mean μ = 0 and variance σ_x̄² = 1 (that is, a standard normal distribution) for a sample size n = 5. By algebra, we can determine that the population has a variance of σ² = 5 (i.e., σ_x̄² = σ²/n = σ²/5 = 1). We choose a two-sided test with significance level 0.05 for which the critical values are –1.96 and 1.96. Under the alternative hypothesis, if the mean μ = 0.1 and the variance of the sample mean is σ_x̄² = 1, then the power of the test (defined to be 1 – the probability of a type II error) is the probability that the sample mean is greater than 1.96 or less than –1.96. But this probability is the same as the probability that the Z value for the standard normal distribution is greater than 1.86 or less than –2.06. Note that we find the values 1.86 and –2.06 by subtracting 0.1 (μ under the alternative hypothesis) from +1.96 and –1.96.

From the table of the standard normal distribution (Appendix E), we see that P[Z < –2.06] = 0.5 – 0.4803 = 0.0197 and P[Z > 1.86] = 0.5 – 0.4686 = 0.0314. The power of the test at this alternative is 0.0197 + 0.0314 = 0.0511. This mean is close to zero and the power is not much higher than the significance level 0.05. On the other hand, if μ = 2.96 under the alternative with σ_x̄² = 1, then the power of the test at this alternative is P[Z < –4.92] + P[Z > –1]. Since P[Z < –4.92] is almost zero, the power is nearly equal to P[Z > –1] = 0.5 + P[0 > Z > –1] = 0.5 + P[0 < Z < 1] = 0.5 + 0.3413 = 0.8413. So as the alternative moves relatively far from zero, the power becomes large. The relationship between the alternative hypothesis and the power of a test will be illustrated in Figures 9.1 and 9.2 later in the chapter.
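
The two power calculations above can be checked without a table. This is a minimal sketch that takes the standard error of the sample mean to be 1 and the critical values to be ±1.96, as in the example; the function name is hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def power(mu_alt, z_crit=1.96, se=1.0):
    """P(sample mean > z_crit or < -z_crit) when its mean is mu_alt."""
    upper_tail = 1 - Z.cdf((z_crit - mu_alt) / se)  # e.g. P[Z > 1.86]
    lower_tail = Z.cdf((-z_crit - mu_alt) / se)     # e.g. P[Z < -2.06]
    return upper_tail + lower_tail

print(round(power(0.1), 4))   # 0.0511
print(round(power(2.96), 4))  # 0.8413
```

Evaluating power(0.0) returns approximately 0.05, the significance level, since the alternative then coincides with the null value.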

Consequently, when we test hypotheses using the Neyman–Pearson approach, we do not say that we accept the null hypothesis when the test statistic falls in the nonrejection region; there may be reasonable values under the alternative hypothesis for which the probability of a type II error is high.

In fact, since we select α to be small so that we have a small type I error, 1 – α is large. Some values under the alternative hypothesis have a high type II error, indicating that the test has low power at those alternatives.

In Section 9.12, we will see that the way to control the type II error is to be interested only in alternatives at least a specified distance (such as d) from the null value(s). In addition, we will require that the sample size is large enough so that the power at those alternatives is reasonably high. By alternatives we mean the alternative distribution closest to the null distribution, which is called the least favorable distribution. By reasonably high we mean at least a specified value, such as 1 – β. The symbol β (β error) refers to the probability of committing a type II error.
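
As a rough preview, the sketch below uses a commonly cited normal-approximation formula for a two-sided z test with known variance, n ≥ ((z_{1–α/2} + z_{1–β})σ/d)², to find a sample size that reaches a target power at distance d. This is an assumption for illustration and is not necessarily the formula presented in Section 9.12. The values d = 1 and target power 0.80 are hypothetical; σ² = 5 is borrowed from the earlier example.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def sample_size(d, sigma, alpha=0.05, power=0.80):
    """Approximate n for a two-sided z test to reach `power` at distance d.
    Ignores the negligible far-tail rejection probability."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_power = Z.inv_cdf(power)
    return ceil(((z_alpha + z_power) * sigma / d) ** 2)

def achieved_power(d, sigma, n, alpha=0.05):
    """Exact power of the two-sided z test at an alternative d from the null."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = d / (sigma / sqrt(n))
    return (1 - Z.cdf(z_crit - shift)) + Z.cdf(-z_crit - shift)

sigma = sqrt(5)  # population variance of 5, as in the earlier example
n = sample_size(d=1.0, sigma=sigma)
print(n, round(achieved_power(1.0, sigma, n), 3))
```

Because the least favorable alternative is the one exactly d from the null value, any alternative farther away has power at least as high for the same n.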
