One-Tailed Versus Two-Tailed Tests

Chapter: Biostatistics for the Health Sciences: Tests of Hypotheses

In the previous section, we pointed out that when determining the significance level of a test we must specify either a one-tailed or a two-tailed test. The decision should be based on the context of the problem, i.e., the outcome that we wish to demonstrate. We must consider the relevant research hypothesis, which becomes the alternative hypothesis.

For example, in the Tendril DX trial, we have strong prior evidence from other studies that the steroid (treatment group) leads tend to provide lower capture thresholds than the nonsteroid (control group) leads. Also, we are interested in marketing our product only if we can claim, as do our competitors, that our lead reduces capture thresholds by at least 0.5 volts as compared to nonsteroid leads.

Because we would like to assert that we are able to reduce capture thresholds, it is natural to look at a one-sided alternative. In this case, the null hypothesis H₀ is μ₁ – μ₀ ≥ 0 versus the alternative H₁ that μ₁ – μ₀ < 0, where μ₁ = the population mean for the treatment group and μ₀ = the population mean for the control group. In Section 9.8, we will see that the appropriate t statistic (under the normality assumption) would have a rejection region of the form t < –t_α, where t_α is the 100(1 – α) percentile of Student’s t distribution with n_c + n_t – 2 degrees of freedom, n_c is the number of observations in the control group, and n_t is the number of observations in the treatment group.

In the real application, Chernick and associates took n_t = 3n_c and chose the values for n_c and n_t such that the power of the test was at least 80% when μ₁ – μ₀ < –0.5; α was set at 0.05. We will calculate the sample size for this example in Section 9.8 after we introduce the power function.

In other applications, we may be trying to show only equivalence in medical effectiveness of a new treatment compared to an old one. For medical devices or pharmaceuticals, this test of equivalence may occur when the current product (the control) is an effective treatment and we want to show that the new product is equally effective. However, the new product may be preferred for other reasons, such as ease of application. One example might be the introduction of a simpler needle (called a pen in the industry) to inject the insulin that controls sugar levels for diabetic patients, as compared to a standard insulin injection.

In such cases, the null hypothesis is μ₁ – μ₀ = 0, versus the alternative μ₁ – μ₀ ≠ 0. Here, we wish to control the type II (β) error. To do this, we must specify a δ so that we have a good chance of rejecting equivalence if |μ₁ – μ₀| > δ. Often, δ is chosen to be some clinically relevant difference in the means. The sample size would be chosen so that when |μ₁ – μ₀| > δ, the probability that the test statistic is large enough to reject H₀ is high (80% or 90% or 95%), corresponding to a low type II error (20% or 10% or 5%, respectively). For this problem, H₀ is rejected when |t| > t_α/2, where t_α/2 is the 100(1 – α/2) percentile of the t distribution with n_c + n_t – 2 degrees of freedom, n_c is the number of observations in the control group, and n_t is the number of observations in the treatment group.
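The link between δ, α, β, and the sample size can be illustrated with a rough sketch. It is not the calculation the text defers to Section 9.8: it assumes a normal approximation in place of the t distribution, two groups of equal size, and a known common standard deviation. The function name n_per_group and the inputs sigma = 1.0 and delta = 0.5 are hypothetical values chosen only for illustration.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate observations needed per group for a two-sided test.

    Assumptions (not from the text): normal approximation, equal group
    sizes, and a known common standard deviation sigma.
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # two-sided critical value
    z_beta = z.inv_cdf(power)             # power = 1 - beta
    return math.ceil(2 * (sigma * (z_alpha + z_beta) / delta) ** 2)


# Hypothetical inputs: detect a difference of 0.5 when sigma is 1.0.
n = n_per_group(delta=0.5, sigma=1.0)
print(n)
```

Raising the power or shrinking δ increases the required n. A calculation based on the t distribution would typically ask for slightly more observations than this approximation.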

However, such a test is really backwards because the scientific hypothesis that we want to confirm is the null hypothesis rather than the alternative. It is for this reason that Blackwelder and others (Blackwelder, 1982) have recommended, for equivalence testing (defined in the foregoing example) and also for noninferiority testing (a one-sided form of equivalence), that we really want to “prove the null hypothesis” in the Neyman–Pearson framework.

Hence, Blackwelder advocates simply switching the null and alternative hypotheses, so that nonequivalence becomes the null hypothesis and rejecting it in favor of the alternative amounts to accepting equivalence. Switching the null and alternative hypotheses allows us to control, through the type I error, the probability of falsely claiming equivalence. When we set the type I (α) and type II (β) errors (i.e., the type II error at |μ₁ – μ₀| = δ) to be equal, the distinction between α and β errors becomes unimportant, because if α = β, both formulations yield the same required sample size for a specified power. When α ≠ β, however, the two formulations give different results. Because it is common to choose α < β, the Blackwelder approach often is preferred, particularly by the Food and Drug Administration. For more details see Blackwelder’s often-cited article (Blackwelder, 1982).
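Written out in the notation of this section, the two formulations might look as follows. This is a sketch of the two-sided (equivalence) case with margin δ, not a quotation from Blackwelder’s article.

```latex
% Conventional two-sided test: equivalence is the null hypothesis.
H_0:\ \mu_1 - \mu_0 = 0 \quad \text{versus} \quad H_1:\ \mu_1 - \mu_0 \neq 0

% Switched formulation: equivalence (within the margin \delta) is the alternative.
H_0:\ |\mu_1 - \mu_0| \geq \delta \quad \text{versus} \quad H_1:\ |\mu_1 - \mu_0| < \delta
```

In the switched formulation, a type I error is a false claim of equivalence, so α directly bounds the probability of that mistake.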

Now let us look step by step at a one-tailed (left-tail) test procedure for the pig blood loss data considered in the previous section. A left-tailed test means that we reject H₀ if we can show that μ < μ₀. Alternatively, a right-tailed test denotes rejecting H₀ if we can show that μ > μ₀.

1. State the null hypothesis H₀: μ = μ₀ versus the alternative hypothesis H₁: μ < μ₀.

2. Choose a significance level α = α₀ (often we take α₀ = 0.05 or 0.01).

3. Determine the critical region, i.e., the region of values of t in the lower (left) α₀ tail of the sampling distribution of t (Student’s t distribution with n – 1 degrees of freedom) when μ = μ₀ (i.e., the sampling distribution when the null hypothesis is true).

4. Compute the t statistic: t = (x̄ – μ₀)/(s/√n) for the given sample and sample size n, where x̄ is the sample mean and s is the sample standard deviation.

5. Reject the null hypothesis if the test statistic t (computed in step 4) falls in the rejection region for this test; otherwise, do not reject the null hypothesis.

Again we will use the sample data given in Section 8.9 but this time use the standard deviation s = 717.12. The sample mean is 1085.9 and the sample size n = 10. We now have enough information to do the test.

We have the following five steps:

1. The null hypothesis is H₀: μ = μ₀ = 2200 (H₀: μ = 2200) versus the alternative hypothesis H₁: μ < μ₀ = 2200 (H₁: μ < 2200).

2. Choose a significance level α = α₀ = 0.05.

3. Determine the critical region, that is, the region of values of t in the lower 0.05 tail of the sampling distribution for t (Student’s t distribution with 9 degrees of freedom) when μ = μ₀ (i.e., the sampling distribution when the null hypothesis is true). For α₀ = 0.05 the critical value is t = –1.8331; therefore, the critical region includes all values of t < –1.8331.

4. Compute the t statistic: t = (x̄ – μ₀)/(s/√n) for the given sample. We know that n = 10, the sample mean is x̄ = 1085.9, s = 717.12, and μ₀ = 2200, so t = (1085.9 – 2200)/(717.12/√10) = –1114.1/226.773 = –4.913.

5. Since –4.913 is clearly less than –1.8331, we reject H₀ at the 5% level.
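The arithmetic in steps 4 and 5 can be checked with a minimal sketch that uses only the Python standard library. The function name one_sample_t is an illustrative choice, and the critical value 1.8331 is copied from step 3 rather than computed.

```python
import math


def one_sample_t(sample_mean, mu0, s, n):
    """Return t = (sample_mean - mu0) / (s / sqrt(n))."""
    standard_error = s / math.sqrt(n)
    return (sample_mean - mu0) / standard_error


# Values quoted in the worked example.
t_stat = one_sample_t(sample_mean=1085.9, mu0=2200.0, s=717.12, n=10)

# Critical value for alpha = 0.05 with 9 degrees of freedom, taken from step 3.
t_crit = 1.8331

# Left-tailed rule: reject H0 when t falls below -t_crit.
reject_h0 = t_stat < -t_crit
print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```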

In the previous example, if it were appropriate to use a one-tailed (right-tail) test, the procedure would change as follows:

In step 1, we would take H₁: μ > μ₀ = 2200.

In step 3, we would consider the upper α tail of the sampling distribution for t (Student’s t distribution with 9 degrees of freedom) when μ = μ₀ (i.e., the sampling distribution when the null hypothesis is true).

In step 5, the rejection region would be values of t > 1.8331.
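The left-tail and right-tail rules differ only in which side of the distribution is checked, as the following sketch suggests. The function t_test_decision and its tail argument are hypothetical names introduced for illustration, and the critical value is again the tabulated 1.8331 from the text.

```python
import math


def t_test_decision(sample_mean, mu0, s, n, t_crit, tail):
    """Return (t, reject) for a one-tailed, one-sample t test.

    t_crit is the positive tabulated critical value; tail is "left" or "right".
    """
    t_stat = (sample_mean - mu0) / (s / math.sqrt(n))
    if tail == "left":
        reject = t_stat < -t_crit   # H1: mu < mu0
    elif tail == "right":
        reject = t_stat > t_crit    # H1: mu > mu0
    else:
        raise ValueError("tail must be 'left' or 'right'")
    return t_stat, reject


# Same pig blood loss data under each alternative.
left = t_test_decision(1085.9, 2200.0, 717.12, 10, 1.8331, "left")
right = t_test_decision(1085.9, 2200.0, 717.12, 10, 1.8331, "right")
print(left[1], right[1])
```

With these data the t statistic is negative, so the right-tailed test does not reject H₀ even though the left-tailed test does.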
