# One-Tailed Versus Two-Tailed Tests


## Chapter: Biostatistics for the Health Sciences: Tests of Hypotheses



In the previous section, we pointed out that when determining the significance level of a test we must specify either a one-tailed or a two-tailed test. The decision should be based on the context of the problem, i.e., the outcome that we wish to demonstrate. We must consider the relevant research hypothesis, which becomes the alternative hypothesis.

For example, in the Tendril DX trial, we have strong prior evidence from other studies that the steroid (treatment group) leads tend to provide lower capture thresholds than the nonsteroid (control group) leads. Also, we are interested in marketing our product only if we can claim, as do our competitors, that our lead reduces capture thresholds by at least 0.5 volts as compared to nonsteroid leads.

Because we would like to assert that we are able to reduce capture thresholds, it is natural to look at a one-sided alternative. In this case, the null hypothesis H0 is μ1 – μ0 ≥ 0 versus the alternative H1 that μ1 – μ0 < 0, where μ1 = the population mean for the treatment group and μ0 = the population mean for the control group. In Section 9.8, we will see that the appropriate t statistic (under the normality assumption) would have a rejection region determined by t < –tα, where tα is the 100(1 – α) percentile of Student’s t distribution with nc + nt – 2 degrees of freedom, nc is the number of observations in the control group, and nt is the number of observations in the treatment group.
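As a rough illustration of this one-sided decision rule, the Python sketch below computes an equal-variance (pooled) two-sample t statistic from summary statistics and compares it with a left-tail critical value. The pooled form is assumed here because the text cites nc + nt – 2 degrees of freedom; the function name `pooled_t` and every numeric input are hypothetical and are not the Tendril DX data. The critical value is taken from a standard t table for 14 degrees of freedom.

```python
import math

def pooled_t(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return (t, df) for the equal-variance two-sample t statistic.

    Assumes both groups share a common variance (hypothetical helper).
    """
    df = n_c + n_t - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df
    std_err = math.sqrt(pooled_var * (1.0 / n_t + 1.0 / n_c))
    return (mean_t - mean_c) / std_err, df

# Hypothetical summary statistics, chosen only to keep the arithmetic simple.
t_stat, df = pooled_t(mean_t=1.0, sd_t=1.0, n_t=8,
                      mean_c=2.0, sd_c=1.0, n_c=8)

t_crit = 1.7613  # 95th percentile of Student's t with 14 df (table value)

# One-sided (left-tail) rule: reject H0 when t < -t_alpha.
reject_h0 = t_stat < -t_crit
print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

With these made-up inputs the treatment mean is one unit below the control mean, so the statistic falls in the left tail and H0 is rejected at the 5% level.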

In the real application, Chernick and associates took nt = 3nc and chose the values for nc and nt such that the power of the test was at least 80% when μ1 – μ0 < –0.5; α was set at 0.05. We will calculate the sample size for this example in Section 9.8 after we introduce the power function.

In other applications, we may be trying to show only equivalence in medical effectiveness of a new treatment compared to an old one. For medical devices or pharmaceuticals, this test of equivalence may occur when the current product (the control) is an effective treatment and we want to show that the new product is equally effective. However, the new product may be preferred for other reasons, such as ease of application. One example might be the introduction of a simpler needle (called a pen in the industry) to inject the insulin that controls sugar levels for diabetic patients, as compared to a standard insulin injection.

In such cases, the null hypothesis is μ1 – μ0 = 0, versus the alternative μ1 – μ0 ≠ 0. Here, we wish to control the type II (β) error. To do this, we must specify a δ so that we have a good chance of rejecting equivalence if |μ1 – μ0| > δ. Often, δ is chosen to be some clinically relevant difference in the means. The sample size would be chosen so that when |μ1 – μ0| > δ, the probability that the test statistic is large enough to reject H0 is high (80% or 90% or 95%), corresponding to a low type II error (20% or 10% or 5%, respectively). For this problem, H0 is rejected when |t| > tα/2, where tα/2 is the 100(1 – α/2) percentile of the t distribution with nc + nt – 2 degrees of freedom, nc is the number of observations in the control group, and nt is the number of observations in the treatment group.
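To make the link between δ, α, β, and sample size concrete, here is a minimal Python sketch based on the familiar normal-approximation formula n = 2σ²(z(1 – α/2) + z(1 – β))²/δ² per group. It is not the calculation used in the text, which relies on the t distribution and is deferred to Section 9.8. The sketch assumes equal group sizes and a known common standard deviation σ; the function name `n_per_group` and the inputs δ = 0.5 and σ = 1.0 are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample test.

    Normal approximation with equal group sizes and known common sigma
    (all of these are simplifying assumptions, not the text's method).
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = std_normal.inv_cdf(power)           # power = 1 - beta
    return math.ceil(2 * (sigma * (z_alpha + z_beta) / delta) ** 2)

# Hypothetical inputs: detect a difference of 0.5 when sigma is 1.0.
n_80 = n_per_group(delta=0.5, sigma=1.0, power=0.80)
n_90 = n_per_group(delta=0.5, sigma=1.0, power=0.90)
print(n_80, n_90)
```

Raising the power from 80% to 90% lowers the type II error from 20% to 10% and increases the required sample size, which is the trade-off described above.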

However, such a test is really backwards because the scientific hypothesis that we want to confirm is the null hypothesis rather than the alternative. It is for this reason that Blackwelder and others (Blackwelder, 1982) have recommended, for equivalence testing (defined in the foregoing example) and also for noninferiority testing (a one-sided form of equivalence), that we really want to “prove the null hypothesis” in the Neyman–Pearson framework.

Hence, Blackwelder advocates simply switching the null and alternative hypotheses so that the null hypothesis states nonequivalence, and rejecting it in favor of the alternative amounts to accepting equivalence. Switching the null and alternative hypotheses allows us to control, through type I error, the probability of falsely claiming equivalence. When we set the type I (α) and type II (β) errors (i.e., the type II error at |μ1 – μ0| = δ) to be equal, the distinction between α and β errors becomes unimportant. The reason is that if α = β, both formulations yield the same required sample size for a specified power. When α ≠ β (with β evaluated at |μ1 – μ0| = δ), the test results differ from those when α = β. Because it is common to choose α < β, the Blackwelder approach often is preferred, particularly by the Food and Drug Administration. For more details see Blackwelder’s often-cited article (Blackwelder, 1982).

Now let us look step by step at a one-tailed (left-tail) test procedure for the pig blood loss data considered in the previous section. A left-tailed test means that we reject H0 if we can show that μ < μ0. Alternatively, a right-tailed test denotes rejecting H0 if we can show that μ > μ0.

1. State the null hypothesis H0: μ = μ0 versus the alternative hypothesis H1: μ < μ0.

2. Choose a significance level α = α0 (often we take α0 = 0.05 or 0.01).

3. Determine the critical region, i.e., the region of values of t in the lower (left) tail of the sampling distribution that has probability α0 when μ = μ0. The sampling distribution is Student’s t distribution with n – 1 degrees of freedom (i.e., the sampling distribution when the null hypothesis is true).

4. Compute the t statistic: t = (x̄ – μ0)/(s/√n) for the given sample and sample size n, where x̄ is the sample mean and s is the sample standard deviation.

5. Reject the null hypothesis if the test statistic t (computed in step 4) falls in the rejection region for this test; otherwise, do not reject the null hypothesis.

Again we will use the sample data given in Section 8.9 but this time use the standard deviation s = 717.12. The sample mean is 1085.9 and the sample size n = 10. We now have enough information to do the test.

We have the following five steps:

1. The null hypothesis is H0: μ = μ0 = 2200 (H0: μ = 2200) versus the alternative hypothesis H1: μ < μ0 = 2200 (H1: μ < 2200).

2. Choose a significance level α = α0 = 0.05.

3. Determine the critical region, that is, the region of values of t in the lower 0.05 tail of the sampling distribution for t (Student’s t distribution with 9 degrees of freedom) when μ = μ0 (i.e., the sampling distribution when the null hypothesis is true). For α0 = 0.05 the critical value is t = –1.8331; therefore, the critical region includes all values of t < –1.8331.

4. Compute the t statistic: t = (x̄ – μ0)/(s/√n) for the given sample and sample size n = 10. We know that n = 10, the sample mean is x̄ = 1085.9, s = 717.12, and μ0 = 2200. Thus t = (1085.9 – 2200)/(717.12/√10) = –1114.1/226.773 = –4.913.

5. Since –4.913 is clearly less than –1.8331, we reject H0 at the 5% level.
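The arithmetic in steps 4 and 5 can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the critical value –1.8331 is copied from step 3 rather than computed.

```python
import math

# Summary statistics quoted in the text for the pig blood loss data.
n = 10
sample_mean = 1085.9
s = 717.12
mu_0 = 2200.0

# Left-tail critical value for alpha = 0.05 and 9 degrees of freedom,
# taken from step 3 of the text (not computed here).
t_crit = -1.8331

# Step 4: standard error of the mean and the t statistic.
std_err = s / math.sqrt(n)
t_stat = (sample_mean - mu_0) / std_err

# Step 5: reject H0 if t falls in the lower-tail critical region.
reject_h0 = t_stat < t_crit

print(f"standard error = {std_err:.3f}")
print(f"t = {t_stat:.3f}")
print(f"reject H0 at the 5% level: {reject_h0}")
```

Note that the denominator is s/√n, i.e., 717.12 divided by the square root of 10, which gives the 226.773 used in step 4.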

In the previous example, if it were appropriate to use a one-tailed (right-tailed) test, the procedure would change as follows:

In step 1, we would take H1: μ > μ0 = 2200.

In step 3, we would consider the upper α tail of the sampling distribution for t (Student’s t distribution with 9 degrees of freedom) when μ = μ0 (i.e., the sampling distribution when the null hypothesis is true).

In step 5, the rejection region would be values of t > 1.8331.
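The left-tailed and right-tailed versions differ only in the direction of the comparison in step 5. The sketch below puts both rules in one hypothetical helper, `one_tailed_decision`, which takes the positive table value 1.8331 and a `tail` argument; the function and parameter names are illustrative and do not come from the text.

```python
import math

def one_tailed_decision(sample_mean, s, n, mu_0, t_crit, tail):
    """Return (t, reject) for a one-sample, one-tailed t test.

    t_crit is the positive table value t_alpha; tail is "left" or "right".
    Hypothetical helper for illustration only.
    """
    t_stat = (sample_mean - mu_0) / (s / math.sqrt(n))
    if tail == "left":
        return t_stat, t_stat < -t_crit   # H1: mu < mu_0
    if tail == "right":
        return t_stat, t_stat > t_crit    # H1: mu > mu_0
    raise ValueError("tail must be 'left' or 'right'")

# Same pig blood loss summary statistics, tested in each direction.
t_left, reject_left = one_tailed_decision(1085.9, 717.12, 10, 2200.0, 1.8331, "left")
t_right, reject_right = one_tailed_decision(1085.9, 717.12, 10, 2200.0, 1.8331, "right")

print(f"left-tailed:  t = {t_left:.3f}, reject H0: {reject_left}")
print(f"right-tailed: t = {t_right:.3f}, reject H0: {reject_right}")
```

Because the observed t statistic is strongly negative, the left-tailed test rejects H0, whereas a right-tailed test on the same data would not.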