Tests of Hypotheses: Terminology

Chapter: Biostatistics for the Health Sciences: Tests of Hypotheses

Tests of Hypotheses

TERMINOLOGY

Hypothesis testing is a formal scientific process that accounts for statistical uncertainty. As such, the process involves much new statistical terminology that we now introduce. A hypothesis is a statement of belief about the values of population parameters. In hypothesis testing, we usually consider two hypotheses: the null and alternative hypotheses. The null hypothesis, denoted by H₀, is usually a hypothesis of no difference. Initially, we will consider a type of H₀ that is a claim that there is no difference between the population parameter and its hypothesized value or set of values. The hypothesized values chosen for the null hypothesis are usually chosen to be uninteresting values. An example might be that in a trial comparing two diabetes drugs, the mean values for fasting plasma glucose are the same for the two treatment groups.

In general, the experimenter is interested in rejecting the null hypothesis. The alternative hypothesis, denoted by H₁, is a claim that the null hypothesis is false; i.e., the population parameter takes on a value different from the value or values specified by the null hypothesis. The alternative hypothesis is usually the scientifically interesting hypothesis that we would like to confirm. By using probability theory, our goal is to lend credence to the alternative hypothesis by rejecting the null hypothesis. In the diabetes example, an interesting alternative might be that the fasting plasma glucose mean is significantly (both statistically and clinically) lower for patients receiving the experimental drug than for patients receiving the control drug.
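As a sketch, the diabetes example can be written symbolically. The symbols μ_E and μ_C are assumptions introduced here for illustration: they stand for the population mean fasting plasma glucose under the experimental and control drugs, respectively.

```latex
H_0:\ \mu_E = \mu_C \qquad \text{versus} \qquad H_1:\ \mu_E < \mu_C
```

Written this way, the null hypothesis is the "no difference" claim, and the alternative is the one-sided claim that the experimental drug lowers the mean.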

Because of statistical uncertainty regarding inferences about population parameters based on sample data, we cannot prove or disprove either the null or the alternative hypotheses. Rather, we make a decision based on probability and accept a probability of making an incorrect decision.

A type I error is the error of falsely rejecting the null hypothesis; i.e., claiming on the basis of data from a sample that the true parameter is not a value specified by the null hypothesis when in fact it is. In other words, a type I error occurs when the null hypothesis is true but we incorrectly reject H₀. The other possible mistake is to fail to reject the null hypothesis when the true parameter value is one specified by the alternative hypothesis. This kind of error is called a type II error.
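The two kinds of error can be laid out as a small decision table. The following is a minimal sketch; the function name `classify_outcome` and its arguments are hypothetical, and in practice the truth of the null hypothesis is of course unknown to the experimenter.

```python
def classify_outcome(null_is_true: bool, null_rejected: bool) -> str:
    """Name the outcome of a test, given the (normally unknown) truth and the decision."""
    if null_is_true and null_rejected:
        return "type I error"       # H0 is true, but we rejected it
    if not null_is_true and not null_rejected:
        return "type II error"      # H0 is false, but we did not reject it
    return "correct decision"


# Enumerate all four combinations of truth and decision.
for truth in (True, False):
    for rejected in (True, False):
        outcome = classify_outcome(truth, rejected)
        print(f"H0 true={truth}, H0 rejected={rejected}: {outcome}")
```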

Based on the observed data, we form a statistic (called a test statistic) and consider its sampling distribution in order to define critical values for rejecting the null hypothesis. For example, the Z and t statistics covered previously (refer to Chapter 8) can serve as test statistics for those population parameters. A statistician uses one or more cutoff values for the test statistic to determine when to reject or not to reject the null hypothesis.

These cutoff values are called critical values; the set of values for which the null hypothesis would be rejected is called the critical region, or rejection region. The other values of the test statistic form a region that we will call the nonrejection region. We are tempted to call the nonrejection region the acceptance region; however, we hesitate to do so because the Neyman–Pearson approach to hypothesis testing chooses the critical value to control the type I error, but the type II error then depends on the specific value of the parameter when the alternative is true. In the next section, we will discuss this point in detail as well as the Neyman–Pearson approach.
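To make the critical region concrete, here is a minimal sketch for a Z statistic. It assumes a two-sided test and a standard normal sampling distribution under the null hypothesis; the function names and the parameter `alpha` (the target probability of a type I error) are hypothetical names chosen for illustration.

```python
from statistics import NormalDist


def two_sided_critical_value(alpha: float) -> float:
    """Cutoff c such that P(|Z| > c) = alpha when Z is standard normal."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def in_critical_region(z: float, alpha: float) -> bool:
    """True if the observed Z statistic falls in the two-sided rejection region."""
    return abs(z) > two_sided_critical_value(alpha)


c = two_sided_critical_value(0.05)  # approximately 1.96
print(f"critical values: -{c:.3f} and +{c:.3f}")
print(in_critical_region(2.30, 0.05))  # 2.30 lies beyond the cutoff, so H0 is rejected
print(in_critical_region(1.20, 0.05))  # 1.20 lies in the nonrejection region
```

A one-sided test would use a single cutoff instead, placing the whole type I error probability in one tail.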

The probability of observing a value in the critical region when the null hypothesis is correct is called the significance level; the hypothesis test is also called a test of significance. The significance level is denoted by α, which often is set at a low value such as 0.01 or 0.05. These values also can be termed error levels; i.e., we are acknowledging that, when the null hypothesis is true, it is acceptable to be wrong one time out of 100 tests or five times out of 100 tests, respectively. The symbol α is also the probability of a type I error; the symbol β is used to denote the probability of a type II error, as explained in Section 9.7.
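The "five times out of 100 tests" reading of α = 0.05 can be checked by simulation. The sketch below is illustrative only and rests on assumptions not in the text: a two-sided Z test of H₀: μ = 0, a known standard deviation of 1, a sample size of 20, and hypothetical function and parameter names.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def estimate_type_one_error_rate(alpha: float, n: int, trials: int, seed: int) -> float:
    """Fraction of simulated studies that reject H0: mu = 0 when H0 is actually true."""
    rng = random.Random(seed)
    cutoff = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    rejections = 0
    for _ in range(trials):
        # Draw a sample from N(0, 1), so the null hypothesis holds by construction.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) * sqrt(n)  # Z statistic with known sigma = 1
        if abs(z) > cutoff:
            rejections += 1  # a false rejection, i.e. a type I error
    return rejections / trials


rate = estimate_type_one_error_rate(alpha=0.05, n=20, trials=5000, seed=1)
print(f"estimated type I error rate: {rate:.3f}")  # should land near 0.05
```

Because every simulated sample is drawn with the null hypothesis true, each rejection is a type I error, and the long-run rejection fraction approximates α.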

Given a test statistic and an observed value, one can compute the probability of observing a value as extreme or more extreme than the observed value when the null hypothesis is true. This probability is called the p-value. The p-value is related to the significance level in that if we had chosen the critical value to be equal to the observed value of the test statistic, the p-value would be equal to the significance level.
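The relationship between the p-value and the significance level can be sketched numerically. The example assumes a two-sided test with a standard normal Z statistic, so that "as extreme or more extreme" means at least as far from zero in either direction; the function names and the observed value 2.10 are hypothetical.

```python
from statistics import NormalDist


def two_sided_p_value(z_observed: float) -> float:
    """P(|Z| >= |z_observed|) for a standard normal Z, i.e. under the null hypothesis."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z_observed)))


def significance_level_for_cutoff(cutoff: float) -> float:
    """Significance level of the two-sided test that rejects when |Z| > cutoff."""
    return 2.0 * (1.0 - NormalDist().cdf(cutoff))


z_obs = 2.10
p = two_sided_p_value(z_obs)
print(f"p-value: {p:.4f}")
# Using the observed |z| as the critical value gives a significance level equal to p.
print(significance_level_for_cutoff(abs(z_obs)) == p)
```

Equivalently, the null hypothesis is rejected at level α exactly when the p-value is smaller than α.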
