Normal Approximation to the Binomial

Chapter: Biostatistics for the Health Sciences: Inferences Regarding Proportions

Let W = X/n, where X is a binomial variable with parameters n and p. Since W is just the constant 1/n times X, and E(X) = np and Var(X) = np(1 – p), we have E(W) = np/n = p and Var(W) = np(1 – p)/n² = p(1 – p)/n. W represents the proportion of successes when X is the number of successes. Because we often wish to estimate the proportion p, we are interested in the mean and variance of W (the sample estimate of p). In the example where n = 3 and p = 0.5, E(W) = 0.5 and Var(W) = 0.5(0.5)/3 = 0.25/3 = 0.0833.
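The small example above can be checked by brute force. The following is a minimal sketch, assuming Python and exact rational arithmetic; the function name proportion_moments is our own and does not come from the text. It sums over the binomial probabilities directly rather than using the formulas.

```python
from fractions import Fraction
from math import comb

def proportion_moments(n, p):
    """Return (E(W), Var(W)) for W = X/n by summing over the binomial pmf."""
    p = Fraction(p)
    # P(X = k) for k = 0, ..., n
    pmf = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}
    mean = sum(Fraction(k, n) * prob for k, prob in pmf.items())
    var = sum((Fraction(k, n) - mean) ** 2 * prob for k, prob in pmf.items())
    return mean, var

# The example from the text: n = 3, p = 0.5
mean_w, var_w = proportion_moments(3, Fraction(1, 2))
print(mean_w, var_w, round(float(var_w), 4))
```

The enumeration should agree with p and p(1 – p)/n, that is, 1/2 and 1/12 (about 0.0833).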

W is the sample mean of n Bernoulli trials, so the central limit theorem tells us that for large n its distribution is approximately normal, with mean p and variance p(1 – p)/n. As p is unknown, the common way to obtain a statistic with an approximate standard normal distribution for a hypothesis test is Z = (W – p₀)/√[p₀(1 – p₀)/n], where p₀ is the hypothesized value of p under the null hypothesis. Sometimes W itself is used in place of p₀ in the denominator, since W(1 – W) is a consistent estimate of the Bernoulli variance p(1 – p) for a single trial. Multiplying both the numerator and denominator by n, we see that algebraically Z is also equal to (X – np₀)/√[np₀(1 – p₀)].
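As an illustration of the algebra, the sketch below computes Z in both the proportion form and the count form and confirms that they coincide. The function names and the data (60 successes in 100 trials, with p₀ = 0.5) are hypothetical choices made for this example only.

```python
from math import isclose, sqrt

def z_from_proportion(w, n, p0):
    """Z = (W - p0) / sqrt(p0 (1 - p0) / n)."""
    return (w - p0) / sqrt(p0 * (1 - p0) / n)

def z_from_count(x, n, p0):
    """Z = (X - n p0) / sqrt(n p0 (1 - p0)), the same statistic in terms of X."""
    return (x - n * p0) / sqrt(n * p0 * (1 - p0))

def z_with_estimated_variance(w, n, p0):
    """Variant that uses W(1 - W) in place of p0(1 - p0) in the denominator."""
    return (w - p0) / sqrt(w * (1 - w) / n)

# Hypothetical data: 60 successes in 100 trials, null value p0 = 0.5
n, x, p0 = 100, 60, 0.5
z_w = z_from_proportion(x / n, n, p0)
z_x = z_from_count(x, n, p0)
z_est = z_with_estimated_variance(x / n, n, p0)
print(round(z_w, 4), round(z_x, 4), round(z_est, 4))
```

The first two values agree up to floating-point rounding. The third differs slightly because the variance is estimated from the sample rather than fixed by the null hypothesis.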

Because the binomial distribution is discrete and the normal distribution is continuous, the approximation can be improved by using what is called the continuity correction. We simply make Z = (X – np₀ – 1/2)/√[np₀(1 – p₀)]. The normal approximation to the binomial works fairly well with the continuity correction when n ≥ 30, provided that 0.3 < p < 0.7. However, in clinical trials we are often interested in p > 0.90; these cases require n to be several hundred before the Z approximation works well. For this reason, and because of the computational speed of modern computers, exact binomial methods are now commonly used, even for fairly large sample sizes such as n = 1000.
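The effect of the correction, and its weakness when p is near 1, can be seen numerically. The sketch below is only illustrative: the cases (n = 30 with p = 0.5 and with p = 0.95) and the cutoffs k are our own choices, and the helper names are hypothetical. It compares the exact upper-tail probability P(X ≥ k) with the normal approximation, using the form with 1/2 subtracted, which is justified for this tail because X is integer-valued and so P(X ≥ k) = P(X > k – 1/2).

```python
from math import comb, erf, sqrt

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def exact_upper_tail(k, n, p):
    """Exact binomial P(X >= k)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

def approx_upper_tail(k, n, p, corrected=True):
    """Normal approximation to P(X >= k), optionally continuity corrected."""
    shift = 0.5 if corrected else 0.0
    z = (k - n * p - shift) / sqrt(n * p * (1 - p))
    return 1.0 - normal_cdf(z)

# Illustrative (n, p, k) choices, not taken from the text.
cases = [(30, 0.5, 19), (30, 0.95, 29)]
errors = {}  # absolute error of the corrected approximation, keyed by p
for n, p, k in cases:
    exact = exact_upper_tail(k, n, p)
    plain = approx_upper_tail(k, n, p, corrected=False)
    corrected = approx_upper_tail(k, n, p, corrected=True)
    errors[p] = abs(corrected - exact)
    print(f"n={n} p={p} k={k}: exact={exact:.4f} "
          f"plain={plain:.4f} corrected={corrected:.4f}")
```

In this sketch the corrected approximation is very close to the exact value when p = 0.5, while for p = 0.95 with the same n it is off by roughly 0.05, in line with the remark that extreme p requires a much larger n.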

To express Z in terms of W in the continuity-corrected version, we divide both the numerator and denominator by n. The result is Z = (W – p₀ – 1/(2n))/√[p₀(1 – p₀)/n].

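Which sign the correction takes depends on the probability being approximated, with Z evaluated at the cutoff X = na. Because X takes only integer values, P(X ≥ k) = P(X > k – 1/2) and P(X < k) = P(X < k – 1/2), so the form with 1/2 subtracted provides the better approximation to expressions such as P(W ≥ a) or P(W < a). On the other hand, P(X ≤ k) = P(X < k + 1/2), so if we consider P(W ≤ a) or P(W > a), then we should use Z = (X – np₀ + 1/2)/√[np₀(1 – p₀)] or, equivalently, Z = (W – p₀ + 1/(2n))/√[p₀(1 – p₀)/n].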
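The choice of sign for each of the four tail probabilities can be verified against the exact binomial sum. The sketch below works on the proportion scale with a = k/n. It relies only on the fact that X is integer-valued, so that P(X ≤ k) = P(X < k + 1/2) and P(X ≥ k) = P(X > k – 1/2). The function names and the test case (n = 40, p₀ = 0.5, k = 23) are assumptions made for illustration.

```python
from math import comb, erf, sqrt

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def approx_tail(k, n, p0, kind):
    """Continuity-corrected normal approximation on the W = X/n scale.

    kind is one of '<=', '>', '<', '>=' and the cutoff is a = k/n.
    """
    a = k / n
    se = sqrt(p0 * (1 - p0) / n)
    half = 1 / (2 * n)  # the 1/2 correction after dividing by n
    if kind in ("<=", ">"):
        below = normal_cdf((a - p0 + half) / se)  # approximates P(W <= a)
    else:
        below = normal_cdf((a - p0 - half) / se)  # approximates P(W < a)
    return below if kind in ("<=", "<") else 1.0 - below

def exact_tail(k, n, p0, kind):
    """Exact binomial probability for the same four events."""
    pmf = [comb(n, j) * p0**j * (1 - p0) ** (n - j) for j in range(n + 1)]
    below = sum(pmf[: k + 1]) if kind in ("<=", ">") else sum(pmf[:k])
    return below if kind in ("<=", "<") else 1.0 - below

# Illustrative case, not from the text: n = 40, p0 = 0.5, cutoff a = 23/40
n, p0, k = 40, 0.5, 23
for kind in ("<=", ">", "<", ">="):
    print(f"P(W {kind} {k}/{n}): exact={exact_tail(k, n, p0, kind):.4f} "
          f"approx={approx_tail(k, n, p0, kind):.4f}")
```

With the signs chosen this way, each approximate value should track the exact one closely for a case like this, where n ≥ 30 and p₀ lies between 0.3 and 0.7.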
