# Normal Approximation to the Binomial


## Chapter: Biostatistics for the Health Sciences: Inferences Regarding Proportions



Let $W = X/n$, where $X$ is a binomial variable with parameters $n$ and $p$. Since $W$ is just a constant ($1/n$) times $X$, and $E(X) = np$ and $\operatorname{Var}(X) = np(1 - p)$, it follows that $E(W) = p$ and $\operatorname{Var}(W) = p(1 - p)/n$. $W$ represents the proportion of successes when $X$ is the number of successes. Because we often wish to estimate the proportion $p$, we are interested in the mean and variance of $W$ (the sample estimate of the proportion $p$). In the example where $n = 3$ and $p = 0.5$, $E(W) = 0.5$ and $\operatorname{Var}(W) = 0.5(0.5)/3 = 0.25/3 \approx 0.0833$.
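The mean and variance of $W$ can be checked numerically by enumerating the binomial probabilities. The following is a minimal sketch using only the Python standard library; the function name `proportion_moments` is chosen here for illustration and does not come from the text.

```python
from math import comb

def proportion_moments(n, p):
    """Return (E[W], Var[W]) for W = X/n with X ~ Binomial(n, p), by enumeration."""
    # Binomial probabilities P(X = k) for k = 0, ..., n
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    # Mean and variance of the proportion W = k/n under those probabilities
    mean = sum((k / n) * pk for k, pk in enumerate(pmf))
    var = sum((k / n - mean) ** 2 * pk for k, pk in enumerate(pmf))
    return mean, var

# The example from the text: n = 3, p = 0.5
mean_w, var_w = proportion_moments(3, 0.5)
print(mean_w, var_w)          # enumerated values
print(0.5, 0.5 * 0.5 / 3)     # p and p(1 - p)/n from the formulas
```

The two printed lines should agree up to floating-point rounding.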

The central limit theorem applied to the sample mean of $n$ Bernoulli trials tells us that for large $n$ the random variable $W$, which is the sample mean of the $n$ Bernoulli trials, has a distribution that is approximately normal, with mean $p$ and variance $p(1 - p)/n$. As $p$ is unknown, the common way to normalize to obtain a statistic that has an approximate standard normal distribution for a hypothesis test is $Z = (W - p_0)/\sqrt{p_0(1 - p_0)/n}$, where $p_0$ is the hypothesized value of $p$ under the null hypothesis. Sometimes $W$ itself is used in place of $p_0$ in the denominator, since $W(1 - W)$ is a consistent estimate of the Bernoulli variance $p(1 - p)$ for a single trial. Multiplying both the numerator and denominator by $n$, we see that algebraically $Z$ is also equal to $(X - np_0)/\sqrt{n p_0(1 - p_0)}$.
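As an illustration of the two algebraically equivalent forms of $Z$, here is a short sketch. The data (60 successes in 100 trials, tested against $p_0 = 0.5$) are hypothetical numbers chosen for this example, and the function names are not from the text.

```python
from math import sqrt

def z_from_proportion(w, n, p0):
    """Z computed from the sample proportion W = X/n."""
    return (w - p0) / sqrt(p0 * (1 - p0) / n)

def z_from_count(x, n, p0):
    """The same Z computed from the number of successes X."""
    return (x - n * p0) / sqrt(n * p0 * (1 - p0))

# Hypothetical data: 60 successes in 100 trials, null value p0 = 0.5
x, n, p0 = 60, 100, 0.5
z_w = z_from_proportion(x / n, n, p0)
z_x = z_from_count(x, n, p0)
print(z_w, z_x)
```

Both calls give a value of 2 up to floating-point rounding, since $(60 - 50)/\sqrt{25} = 2$.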

Because the binomial distribution is discrete and the normal distribution is continuous, the approximation can be improved by using what is called the continuity correction. We simply make $Z = (X - np_0 - 1/2)/\sqrt{n p_0(1 - p_0)}$. The normal approximation to the binomial works fairly well with the continuity correction when $n \ge 30$, provided that $0.3 < p < 0.7$. However, in clinical trials we are often interested in $p > 0.90$; these cases require $n$ to be several hundred before the $Z$ approximation works well. For this reason, and because of the computational speed of modern computers, exact binomial methods are now commonly used, even for fairly large sample sizes such as $n = 1000$.
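To see what the continuity correction buys, one can compare an exact binomial upper-tail probability with its normal approximation, with and without the correction. The sketch below uses only the standard library and builds the normal CDF from `math.erf`. The case $n = 30$, $p = 0.5$, $P(X \ge 20)$ is an illustrative choice, not an example from the text.

```python
from math import comb, erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def exact_upper_tail(a, n, p):
    """Exact P(X >= a) for X ~ Binomial(n, p), with a an integer count."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(a, n + 1))

def approx_upper_tail(a, n, p, corrected=True):
    """Normal approximation to P(X >= a), optionally continuity corrected."""
    shift = 0.5 if corrected else 0.0
    z = (a - n * p - shift) / sqrt(n * p * (1 - p))
    return 1.0 - normal_cdf(z)

# Illustrative case: n = 30, p = 0.5, upper tail starting at a = 20
n, p, a = 30, 0.5, 20
exact = exact_upper_tail(a, n, p)
with_cc = approx_upper_tail(a, n, p, corrected=True)
without_cc = approx_upper_tail(a, n, p, corrected=False)
print(exact, with_cc, without_cc)
```

In this case the corrected approximation lands closer to the exact tail probability than the uncorrected one. Repeating the comparison with $p$ near 0.9 or above is a quick way to see why larger $n$ is needed there.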

To express $Z$ in terms of $W$ in the continuity-corrected version, we divide both the numerator and denominator by $n$. The result is $Z = (W - p_0 - 1/(2n))/\sqrt{p_0(1 - p_0)/n}$.

We use this form for $Z$ as it provides a better approximation to expressions such as $P(W \ge a)$ or $P(W > a)$. On the other hand, if we consider $P(W < a)$ or $P(W \le a)$, then we should use $Z = (X - np_0 + 1/2)/\sqrt{n p_0(1 - p_0)}$ or, equivalently, $Z = (W - p_0 + 1/(2n))/\sqrt{p_0(1 - p_0)/n}$.
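The choice of sign for the $1/(2n)$ term can be collected into one small helper. This is a sketch under stated assumptions: the tail is treated as inclusive ($P(W \ge a)$ or $P(W \le a)$), $a$ is taken to be an attainable proportion $k/n$, and the function names and the `tail` argument are hypothetical conveniences, not notation from the text.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def z_corrected(w, n, p0, tail):
    """Continuity-corrected Z in terms of the proportion w.

    tail="upper" subtracts 1/(2n), for P(W >= w);
    tail="lower" adds 1/(2n), for P(W <= w).
    """
    if tail == "upper":
        shift = -1.0 / (2 * n)
    elif tail == "lower":
        shift = 1.0 / (2 * n)
    else:
        raise ValueError("tail must be 'upper' or 'lower'")
    return (w - p0 + shift) / sqrt(p0 * (1 - p0) / n)

def approx_tail_probability(w, n, p0, tail):
    """Normal approximation to the chosen tail probability of W."""
    z = z_corrected(w, n, p0, tail)
    return 1.0 - normal_cdf(z) if tail == "upper" else normal_cdf(z)

# Illustrative values: n = 30, p0 = 0.5
lower = approx_tail_probability(10 / 30, 30, 0.5, "lower")  # approximates P(W <= 10/30)
upper = approx_tail_probability(20 / 30, 30, 0.5, "upper")  # approximates P(W >= 20/30)
print(lower, upper)
```

With $p_0 = 0.5$ the binomial distribution is symmetric, so the two printed approximations should match up to rounding.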