# Bootstrap Principle


## Chapter: Biostatistics for the Health Sciences: Estimating Population Means


In Chapter 2, we introduced the concept of bootstrap sampling and told you that it was a nonparametric technique for statistical inference. We also explained the mechanism for generating bootstrap samples and showed how that mechanism is similar to the one used for simple random sampling. In this section, we will describe and use the bootstrap principle to show a simple and straightforward method to generate confidence intervals for population parameters based on the bootstrap samples. Recalling Chapter 2, the differences between bootstrap sampling and simple random sampling are:

1. Instead of sampling from a population, a bootstrap sample is generated by sampling from a sample.

2. The sampling is done with replacement instead of without replacement.

Bootstrap sampling behaves similarly to random sampling in that each bootstrap sample is a sample of size n drawn at random from the empirical distribution F_n, a probability distribution that gives equal weight to each observed data point (i.e., with each draw, each observation has the same chance as any other observation of being the one selected). Similarly, random sampling can be viewed as drawing a sample of size n but from a population distribution F (in which F is an unknown distribution). We are interested in parameters of the distribution that help characterize the population. In this chapter, we are considering the population mean as the parameter that we would like to know more about.
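The resampling mechanism described above can be sketched in a few lines. This is a minimal illustration using hypothetical sample values; drawing each element with `random.choice` gives every observation the same probability 1/n on every draw, which is exactly a draw from the empirical distribution F_n, and draws are made with replacement.

```python
import random

# Hypothetical observed sample (illustrative values, not from the text)
sample = [5.2, 4.8, 6.1, 5.5, 4.9, 5.8]
n = len(sample)

random.seed(1)  # fixed seed so the sketch is reproducible

# One bootstrap sample: n draws WITH replacement from the original
# sample, each observation equally likely on each draw.
bootstrap_sample = [random.choice(sample) for _ in range(n)]
print(bootstrap_sample)  # same size n; values may repeat
```

Note that, unlike simple random sampling without replacement, the same observation can appear several times in `bootstrap_sample`, and some observations may not appear at all.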

The bootstrap principle is very simple. We want to draw an inference about the population mean through the sample mean. If we do not make parametric assumptions (such as assuming the observations have a normal distribution) about the sampling distribution of the estimate, we cannot specify the sampling distribution for inference (except approximately through the central limit theorem when the estimate is a sample mean).

In constructing confidence intervals, we have considered probability statements about quantities such as Z or t that have the form (X̄ – μ)/σ_X̄ or (X̄ – μ)/S_X̄, where σ_X̄ is the standard deviation and S_X̄ is the estimated standard deviation of the sampling distribution (standard error) of the estimate X̄. The bootstrap principle attempts to mimic this process of constructing quantities such as Z and t and forming confidence intervals. The sample estimate X̄ is replaced by its bootstrap analog X̄*, the mean of a bootstrap sample. The parameter μ is replaced by X̄.

Since the parameter μ is unknown, we cannot actually calculate X̄ – μ, but from a bootstrap sample we can calculate X̄* – X̄. We then approximate the distribution of X̄* – X̄ by generating many bootstrap samples and computing many X̄* values. By making the number B of bootstrap replications large, we allow the random generation of bootstrap samples (sometimes called the Monte Carlo method) to approximate as closely as we want the bootstrap distribution of X̄* – X̄. The histogram of bootstrap samples provides a replacement for the sampling distribution of the Z or t statistic used in confidence interval calculations. The histogram also replaces the normal or t distribution tables that we used in the parametric approaches.
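The Monte Carlo step can be sketched as follows. Using a hypothetical data set (the values, the choice of B, and the normal data-generating distribution are all illustrative assumptions), we generate B bootstrap samples, compute the mean of each, and collect the differences X̄* – X̄ whose distribution approximates that of X̄ – μ:

```python
import random

random.seed(2)
# Hypothetical sample of n = 30 observations (illustrative only)
data = [random.gauss(100, 15) for _ in range(30)]
n = len(data)
xbar = sum(data) / n  # the sample mean, standing in for mu

B = 2000  # number of bootstrap replications
diffs = []
for _ in range(B):
    # One bootstrap sample: n draws with replacement from the data
    boot = [random.choice(data) for _ in range(n)]
    diffs.append(sum(boot) / n - xbar)  # xbar* - xbar

# 'diffs' is the raw material for the histogram described in the text:
# its spread approximates the sampling variability of the mean.
```

A histogram of `diffs` is the bootstrap replacement for the normal or t tables; its shape and spread are estimated from the data alone rather than from a parametric assumption.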

The idea behind the bootstrap is to approximate the distribution of X̄ – μ. If this mimicking process achieves that approximation, then we are able to draw inferences about μ. A priori, however, we have no particular reason to believe that the mimicking process actually works.

The bootstrap statistical theory, developed since 1980, shows that under very general conditions, mimicking works as the sample size n becomes large. Other empirical evidence from simulation studies has shown that mimicking sometimes works well even with small to moderate sample sizes (10–100). The procedure has been modified and generalized to work for a wide variety of statistical estimation problems.

The bootstrap principle is easy to remember and to apply in general. You mimic the sampling from the population by sampling from the empirical distribution. Wherever the unknown parameters appear in your estimation formulae, you replace them by their estimates from the original sample. Wherever the estimates appear in the formulae, you replace them with their bootstrap estimates. The sample estimates and bootstrap estimates can be thought of as actors. The sample estimates take on the role of the parameters and the bootstrap estimates play the role of the sample estimates.
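The role-swap described above can be sketched as a confidence interval calculation. In this illustration (data values, B, and the 95% level are all assumptions for the example), the quantiles of X̄* – X̄ stand in for the unknown quantiles of X̄ – μ, and inverting that substitution yields an interval for μ:

```python
import random

random.seed(3)
# Hypothetical sample of n = 40 observations (illustrative only)
data = [random.gauss(50, 10) for _ in range(40)]
n = len(data)
xbar = sum(data) / n

B = 4000  # number of bootstrap replications
# Sorted bootstrap differences xbar* - xbar, one per replication
diffs = sorted(
    sum(random.choice(data) for _ in range(n)) / n - xbar
    for _ in range(B)
)

# Role swap: treat the 2.5th and 97.5th percentiles of xbar* - xbar
# as if they were the percentiles of xbar - mu, then solve for mu.
lo = diffs[int(0.025 * B)]
hi = diffs[int(0.975 * B)]
ci = (xbar - hi, xbar - lo)
print(ci)  # an approximate 95% confidence interval for mu
```

This particular inversion is one common way of turning the bootstrap distribution into an interval; later sections of a bootstrap treatment typically compare it with alternatives such as the percentile method.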