# How to Select a Bootstrap Sample


## Chapter: Biostatistics for the Health Sciences: Defining Populations and Selecting Samples



The bootstrap method and its use in statistical inference will be covered more extensively in Chapter 8 when we discuss its application in estimation and contrast it to parametric methods. In most applications, a sampling procedure is used to approximate the bootstrap method. That sampling procedure generates what are called bootstrap samples, which are obtained by sampling with replacement. Because sampling with replacement is a general sampling technique that is similar to random sampling, we introduce it here.

In general, we can choose a random sample of size n with replacement from a population of size N. In our applications of the bootstrap, the population for bootstrap sampling will not be the actual population of interest but rather a given, presumably random, sample from the population.

In the first stage of selecting a bootstrap sample, we take the interval [0, 1] and divide it into N equal parts. Then, for a uniform random number U, we assign index 1 if 0 ≤ U < 1/N, index 2 if 1/N ≤ U < 2/N, and so on until we assign index N if (N - 1)/N ≤ U < 1. We generate n such indices by generating n consecutive uniform random numbers. The procedure is identical to our rejection sampling scheme except that none of the samples is rejected, because repeated indices are allowed.
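The interval rule above can be sketched in a few lines of Python. This is a minimal illustration, not code from the text; the function names `uniform_to_index` and `sample_indices` are hypothetical. It relies on the fact that U falls in the k-th of N equal subintervals exactly when floor(N·U) = k - 1.

```python
import math

def uniform_to_index(u: float, N: int) -> int:
    """Map a uniform number u in [0, 1) to an index in 1..N.

    u lies in [(k - 1)/N, k/N) exactly when floor(N * u) == k - 1,
    so the assigned index is floor(N * u) + 1.
    """
    if not 0.0 <= u < 1.0:
        raise ValueError("u must satisfy 0 <= u < 1")
    return math.floor(N * u) + 1

def sample_indices(uniforms, N):
    """Turn n consecutive uniform numbers into n indices (repeats allowed)."""
    return [uniform_to_index(u, N) for u in uniforms]

# Example with N = 6 equal parts of [0, 1)
print(sample_indices([0.05, 0.40, 0.40, 0.99], 6))
```

Because no index is ever rejected, the same index can appear more than once, which is what makes this sampling with replacement.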

Bootstrap sampling is a special case of sampling with replacement. In ordinary bootstrap sampling, n = N. Remember, for bootstrap sampling the population size N is actually the size of the original random sample; the true population is replaced by that sample.

Let us consider the population of six patients described previously in Section 2.4. Again, age is the variable of interest. We will generate 10 bootstrap samples of size six for the ages of the patients. For the first sample we will use row 3 from Table 2.1. The second sample will be generated using row 4, and so on for samples 3 through 10.

The first six uniform random numbers in row 3 are 69386, 71708, 88608, 67251, 22512, and 00169. The corresponding indices are 5, 5, 6, 5, 2, and 1. The corresponding patients are E, E, F, E, B, and A, and the sampled ages are 32, 32, 9, 32, 17, and 26. The average age for this bootstrap sample is 24.6667.
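The first bootstrap sample can be reproduced with a short sketch. It assumes each five-digit table entry is read as a decimal fraction (69386 as U = 0.69386) and uses the patient ages quoted in this section; the helper name `digits_to_index` is hypothetical.

```python
# Patient ages as used in this section (A..F correspond to indices 1..6)
ages = {"A": 26, "B": 17, "C": 45, "D": 70, "E": 32, "F": 9}
patients = "ABCDEF"

def digits_to_index(entry: str, N: int) -> int:
    """Read a digit string such as '69386' as U = 0.69386 and return its index.

    Integer arithmetic avoids floating-point rounding:
    floor(N * U) equals (N * digits) // 10**len(digits).
    """
    return (N * int(entry)) // 10 ** len(entry) + 1

row3 = ["69386", "71708", "88608", "67251", "22512", "00169"]

indices = [digits_to_index(e, len(patients)) for e in row3]
sample_patients = [patients[i - 1] for i in indices]
sample_ages = [ages[p] for p in sample_patients]
mean_age = sum(sample_ages) / len(sample_ages)

print(indices)             # [5, 5, 6, 5, 2, 1]
print(sample_patients)     # ['E', 'E', 'F', 'E', 'B', 'A']
print(round(mean_age, 4))  # 24.6667
```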

There are 6^6 = 46,656 possible bootstrap samples of size six. In practice, we sample only a small number, such as 50 to 100, when the total number of possible samples is so large. A random selection of 100 samples provides a good estimate of the bootstrap mean obtained from averaging the 46,656 bootstrap samples.

It is also true that the bootstrap sample mean is an unbiased estimate of the population mean for the following reason: For any random sample, the bootstrap sample estimate is an unbiased estimate of the mean of the random sample, and the mean of the random sample is an unbiased estimate of the population mean.
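The two-step argument can be written out as follows. The notation is an assumption introduced here for illustration: x_1, ..., x_N is the original random sample with mean x̄, X_1*, ..., X_n* are the bootstrap draws (each equal to any x_j with probability 1/N), E* denotes expectation over the bootstrap draws with the original sample held fixed, and μ is the population mean.

$$
E^{*}\!\left[\bar{X}^{*}\right]
= \frac{1}{n}\sum_{i=1}^{n} E^{*}\!\left[X_i^{*}\right]
= \frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{N}\sum_{j=1}^{N} x_j\right)
= \bar{x},
\qquad
E\!\left[\bar{X}^{*}\right]
= E\!\left[E^{*}\!\left[\bar{X}^{*}\right]\right]
= E\!\left[\bar{x}\right]
= \mu .
$$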

We will determine all ten bootstrap samples, calculate their sample means, and see how close the average of the ten bootstrap sample means is to the population mean age. Note that although the bootstrap provides an unbiased estimate of the population mean, we can demonstrate this result only by averaging all 46,656 bootstrap samples. Obviously, this calculation is difficult, so we will approximate only the mean of the original sample by averaging the ten bootstrap samples. We expect the result to be close to the mean of the original sample.
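Although averaging all 46,656 bootstrap samples is tedious by hand, it is a small job for a computer. The sketch below is an illustration rather than part of the text: it enumerates every ordered sample of size six with `itertools.product` and uses exact fractions so that no rounding enters the comparison.

```python
from fractions import Fraction
from itertools import product

ages = [26, 17, 45, 70, 32, 9]  # patients A..F, as given in this section
n = len(ages)

count = 0
total_of_means = Fraction(0)
# Every ordered sample of size n drawn with replacement: n**n in total
for sample in product(ages, repeat=n):
    count += 1
    total_of_means += Fraction(sum(sample), n)

bootstrap_mean = total_of_means / count
original_mean = Fraction(sum(ages), n)

print(count)                            # 46656
print(bootstrap_mean == original_mean)  # True
print(round(float(bootstrap_mean), 4))  # 33.1667
```

The average over all possible bootstrap samples equals the mean of the original sample exactly, which is the unbiasedness property described above.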

The 10 bootstrap samples are as follows:

1. 69386, 71708, 88608, 67251, 22512, and 00169 corresponding to patients E, E, F, E, B, and A and ages 32, 32, 9, 32, 17, and 26 with mean = 24.6667.

2. 68381, 61725, 49122, 75836, 15368, and 52551 corresponding to patients E, D, C, E, A, and D and ages 32, 70, 45, 32, 26, and 70 with mean = 45.8333.

3. 69158, 38683, 41374, 17028, 09304, and 10834 corresponding to patients E, C, C, B, A, and A and ages 32, 45, 45, 17, 26, and 26 with mean = 31.8333.

4. 00858, 04352, 17833, 41105, 46569, and 90109 corresponding to patients A, A, B, C, C, and F and ages 26, 26, 17, 45, 45, and 9 with mean = 28.0.

5. 86972, 51707, 58242, 16035, 94887, and 83510 corresponding to patients F, D, D, A, F, and F and ages 9, 70, 70, 26, 9, and 9 with mean = 32.1667.

6. 30606, 45225, 30161, 07973, 03034, and 82983 corresponding to patients B, C, B, A, A, and E and ages 17, 45, 17, 26, 26, and 32 with mean = 27.1667.

7. 93864, 49044, 57169, 43125, 11703, and 87009 corresponding to patients F, C, D, C, A, and F and ages 9, 45, 70, 45, 26, and 9 with mean = 34.0.

8. 61937, 90217, 56708, 35351, 60820, and 90729 corresponding to patients D, F, D, C, D, and F and ages 70, 9, 70, 45, 70, and 9 with mean = 45.5.

9. 94551, 69538, 52924, 08530, 79302, and 34981 corresponding to patients F, E, D, A, E, and C and ages 9, 32, 70, 26, 32, and 45 with mean = 35.6667.

10. 79385, 49498, 48569, 57888, 70564, and 17660 corresponding to patients E, C, C, D, E, and B and ages 32, 45, 45, 70, 32, and 17 with mean = 40.1667.

The bootstrap mean is (24.6667 + 45.8333 + 31.8333 + 28.0 + 32.1667 + 27.1667 + 34.0 + 45.5 + 35.6667 + 40.1667)/10 = 34.5. This is to be compared to the original sample mean of 33.1667: recall from Section 2.4 that the population consisting of patients A, B, C, D, E, and F represents our original sample for the bootstrap. We would expect greater accuracy if we had generated 50 to 100 bootstrap samples rather than just 10. Had we generated all 46,656 possible bootstrap samples, we would have calculated the sample mean exactly.
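The remark that more bootstrap samples give greater accuracy can be explored with a small simulation. This is a sketch under stated assumptions: it uses Python's pseudorandom generator (`random.Random.choices`, which samples with replacement) in place of the random number table, and the function name `bootstrap_mean_estimate` and the seed are hypothetical choices. Results depend on the seed, so no particular output is claimed.

```python
import random

ages = [26, 17, 45, 70, 32, 9]  # original sample: patients A..F

def bootstrap_mean_estimate(data, num_samples, seed=0):
    """Average the means of num_samples bootstrap samples of size len(data)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    n = len(data)
    means = []
    for _ in range(num_samples):
        sample = rng.choices(data, k=n)  # n draws with replacement
        means.append(sum(sample) / n)
    return sum(means) / num_samples

original_mean = sum(ages) / len(ages)
for b in (10, 100, 1000):
    estimate = bootstrap_mean_estimate(ages, b)
    print(b, round(estimate, 4), round(estimate - original_mean, 4))
```

On average, the gap between the estimate and the original sample mean of 33.1667 shrinks as the number of bootstrap samples grows, although any single run with few samples can land closer or farther by chance.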