HOW TO SELECT A BOOTSTRAP SAMPLE
The bootstrap method and its use in statistical
inference will be covered more extensively in Chapter 8 when we discuss its
application in estimation and contrast it to parametric methods. In most
applications, a sampling procedure is used to approximate the bootstrap
method. That sampling procedure generates what are called bootstrap samples,
which are obtained by sampling with replacement. Because sampling with
replacement is a general sampling technique that is similar to random sampling,
we introduce it here.
In general, we can choose a random sample of size n with replacement from a population of
size N. In our applications of the
bootstrap, the population for bootstrap sampling will not be the actual
population of interest but rather a given, presumably random, sample from the
population.
In the first stage of selecting a bootstrap sample,
we take the interval [0, 1] and divide it into N equal parts. Then, for a
uniform random number U, we assign index 1 if 0 ≤ U < 1/N, index 2 if
1/N ≤ U < 2/N, and so on until we assign index N if (N – 1)/N ≤ U < 1. We
generate n such indices by generating n consecutive uniform random numbers.
The procedure is identical to our rejection sampling scheme except that none
of the samples is rejected because repeated indices are allowed.
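The index assignment described above can be sketched in a few lines of Python. This is only an illustration: the function names uniform_to_index and sample_with_replacement are hypothetical, and Python's built-in random generator stands in for a printed table of random numbers.

```python
import math
import random


def uniform_to_index(u, N):
    """Map u in [0, 1) to an index in 1..N by cutting the interval into N equal parts."""
    if not 0.0 <= u < 1.0:
        raise ValueError("u must satisfy 0 <= u < 1")
    # floor(u * N) counts how many full subintervals lie below u.
    # min() guards against floating-point rounding when u is extremely close to 1.
    return min(math.floor(u * N) + 1, N)


def sample_with_replacement(n, N, rng=random):
    """Draw n indices from 1..N; repeats are allowed, so no draw is rejected."""
    return [uniform_to_index(rng.random(), N) for _ in range(n)]


# Example: one bootstrap-style draw of size n = N = 6.
print(sample_with_replacement(6, 6))
```

Setting n = N gives the ordinary bootstrap case; any other n gives general sampling with replacement.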
Bootstrap sampling is a special case of sampling
with replacement. In ordinary bootstrap sampling, n = N. Remember, for
bootstrap sampling the population size N
is actually the size of the original random sample; the true population is
replaced by that sample.
Let us consider the population of six patients
described previously in Section 2.4. Again, age is the variable of interest. We
will generate 10 bootstrap samples of size six for the ages of the patients.
For the first sample we will use row 3 from Table 2.1. The second sample will
be generated using row 4, and so on for samples 3 through 10.
The first six uniform random numbers in row 3 are
69386, 71708, 88608, 67251, 22512, and 00169. Read as decimals (0.69386,
0.71708, and so on) with N = 6, the corresponding indices are 5, 5,
6, 5, 2, and 1. The corresponding patients are E, E, F, E, B, and A, and the
sampled ages are 32, 32, 9, 32, 17, and 26. The average age for this bootstrap
sample is 24.6667.
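As a check on the hand calculation, the following sketch converts the six five-digit numbers into indices, patients, and ages. The patient ages are the ones quoted in this section; the helper name index_from_digits is hypothetical, and integer arithmetic is used here (an implementation choice, not something the text prescribes) so that no floating-point rounding can affect the interval boundaries.

```python
# Ages of the six patients, as quoted in this section.
ages = {"A": 26, "B": 17, "C": 45, "D": 70, "E": 32, "F": 9}
patients = "ABCDEF"
N = len(patients)

# First six five-digit random numbers from row 3 of the table.
digits = ["69386", "71708", "88608", "67251", "22512", "00169"]


def index_from_digits(d, N):
    """Treat the digit string d as the decimal 0.d and return its index in 1..N."""
    # floor(0.d * N) computed exactly with integers.
    return int(d) * N // 10 ** len(d) + 1


indices = [index_from_digits(d, N) for d in digits]
chosen = [patients[i - 1] for i in indices]
sampled_ages = [ages[p] for p in chosen]
mean_age = sum(sampled_ages) / len(sampled_ages)

print(indices)             # [5, 5, 6, 5, 2, 1]
print(chosen)              # ['E', 'E', 'F', 'E', 'B', 'A']
print(sampled_ages)        # [32, 32, 9, 32, 17, 26]
print(round(mean_age, 4))  # 24.6667
```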
There are 6^6 = 46,656 possible bootstrap
samples of size six. In practice, we sample only a small number, such as 50 to
100, when the total number of possible samples is so large. A random selection
of 100 samples provides a good estimate of the bootstrap mean obtained from
averaging the 46,656 bootstrap samples.
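Although enumerating all 46,656 samples is tedious by hand, it is easy on a computer. The sketch below, which assumes the six patient ages quoted in this section, averages the means of every possible bootstrap sample exactly and then compares that value with the average from 100 randomly chosen samples. The seed is arbitrary and is fixed only to make the run repeatable.

```python
import itertools
import random
from fractions import Fraction

ages = [26, 17, 45, 70, 32, 9]  # patients A through F
N = len(ages)

# Exact calculation: every ordered sample of size N drawn with replacement.
all_means = [Fraction(sum(s), N) for s in itertools.product(ages, repeat=N)]
exact = sum(all_means) / len(all_means)

# Approximation: 100 randomly selected bootstrap samples.
rng = random.Random(0)  # arbitrary seed, assumed here for repeatability
approx_means = [sum(rng.choices(ages, k=N)) / N for _ in range(100)]
approx = sum(approx_means) / len(approx_means)

print(len(all_means))   # number of possible samples
print(float(exact))     # exact average of all bootstrap sample means
print(approx)           # Monte Carlo approximation from 100 samples
```

The exact average coincides with the mean of the six ages, and the 100-sample approximation will typically land within a point or two of it.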
It is also true that the bootstrap sample mean is
an unbiased estimate of the population mean for the following reason: For any
random sample, the bootstrap sample estimate is an unbiased estimate of the
mean of the random sample, and the mean of the random sample is an unbiased
estimate of the population mean.
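The two-step argument can be written out as follows. The notation is assumed for illustration and does not come from the text: x_1, ..., x_n is the original sample with mean \bar{x}, X_i^* is the i-th bootstrap draw, E^* denotes expectation over bootstrap resampling with the original sample held fixed, and \mu is the population mean.

```latex
\begin{align*}
E^{*}\!\left[X_i^{*}\right]
  &= \sum_{j=1}^{n} \frac{1}{n}\, x_j = \bar{x}, \\
E^{*}\!\left[\bar{X}^{*}\right]
  &= \frac{1}{n}\sum_{i=1}^{n} E^{*}\!\left[X_i^{*}\right]
   = \frac{1}{n}\cdot n\,\bar{x}
   = \bar{x}, \\
E\!\left[\bar{x}\right] &= \mu
  \quad\Longrightarrow\quad
  E\!\left[\,E^{*}\!\left[\bar{X}^{*}\right]\right] = \mu .
\end{align*}
```

The first two lines hold for any fixed sample; the last line treats \bar{x} as a random variable over repeated random sampling from the population.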
We will determine all ten bootstrap samples,
calculate their sample means, and see how close the average of the ten
bootstrap sample means is to the population mean age. Note that although the
bootstrap provides an unbiased estimate of the population mean, we can
demonstrate this result only by averaging all 46,656 bootstrap samples.
Because that calculation is laborious by hand, we will instead approximate the
mean of the original sample by averaging the ten bootstrap sample means. We
expect the result to be close to the mean of the original sample.
The 10 bootstrap samples are as
follows:
1. 69386, 71708, 88608, 67251,
22512, and 00169 corresponding to patients E, E, F, E, B, and A and ages 32,
32, 9, 32, 17, and 26 with mean = 24.6667.
2. 68381, 61725, 49122, 75836,
15368, and 52551 corresponding to patients E, D, C, E, A, and D, corresponding
to ages 32, 70, 45, 32, 26, and 70 with mean = 45.8333.
3. 69158, 38683, 41374, 17028,
09304, and 10834 corresponding to patients E, C, C, B, A, and A, corresponding
to ages 32, 45, 45, 17, 26, and 26 with mean = 31.8333.
4. 00858, 04352, 17833, 41105,
46569, and 90109 corresponding to patients A, A, B, C, C, and F, corresponding
to ages 26, 26, 17, 45, 45, and 9 with mean = 28.0.
5. 86972, 51707, 58242, 16035,
94887, and 83510 corresponding to patients F, D, D, A, F, and F, corresponding
to ages 9, 70, 70, 26, 9, and 9 with mean = 32.1667.
6. 30606, 45225, 30161, 07973,
03034, and 82983 corresponding to patients B, C, B, A, A, and E, corresponding
to ages 17, 45, 17, 26, 26, and 32 with mean = 27.1667.
7. 93864, 49044, 57169, 43125,
11703, and 87009 corresponding to patients F, C, D, C, A, and F, corresponding
to ages 9, 45, 70, 45, 26, and 9 with mean = 34.0.
8. 61937, 90217, 56708, 35351,
60820, and 90729 corresponding to patients D, F, D, C, D, and F, corresponding
to ages 70, 9, 70, 45, 70, and 9 with mean = 45.5.
9. 94551, 69538, 52924, 08530,
79302, and 34981 corresponding to patients F, E, D, A, E, and C, corresponding
to ages 9, 32, 70, 26, 32, and 45 with mean = 35.6667.
10. 79385, 49498, 48569, 57888,
70564, and 17660 corresponding to patients E, C, C, D, E, and B, corresponding
to ages 32, 45, 45, 70, 32, and 17 with mean = 40.1667.
The bootstrap mean is (24.6667 + 45.8333 + 31.8333
+ 28.0 + 32.1667 + 27.1667 + 34.0 + 45.5 + 35.6667 + 40.1667)/10 = 34.5. This
is to be compared to the original sample mean of 33.1667. Recall from Section
2.4 that the population consisting of patients A, B, C, D, E, and F represents
our original sample for the bootstrap. We determined that the mean age for that
sample was 33.1667. We would have obtained greater accuracy if we had
generated 50 to 100 bootstrap samples rather than just 10. Had we generated all
46,656 possible distinct bootstrap samples, we would have calculated the sample
mean exactly.
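Hand arithmetic over sixty random numbers is error-prone, so it can help to recompute the ten samples mechanically. The sketch below applies the equal-interval rule to each listed five-digit number (read as a decimal) and prints the patients, ages, and mean for every sample, followed by the average of the ten means. The helper name index_from_digits and the use of integer arithmetic are choices made for this illustration.

```python
# Ages of the six patients, as quoted in this section.
ages = {"A": 26, "B": 17, "C": 45, "D": 70, "E": 32, "F": 9}
patients = "ABCDEF"
N = len(patients)

# The six five-digit random numbers listed for each of the ten samples.
rows = [
    ["69386", "71708", "88608", "67251", "22512", "00169"],
    ["68381", "61725", "49122", "75836", "15368", "52551"],
    ["69158", "38683", "41374", "17028", "09304", "10834"],
    ["00858", "04352", "17833", "41105", "46569", "90109"],
    ["86972", "51707", "58242", "16035", "94887", "83510"],
    ["30606", "45225", "30161", "07973", "03034", "82983"],
    ["93864", "49044", "57169", "43125", "11703", "87009"],
    ["61937", "90217", "56708", "35351", "60820", "90729"],
    ["94551", "69538", "52924", "08530", "79302", "34981"],
    ["79385", "49498", "48569", "57888", "70564", "17660"],
]


def index_from_digits(d, N):
    """Treat the digit string d as the decimal 0.d and return its index in 1..N."""
    return int(d) * N // 10 ** len(d) + 1


sample_patients = []
sample_means = []
for k, row in enumerate(rows, start=1):
    chosen = [patients[index_from_digits(d, N) - 1] for d in row]
    sampled_ages = [ages[p] for p in chosen]
    mean_age = sum(sampled_ages) / len(sampled_ages)
    sample_patients.append(chosen)
    sample_means.append(mean_age)
    print(k, "".join(chosen), sampled_ages, round(mean_age, 4))

bootstrap_mean = sum(sample_means) / len(sample_means)
original_mean = sum(ages.values()) / N
print("average of the ten means:", round(bootstrap_mean, 4))
print("original sample mean:", round(original_mean, 4))
```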