SAMPLE SIZE DETERMINATION FOR CONFIDENCE INTERVALS
When conducting an experiment or a clinical trial, cost is an important practical consideration. Often, the number of tests in an engineering experiment or the number of patients enrolled in a clinical trial has a major impact on the cost of the experiment or trial. We have seen that the variance of the sample mean decreases by a factor of $1/n$ when the sample size increases from 1 to $n$. This statement implies that, to obtain precise confidence intervals for the population mean, the larger the sample the better.
But, because of cost constraints, we may need to trade off the precision of our estimate against the cost of the test. Also, with clinical trials, the number of patients who are enrolled can have a major impact on the time it will take to complete the trial. Two of the main factors that are impacted by sample size are precision and cost; thus, sample size also affects the feasibility of a clinical trial.
The real question we must ask is: “How precise an estimate do I need in order to have useful results?” We will show you how to address this question in order to determine a minimum acceptable value for $n$. Once this minimum $n$ is determined, we can see what this $n$ implies about the feasibility of the experiment or trial. In many epidemiological and other health-related studies, sample size estimation is also of crucial importance. For example, epidemiologists need to know the minimum sample size required in order to detect differences in occurrences of diseases, health conditions, and other characteristics by subpopulations (e.g., smokers versus non-smokers), or in the effects of different exposures or interventions.
In Chapter 9, we will revisit this issue from the
perspective of hypothesis testing. The issues in hypothesis testing are the
same and the methods of evaluation are very similar to those for sample size
estimation based on confidence interval width that we will now describe.
Let us first consider the simplest case of estimating a population mean when the variance $\sigma^2$ is known. In Section 8.4, we saw that a 95% confidence interval is given by $[\bar{X} - 1.96\sigma/\sqrt{n},\ \bar{X} + 1.96\sigma/\sqrt{n}]$. If we subtract the lower endpoint of the interval from the upper endpoint, we see that the width of the interval is $(\bar{X} + 1.96\sigma/\sqrt{n}) - (\bar{X} - 1.96\sigma/\sqrt{n}) = 2(1.96\sigma/\sqrt{n})$, or $3.92\sigma/\sqrt{n}$.
The way we determine sample size is to put a constraint on the width $3.92\sigma/\sqrt{n}$ or the half-width $1.96\sigma/\sqrt{n}$. The half-width represents the greatest distance a point in the interval can be away from the point estimate, so it is a meaningful quantity to constrain. When the main objective is an accurate confidence interval for the parameter, the half-width of the interval is a very natural choice. Other objectives, such as the power of a statistical test, can also be used. We specify a maximum value $d$ for this half-width.
The quantity $d$ depends very much on what would be a meaningful interval in the particular trial or experiment. Requiring the half-width to be no larger than $d$ leads to the inequality $1.96\sigma/\sqrt{n} \le d$. Using algebra, we see that $\sqrt{n} \ge 1.96\sigma/d$, or $n \ge 3.8416\sigma^2/d^2$. To meet this requirement with the smallest possible integer $n$, we calculate the quantity $3.8416\sigma^2/d^2$ and let $n$ be the next integer larger than this quantity. Display 8.7 summarizes the sample size formula using the half-width $d$ of a confidence interval.
Display 8.7. Sample Size Formula Using the Half-Width $d$ of a Confidence Interval
Take $n$ as the next integer larger than $C^2\sigma^2/d^2$, where $C$ is the critical value for the chosen confidence level; e.g., for the 95% confidence interval for the mean, take $n$ as the next integer larger than $(1.96)^2\sigma^2/d^2$.
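The constant $C$ in Display 8.7 can be obtained for other confidence levels from the standard normal distribution. The following is a minimal sketch, assuming a two-sided interval and a known $\sigma$; the function name `critical_value` is hypothetical and is not taken from the text.

```python
from statistics import NormalDist


def critical_value(confidence: float) -> float:
    """Two-sided standard normal critical value C for a confidence level.

    Assumption: the interval is two-sided and based on the normal
    distribution with known sigma, as in Display 8.7.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # A two-sided interval leaves (1 - confidence) / 2 in each tail,
    # so C is the standard normal quantile at (1 + confidence) / 2.
    return NormalDist().inv_cdf((1.0 + confidence) / 2.0)


if __name__ == "__main__":
    for level in (0.90, 0.95, 0.99):
        print(f"{level:.0%} confidence: C = {critical_value(level):.3f}")
```

For a 95% confidence level this returns a value that rounds to the familiar 1.96.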
Let us consider the case where we are sampling from a normal distribution with a known standard deviation of 5, and let us assume that we want the half-width of the 95% confidence interval to be no greater than 0.5. Then $d = 0.5$ and $\sigma = 5$ in this case. Now the quantity $3.8416\sigma^2/d^2$ is $3.8416(5/0.5)^2 = 3.8416(10)^2 = 3.8416(100) = 384.16$. So the smallest integer $n$ that satisfies the required inequality is 385.
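The calculation above can be sketched in a few lines of Python. This is only an illustration under the stated assumptions (normal data, known $\sigma$); the function name `sample_size_mean` and its parameter names are hypothetical.

```python
import math


def sample_size_mean(sigma: float, d: float, c: float = 1.96) -> int:
    """Smallest integer n with c * sigma / sqrt(n) <= d.

    sigma: known population standard deviation
    d:     maximum acceptable half-width of the confidence interval
    c:     critical value (1.96 for a 95% interval, as in the text)
    """
    if sigma <= 0 or d <= 0 or c <= 0:
        raise ValueError("sigma, d, and c must all be positive")
    # n >= (c * sigma / d)^2; math.ceil rounds up to the next integer
    # (and leaves an exact integer unchanged, which still satisfies
    # the inequality).
    return math.ceil((c * sigma / d) ** 2)


if __name__ == "__main__":
    # Worked example from the text: sigma = 5, d = 0.5
    print(sample_size_mean(sigma=5, d=0.5))  # 385
```

Because $n$ grows with $(\sigma/d)^2$, halving the required half-width roughly quadruples the sample size.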
In order to solve the foregoing problem we needed to know $\sigma$, which in most practical situations will be unknown. Our alternatives are to find or guess at an upper bound for $\sigma$, to estimate $\sigma$ from a small pilot study, or to refer to the literature for studies that may publish estimates of $\sigma$.
Estimating the sample size for the difference between two means is a problem similar to estimating the sample size for a single mean but requires knowing two variances and specifying a relationship between the two sample sizes $n_t$ and $n_c$.
Recall from Section 8.6 that the 95% confidence interval for the difference between two means of samples selected from two independent normal distributions with known and equal variances is given by $[(\bar{X}_t - \bar{X}_c) - 1.96\sigma\sqrt{1/n_t + 1/n_c},\ (\bar{X}_t - \bar{X}_c) + 1.96\sigma\sqrt{1/n_t + 1/n_c}]$. The half-width of this interval is $1.96\sigma\sqrt{1/n_t + 1/n_c}$. Assume $n_t = kn_c$ for some proportionality constant $k \ge 1$. The proportionality constant $k$ adjusts for the differences in sample sizes used in the treatment and control groups, as explained below. Let $d$ be the constraint on the half-width. The inequality becomes $1.96\sigma\sqrt{1/(kn_c) + 1/n_c} = 1.96\sigma\sqrt{(k + 1)/(kn_c)} \le d$, or $kn_c/(k + 1) \ge 3.8416\sigma^2/d^2$, or $n_c \ge 3.8416(k + 1)\sigma^2/(kd^2)$. If $n_c = 3.8416(k + 1)\sigma^2/(kd^2)$, then $n_t = kn_c = 3.8416(k + 1)\sigma^2/d^2$. In Display 8.8 we present the sample size formula using the half-width $d$ of a confidence interval for the difference between two population means.
Note that if $k = 1$, then $n_c = n_t = 3.8416(2\sigma^2/d^2)$. Taking $k$ greater than 1 increases $n_t$ while it lowers $n_c$, but the total sample size becomes $n_t + n_c = (k + 1)^2\,3.8416\sigma^2/(kd^2)$.
Display 8.8. Sample Size Formula Using the Half-Width $d$ of a Confidence Interval (Difference Between Two Population Means When the Sample Sizes Are $n$ and $kn$, where $k \ge 1$)
Take $n$ as the next integer larger than $C^2(k + 1)\sigma^2/(kd^2)$; e.g., for the 95% confidence interval for the difference between two means, take $n$ as the next integer larger than $(1.96)^2(k + 1)\sigma^2/(kd^2)$.
For $k > 1$, the total sample size $(k + 1)^2\,3.8416\sigma^2/(kd^2)$ is larger than $4(3.8416\sigma^2/d^2)$, the result for $k = 1$ [since $(1 + 1)^2 = 4$], because $(k + 1)^2/k - 4 = (k - 1)^2/k > 0$. This calculation shows that $k = 1$ minimizes the total sample size. However, in clinical trials there may be ethical reasons for wanting $n_t$ to be larger than $n_c$.
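To see how quickly unequal allocation inflates the total, one can compare the total sample size at ratio $k$ with the total at $k = 1$. The sketch below works with the unrounded formulas, so the common factor $3.8416\sigma^2/d^2$ cancels; the function name `relative_total` is hypothetical.

```python
def relative_total(k: float) -> float:
    """Total sample size at allocation ratio k divided by the total at k = 1.

    Uses the unrounded totals (k + 1)^2 * C^2 * sigma^2 / (k * d^2) and
    4 * C^2 * sigma^2 / d^2, so only the ratio (k + 1)^2 / (4 * k) remains.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return (k + 1) ** 2 / (4 * k)


if __name__ == "__main__":
    for k in (1, 2, 3, 5):
        print(f"k = {k}: total is {relative_total(k):.3f} times the k = 1 total")
```

Under these assumptions, a 2:1 allocation costs 12.5% more subjects than equal allocation, and a 3:1 allocation costs about one-third more.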
For example, in 1995 Chernick designed a clinical trial (the Tendril DX study) to show that steroid-eluting pacing leads were effective in reducing capture thresholds for patients with pacemakers. (For more details, see Chernick, 1999, pp. 63–67.) Steroid-eluting leads have steroid in the tip of the lead that slowly oozes out into the tissue. This medication is intended to reduce inflammation. The capture threshold is the minimum required voltage for the electrical shock from the lead into the heart that causes the heart to contract (a forced pacing beat). Lower capture thresholds conserve the pacemaker battery and thus allow a longer period before replacement of the pacemaker. The pacing leads are connected to a pacemaker that is implanted in the patient’s chest and run through part of the circulatory system into the heart, where they provide an electrical stimulus to induce pacing heart beats (beats that restore normal heart rhythm).
The investigator chose a value of $k = 3$ for the study because competitors had demonstrated reductions in capture thresholds for their steroid leads that were approved by the FDA based on similar clinical trials. Values of $k$ such as 2 and 3 were considered because the company and the investigating physicians wanted a much greater percentage of the patients to receive the steroid leads but did not want $k$ to be so large that the total number of patients enrolled would make the trial very expensive. The physicians who were willing to participate in the trial wanted to give the steroid leads to most of their patients, because they perceived them to be a better treatment than leads without the steroid.
Chernick actually planned the Tendril DX trial (assuming thresholds were normally distributed) so that he could test the null hypothesis of no difference in capture threshold against the alternative hypothesis that the difference was at least 0.5 volts, with statistical power of 80% under that alternative. In Chapter 9, when we consider sample size for hypothesis testing, we will look again at these assumptions (e.g., statistical power) and requirements.
For now, to illustrate sample size calculations based on confidence intervals, let us assume that we want the half-width of a 95% confidence interval for the mean difference to be no greater than $d = 0.2$ volts. Assume that both leads have the same standard deviation of 0.8 volts. Then, with $k = 3$, $n_t = 3.8416[(k + 1)\sigma^2/d^2] = 3.8416[4(0.64/0.04)] = 245.86$, or 246 (rounding up to the next integer), and $n_c = n_t/3 = 82$. This gives a total sample size of 328.
Without changing assumptions, suppose we were able to let $k = 1$. Then $n_t = n_c = 3.8416[2\sigma^2/d^2] = 3.8416[2(0.64/0.04)] = 122.93$, or 123. This modification gives a much smaller total sample size of 246. Note that by going to a 3:1 randomization scheme (i.e., $k = 3$), $n_t$ increased by a factor of 2, or a total of 123, while $n_c$ decreased by only 41. We call it a 3:1 randomization scheme because the probability is 0.75 that a patient will receive the steroid lead and 0.25 that a patient will receive the nonsteroid lead.
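The two scenarios above (3:1 and 1:1 allocation) can be reproduced with a short sketch of the Display 8.8 formula. It assumes known, equal standard deviations in both groups; the function name `sample_sizes_two_means` and its parameters are hypothetical, not part of the original study design.

```python
import math


def sample_sizes_two_means(sigma: float, d: float,
                           k: float = 1.0, c: float = 1.96) -> tuple[int, int]:
    """Return (n_t, n_c) so the half-width of the interval is at most d.

    sigma: common known standard deviation of both groups
    d:     maximum acceptable half-width
    k:     allocation ratio n_t / n_c (k = 3 means 3:1 randomization)
    c:     critical value (1.96 for a 95% interval, as in the text)
    """
    if sigma <= 0 or d <= 0 or k <= 0 or c <= 0:
        raise ValueError("sigma, d, k, and c must all be positive")
    # Control group: n_c >= c^2 * (k + 1) * sigma^2 / (k * d^2), rounded up.
    n_c = math.ceil(c ** 2 * (k + 1) * sigma ** 2 / (k * d ** 2))
    # Treatment group: k times the control group, rounded up if k is
    # not an integer.
    n_t = math.ceil(k * n_c)
    return n_t, n_c


if __name__ == "__main__":
    # Values from the text: sigma = 0.8 volts, d = 0.2 volts
    for k in (3, 1):
        n_t, n_c = sample_sizes_two_means(sigma=0.8, d=0.2, k=k)
        print(f"k = {k}: n_t = {n_t}, n_c = {n_c}, total = {n_t + n_c}")
```

With the values from the text, this gives 246 and 82 (total 328) for $k = 3$, and 123 and 123 (total 246) for $k = 1$.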
Formulae also can be given for more complex situations. However, in some cases iterative procedures by computer are needed. Currently, there are a number of software packages available to handle differing confidence sets and hypothesis testing problems under a variety of assumptions. We will describe some of these software packages in Section 16.3. See the related references in Section 8.12 and Section 16.5.