Survival Probabilities: The Kaplan–Meier Curve

Chapter: Biostatistics for the Health Sciences: Analysis of Survival Times


Survival Probabilities

The Kaplan–Meier Curve

The Kaplan–Meier curve is a nonparametric estimate of the survival curve (see Kaplan and Meier, 1958). It is computed by using the same conditioning principle that we employed for the life table estimate in Section 15.2.2. Because the Kaplan–Meier curve is an estimator based on the products of conditional probabilities, it is also sometimes called the product-limit estimator.

The Kaplan–Meier curve starts out with S(t) = 1 for all t less than the first event time (such as a death at t₁). Then S(t₁) becomes S(0)(n₁ – d₁)/n₁, where n₁ is the number at risk at time t₁ and d₁ is the number who die at time t₁. Referring to Table 15.2 (column S_j, first row), S(t₁) = S(0)[(n₁ – d₁)/n₁] = 1[(10 – 1)/10] = 0.9. We substitute N_j’ for n_j in the formula. At the next time of death t₂, S(t₂) = S(t₁)(n₂ – d₂)/n₂, where n₂ and d₂ are, respectively, the corresponding number of patients at risk and deaths at time t₂. In Table 15.2 (second row), S(t₂) = S(t₁)[(n₂ – d₂)/n₂] = (0.9)[(8.5 – 2)/8.5] = 0.688. The estimate S(t) stays constant at all times between events (i.e., deaths) but jumps down by the factor (n_j – d_j)/n_j at the time t_j of the jth death. You can verify this fact for the S_j column in Table 15.2. We allow for the possibility of more than one death at the same instant of time. The number at risk drops at withdrawal times as well as at the times of death. Thus, we use N_j’ instead of N_j to estimate n_j in the formula for S(t).
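The product of conditional survival factors described above can be sketched in a few lines of Python. This is only an illustration: the function name `product_limit` and the list `life_table_rows` are our own choices, and the two (number at risk, deaths) pairs are the ones quoted from Table 15.2 in the paragraph above.

```python
def product_limit(steps):
    """Return the running survival estimates S(t_1), S(t_2), ...

    `steps` is a sequence of (n_j, d_j) pairs: the number at risk and the
    number of deaths at each successive event time.
    """
    survival = 1.0  # S(t) = 1 before the first event
    estimates = []
    for at_risk, deaths in steps:
        # Multiply by the conditional probability of surviving this event time.
        survival *= (at_risk - deaths) / at_risk
        estimates.append(survival)
    return estimates


# First two rows quoted in the text: (N_j', deaths)
life_table_rows = [(10, 1), (8.5, 2)]
print([round(s, 3) for s in product_limit(life_table_rows)])
```

Rounded to three decimals, the two values agree with the 0.9 and 0.688 worked out in the text.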

The Kaplan–Meier estimates can be portrayed in a table similar to the life table (Table 15.2), except that the intervals will be the times between events. Table 15.3 shows the Kaplan–Meier estimate for the patient data used in the previous section to construct a life table. Note that the column labels are essentially the same as those in Table 15.2, with the following two exceptions: (1) the column labeled “Average Number at Risk, N_j’,” has been eliminated; and (2) the “Estimated Cumulative Survival” becomes S(t_j), a term that we defined in the foregoing paragraph.

In the row for t₁ under the column “Estimated Cumulative Survival” we obtain 0.9 by multiplying S₀ = 1 by p₁ = 0.9, where p₁ = 1 – q₁ and q₁ = D₁/N₁ = 1/10 = 0.1. In the row for t₂, q₂ = D₂/N₂ = 1/8 = 0.125. So p₂ = 1 – q₂ = 0.875 and, finally, S₂ = p₂S₁ = (0.875)(0.90) = 0.788. The remaining rows involve the same calculations and the recurrence relation S_k = p_k S_{k–1}.

TABLE 15.3. Kaplan–Meier Survival Estimates for Patients in Table 15.2

Approximate confidence intervals for the Kaplan–Meier curve at specific time points can be obtained by using the Greenwood formula for the standard error of the estimate and a normal approximation for the distribution of the Kaplan–Meier estimate. A simpler estimate is obtained based on the results in the paper by Peto et al. (1977).

In Greenwood’s formula, Var(S_j) is estimated as $V_j = S_j^{2} \sum_{i=1}^{j} q_i/(N_i p_i)$. Computationally, this is more easily calculated recursively as $V_j = S_j^{2}\,[\,q_j/(N_j p_j) + V_{j-1}/S_{j-1}^{2}\,]$, where we define S₀ = 1 and V₀ = 0.
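The recursion follows from the sum by splitting off its last term. A short derivation, using only the symbols already defined:

```latex
\begin{align*}
\frac{V_j}{S_j^{2}}
  &= \sum_{i=1}^{j} \frac{q_i}{N_i p_i}
   = \frac{q_j}{N_j p_j} + \sum_{i=1}^{j-1} \frac{q_i}{N_i p_i}
   = \frac{q_j}{N_j p_j} + \frac{V_{j-1}}{S_{j-1}^{2}}, \\
V_j &= S_j^{2}\left[\frac{q_j}{N_j p_j} + \frac{V_{j-1}}{S_{j-1}^{2}}\right],
\qquad S_0 = 1,\; V_0 = 0 .
\end{align*}
```

With V₀ = 0 the second term vanishes at j = 1, so V₁ = S₁² q₁/(N₁p₁), as the sum requires.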

Although the Greenwood formula is computationally easy using the recursion equation, the Peto approximation is much simpler. Peto’s estimate of variance is given by the formula W_j = S_j²(1 – S_j)/N_j. The formula is simple because it depends only on the survival probability estimate at time j and the number remaining at risk at time j, whereas Greenwood’s formula depends on survival probability estimates, number at risk, and probability estimates of survival and death in preceding time intervals.

Peto’s estimate has a heuristic interpretation. If we ignore the censoring and think of failure by time j as a binomial outcome, to expect N_j patients to remain at time j we should have started with approximately N_j/S_j patients. Think of this number (N_j/S_j) as an integer corresponding to the number of patients in a binomial experiment. Now the variance of a binomial proportion is p(1 – p)/n, where n is the sample size and p is the success probability. In our heuristic argument, S_j = p and N_j/S_j = n. So the variance is S_j(1 – S_j)/{N_j/S_j} = S_j²(1 – S_j)/N_j. We see that this variance is just Peto’s formula.

The square root of each variance estimate (Greenwood or Peto) is the corresponding estimate of the standard error of the Kaplan–Meier estimate S_j at time j. Approximate confidence intervals are then obtained through a normal approximation that uses the normal distribution constants 1.96 for a two-sided 95% confidence interval or 1.645 for a two-sided 90% confidence interval. So the Greenwood 95% two-sided confidence interval at time j would be [S_j – 1.96 √V_j, S_j + 1.96 √V_j] and for Peto it would be [S_j – 1.96 √W_j, S_j + 1.96 √W_j]. Greenwood’s and Peto’s methods are exhibited in Displays 15.1 and 15.2. Because we have used several approximations, these confidence intervals are not exact, but only approximate.

Display 15.1. Greenwood’s Method for 95% Confidence Interval of Kaplan–Meier Estimate

[S_j – 1.96 √V_j, S_j + 1.96 √V_j]

where S_j = Kaplan–Meier survival probability estimate at the jth event time, and

$$V_j = S_j^{2} \sum_{i=1}^{j} \frac{q_i}{N_i p_i}$$

where q_i is the probability of death in event interval i, p_i = 1 – q_i is the probability of surviving interval i, and N_i is the number of patients remaining at risk at the ith event time. Alternatively, V_j can be calculated by the recursion:

$$V_j = S_j^{2}\left[\frac{q_j}{N_j p_j} + \frac{V_{j-1}}{S_{j-1}^{2}}\right]$$
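As a cross-check on Greenwood's formula, the sketch below evaluates the variance both ways, by the direct sum and by the recursion, keeping both terms of the recursion at every step. The function names are our own, the inputs are the numbers at risk and deaths at the first three event times quoted in the text (N = 10, 8, 7 with one death each), and the code assumes p_j > 0 at every event time.

```python
def greenwood_sum(at_risk, deaths):
    """Greenwood variance V_j at each event time, using the direct sum."""
    survival, running_sum, variances = 1.0, 0.0, []
    for n, d in zip(at_risk, deaths):
        q = d / n
        p = 1 - q
        survival *= p                    # S_j
        running_sum += q / (n * p)       # sum of q_i / (N_i p_i) up to j
        variances.append(survival ** 2 * running_sum)
    return variances


def greenwood_recursive(at_risk, deaths):
    """Same variances from V_j = S_j^2 [q_j/(N_j p_j) + V_{j-1}/S_{j-1}^2]."""
    prev_s, prev_v, variances = 1.0, 0.0, []  # S_0 = 1, V_0 = 0
    for n, d in zip(at_risk, deaths):
        q = d / n
        p = 1 - q
        s = p * prev_s
        v = s ** 2 * (q / (n * p) + prev_v / prev_s ** 2)
        variances.append(v)
        prev_s, prev_v = s, v
    return variances


at_risk, deaths = [10, 8, 7], [1, 1, 1]
print([round(v, 4) for v in greenwood_sum(at_risk, deaths)])
print([round(v, 4) for v in greenwood_recursive(at_risk, deaths)])
```

The two functions should agree up to floating-point rounding, which is a convenient check when the recursion is carried out by hand.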

Now we can construct 95% confidence intervals for our Kaplan–Meier estimates in Table 15.3. Let us compute the Greenwood and Peto intervals at time t₃ = 5.4. For the Greenwood method, we must determine V₃ first. We will do this using the recursive formula, first finding V₁, then V₂ from V₁, and finally V₃ from V₂. So V₁ = S₁²[q₁/(N₁p₁)] = (0.9)²[0.1/(10(0.9))] = 0.9(0.01) = 0.009. Then V₂ = S₂²[q₂/(N₂p₂) + V₁/S₁²] = (0.788)²[0.125/(8(0.875)) + 0.009/(0.9)²] = 0.621[0.125/7 + 0.009/0.81] = 0.621(0.0179 + 0.0111) = 0.621(0.029) = 0.0180. Finally, V₃ = S₃²[q₃/(N₃p₃) + V₂/S₂²] = (0.675)²[0.143/{7(0.857)} + 0.018/(0.788)²] = 0.4556[0.0238 + 0.0290] = 0.4556(0.0528) = 0.0241. So the 95% confidence interval is [0.675 – 1.96 √0.0241, 0.675 + 1.96 √0.0241] = [0.675 – 0.304, 0.675 + 0.304] = [0.371, 0.979].

For the Peto interval, W₃ is simply S₃²(1 – S₃)/N₃ = (0.675)²(0.325/7) = 0.4556(0.0464) = 0.0212. So the Peto interval is [0.675 – 1.96 √0.0212, 0.675 + 1.96 √0.0212] = [0.675 – 0.285, 0.675 + 0.285] = [0.390, 0.960]. Note that in this example the Peto interval is slightly narrower than the Greenwood interval (half-width 0.285 versus 0.304).

Display 15.2. Peto’s Method for 95% Confidence Interval of Kaplan–Meier Estimate

[S_j – 1.96 √W_j, S_j + 1.96 √W_j]

where S_j = Kaplan–Meier survival probability estimate at the jth event time, and

$$W_j = \frac{S_j^{2}(1 - S_j)}{N_j}$$

where N_j is the number of patients remaining at risk at the jth event time.

Some research [see Dorey and Korn (1987)] has shown that Peto’s method can give better lower confidence bounds than Greenwood’s, especially at long follow-up times in which the number of patients remaining at risk is small. The Greenwood interval tends to be too narrow in these situations; hence, the FDA sometimes recommends using Peto’s method for the lower bound. The foregoing example, at an early event time with seven of the ten patients still at risk, is not such a situation: there the Greenwood interval is the wider of the two. For more details about the Kaplan–Meier curve and life tables, see Altman (1991) and Lawless (1982).
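The two interval calculations at t₃ can be recomputed without intermediate rounding. The sketch below is illustrative only: the function names are ours, the inputs are the numbers at risk and deaths at the first three event times quoted in the text, and no truncation of the interval to [0, 1] is attempted.

```python
import math


def normal_interval(estimate, variance, z=1.96):
    """Two-sided normal-approximation interval: estimate -/+ z * sqrt(variance)."""
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate + half_width


def peto_variance(survival, at_risk):
    """Peto's W_j = S_j^2 (1 - S_j) / N_j."""
    return survival ** 2 * (1 - survival) / at_risk


def greenwood_variance(at_risk, deaths):
    """Return (S_j, V_j) at the last event time supplied, via the direct sum."""
    survival, total = 1.0, 0.0
    for n, d in zip(at_risk, deaths):
        q = d / n
        survival *= 1 - q
        total += q / (n * (1 - q))
    return survival, survival ** 2 * total


at_risk, deaths = [10, 8, 7], [1, 1, 1]   # event times t1, t2, t3
s3, v3 = greenwood_variance(at_risk, deaths)
w3 = peto_variance(s3, at_risk[-1])

print("Greenwood:", tuple(round(x, 3) for x in normal_interval(s3, v3)))
print("Peto:     ", tuple(round(x, 3) for x in normal_interval(s3, w3)))
```

Passing `z=1.645` to `normal_interval` gives the corresponding two-sided 90% intervals.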

As we can see from the example in Table 15.3, the Kaplan–Meier curve gives results similar to the life table method and is based on the same computational principle. However, the Kaplan–Meier curve takes step decreases at the actual time of events (e.g., deaths), whereas the life table method makes the jumps at the end of the group intervals.

The Kaplan–Meier curve is preferred to the life table when all the event times are known precisely. For example, the Kaplan–Meier method handles withdrawals better than the life table, because every withdrawal that occurs before an event (such as a death) is removed when determining the number of patients at risk at that event. In contrast, the life table groups the events into time intervals; hence, it subtracts half the withdrawals in the interval in order to estimate the interval survival (or failure) probability.
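The life table adjustment can be written out as a small sketch. The function name `life_table_interval` is ours, and the example interval (9 patients entering, 1 withdrawal, 2 deaths) is hypothetical: it was chosen only because it gives an average number at risk of 8.5 with 2 deaths, the values quoted earlier for the second row of Table 15.2.

```python
def life_table_interval(n_start, withdrawals, deaths):
    """Interval survival probability under the life-table convention.

    Half of the withdrawals in the interval are subtracted from the number
    entering it, giving the average number at risk N' = N - W/2.
    Returns (N', estimated probability of surviving the interval).
    """
    effective_at_risk = n_start - withdrawals / 2
    return effective_at_risk, 1 - deaths / effective_at_risk


# Hypothetical interval: 9 enter, 1 withdraws, 2 die.
n_eff, p_interval = life_table_interval(n_start=9, withdrawals=1, deaths=2)
print(n_eff, round(p_interval, 4))
```

The Kaplan–Meier method needs no such averaging, because each withdrawal is removed from the risk set at the exact time it occurs.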

However, there are many practical situations in which the event times are not known precisely but an interval for the event can be defined. For example, recurrence of some event may be detected at follow-up visits, which could be scheduled every three months. All that is really known is that the recurrence occurred between the last two follow-up visits. So a life table with a three-month grouping may be more appropriate than a Kaplan–Meier curve in such cases.

Although survival curves are very useful, some difficulties occur when not all the events are reported. Lack of completeness in reporting events is a common problem that medical device companies confront when they report on the reliability of their products using Kaplan–Meier estimates from passive databases (i.e., databases that depend on voluntary reporting of problems). Such databases are notorious for underreporting events and overestimating performance as estimated in the survival curve. Techniques have been proposed to adjust these curves to account for biases. However, no proposal is free from potential problems. See Chernick, Poulsen, and Wang (2002) for a look at the problem of overadjustment with an algorithm that has been suggested for pacemakers.
