9 Survival plots – HVTI ggplot graphics recipes

9.1 When to use it

The Kaplan-Meier curve is the workhorse of every CORR outcomes paper. Reach for it whenever the question is how long until an event and some patients have not had the event yet by the end of follow-up: death, reoperation, readmission. That last part is what makes survival analysis its own method: a patient who is alive at last contact is right-censored, not a non-event, and the product-limit estimator is what lets those partial observations still count.

hv_survival() (Ehrlinger 2026), built on the survival package (Therneau 2026), estimates the product-limit survival function once and stores everything five related panels need (the survival curve, cumulative hazard, hazard rate, log-log check, and integrated survivorship), plus tidy summary tables in $tables. If you have used the SAS %kaplan macro, these are the same five plots it draws. You build the S3 object once and call plot() with a type argument to pull out the panel you want; each one comes back as a bare ggplot you finish with the usual +.

9.2 The data it needs

hv_survival() expects long-format data: one row per patient with a follow-up time and an event indicator (1 = event, 0 = censored). Nothing else is required for an unstratified curve; add a grouping column when you want curves per arm. sample_survival_data() builds a simulated right-censored cohort in exactly this shape, so the recipe runs end to end before you point it at your own data.

dta_km <- sample_survival_data(n = 500, seed = 42)
head(dta_km)

    iv_dead  dead iv_opyrs age_at_op
1  3.966736  TRUE 2003.503  58.08251
2 13.217905  TRUE 2008.716  65.37466
3  5.669821  TRUE 1990.072  58.80676
4  0.763838  TRUE 2005.739  66.45205
5  9.463533  TRUE 2006.139  75.84440
6 20.000000 FALSE 1991.461  60.21944

# Build the S3 object once; reuse it across every panel below.
km <- hv_survival(dta_km)

9.3 Build it

Start from the bare panel so you can see what the constructor produced before any styling. The default type = "survival" is the curve with a logit-transform 95% confidence ribbon.

plot(km)

Then layer on the house style. The pattern below is the one we use most often in CORR: a single steelblue for a one-cohort figure, percent labels on the y-axis, an n = callout in the lower-left, and the manuscript theme. Swap in theme_hv_poster() when you want the larger, poster-sized text.

plot(km, alpha = 0.8) +
  scale_color_manual(values = c(All = "steelblue"), guide = "none") +
  scale_fill_manual(values  = c(All = "steelblue"), guide = "none") +
  scale_y_continuous(breaks = seq(0, 100, 20),
                     labels = function(x) paste0(x, "%")) +
  scale_x_continuous(breaks = seq(0, 20, 5)) +
  coord_cartesian(xlim = c(0, 20), ylim = c(0, 100)) +
  labs(x = "Years after Operation", y = "Freedom from Death (%)",
       title = "Overall Survival") +
  annotate("text", x = 1, y = 5, label = paste0("n = ", nrow(dta_km)),
           hjust = 0, size = 3.5) +
  theme_hv_manuscript()

Figure 9.1: Kaplan-Meier overall survival for the simulated cohort, styled in the CORR house style with a logit-transform 95% confidence ribbon

9.4 Read it

A Kaplan-Meier curve starts at 100% and steps downward at each event time; it never rises. A few things to look at:

The ribbon widens to the right. As the at-risk population thins, each remaining event moves the estimate more, so the confidence band fans out. Be cautious reading the far-right tail, which may rest on a handful of patients.
Median survival is where the curve crosses 50%. If it never crosses, the median is not reached within follow-up and you report a different landmark (e.g. 5-year survival) instead.
The numbers at risk below the figure tell the reader how much data backs each region of the curve. Always show them.

The at-risk strip and the per-time-point report (point estimates with CIs) are stored on the object, so you never recompute them:

km$tables$risk

  strata report_time n.risk
1    All           1    478
2    All           5    412
3    All          10    322
4    All          15    260
5    All          20    207
6    All          25    207

km$tables$report

  strata report_time  surv     lower     upper n.risk n.event
1    All           1 0.954 0.9317320 0.9692443    478       1
2    All           5 0.822 0.7859710 0.8530976    412       1
3    All          10 0.642 0.5989814 0.6828473    322       1
4    All          15 0.518 0.4741762 0.5615486    260       1
5    All          20 0.414 0.3715880 0.4577269    207       0
6    All          25 0.414 0.3715880 0.4577269    207       0

9.5 Numbers at risk

The “always show them” advice has a one-call answer. hv_atrisk() turns the object’s risk table into an aligned panel, and hv_atrisk_compose() stacks it under the curve, matching the table’s x-range to the plot’s so the counts sit beneath the time points they describe. Style the curve as usual, build the table from the same object, and compose the two.

curve <- plot(km, alpha = 0.8) +
  scale_color_manual(values = c(All = "steelblue"), guide = "none") +
  scale_fill_manual(values  = c(All = "steelblue"), guide = "none") +
  scale_y_continuous(breaks = seq(0, 100, 20),
                     labels = function(x) paste0(x, "%")) +
  scale_x_continuous(breaks = seq(0, 20, 5)) +
  coord_cartesian(xlim = c(0, 20), ylim = c(0, 100)) +
  labs(x = "Years after Operation", y = "Freedom from Death (%)") +
  theme_hv_manuscript()

hv_atrisk_compose(curve, hv_atrisk(km))

Figure 9.2: Survival curve with an aligned numbers-at-risk table composed beneath the time axis

The counts under the axis are the same km$tables$risk numbers from above, now positioned at the curve’s own time ticks. A stratified figure gives the table one row per group, labelled to match the curves.

9.6 Variations

9.6.1 Stratified by group

To compare arms, pass group_col and the constructor fits a separate curve per stratum. Here we simulate a two-arm valve-type cohort with a 1.4x hazard ratio. Look for clearly separated curves whose CI ribbons stop overlapping in the windows where the treatment difference matters.

dta_km_s <- sample_survival_data(
  n             = 500,
  strata_levels = c("Type A", "Type B"),
  hazard_ratios = c(1, 1.4),
  seed          = 42
)
km_s <- hv_survival(dta_km_s, group_col = "valve_type")

plot(km_s, alpha = 0.8) +
  scale_color_manual(
    values = c("Type A" = "steelblue", "Type B" = "firebrick"),
    name   = "Valve Type"
  ) +
  scale_fill_manual(
    values = c("Type A" = "steelblue", "Type B" = "firebrick"),
    name   = "Valve Type"
  ) +
  scale_y_continuous(breaks = seq(0, 100, 20),
                     labels = function(x) paste0(x, "%")) +
  scale_x_continuous(breaks = seq(0, 20, 5)) +
  coord_cartesian(xlim = c(0, 20), ylim = c(0, 100)) +
  labs(x = "Years after Operation", y = "Freedom from Death (%)",
       title = "Survival by Valve Type") +
  theme_hv_manuscript()

Figure 9.3: Stratified survival curves for a two-arm valve-type cohort simulated with a 1.4x hazard ratio

9.6.2 Cumulative hazard

type = "cumhaz" gives the Nelson-Aalen cumulative hazard H(t) = -log S(t), a monotonically non-decreasing curve whose slope at any point is the instantaneous hazard rate. Same object, different type.

plot(km, type = "cumhaz") +
  scale_x_continuous(breaks = seq(0, 20, 5)) +
  labs(x = "Years after Operation", y = "Cumulative Hazard H(t)",
       title = "Nelson-Aalen Cumulative Hazard") +
  scale_color_manual(values = c(All = "steelblue"), guide = "none") +
  theme_hv_manuscript()

Figure 9.4: Nelson-Aalen cumulative hazard, the monotonically non-decreasing complement of the survival curve

9.6.3 Hazard rate

type = "hazard" is the instantaneous hazard. The raw point estimates are noisy, so for a figure overlay a LOESS smooth. The smoothed curve shows the early-risk peak and later plateau a reviewer is looking for.

plot(km, type = "hazard") +
  geom_smooth(aes(x = mid_time, y = hazard, color = strata),
              method = "loess", se = FALSE, span = 0.6) +
  scale_color_manual(values = c(All = "steelblue"), guide = "none") +
  scale_x_continuous(breaks = seq(0, 20, 5)) +
  labs(x = "Years after Operation", y = "Instantaneous Hazard",
       title = "Hazard Rate") +
  theme_hv_manuscript()

Figure 9.5: Instantaneous hazard rate with a LOESS smooth showing the early-risk peak and later plateau

9.6.4 Log-log survival (proportional-hazards check)

type = "loglog" plots log(-log S(t)) against log(time). Parallel lines across strata mean the proportional-hazards assumption is reasonable. Run this check before you quote a single hazard ratio. Fit with group_col so there are strata to compare.

plot(km_s, type = "loglog") +
  scale_color_manual(
    values = c("Type A" = "steelblue", "Type B" = "firebrick"),
    name   = "Valve Type"
  ) +
  labs(x = "log(Years after Operation)", y = "log(-log S(t))",
       title = "Log-Log Survival") +
  theme_hv_manuscript()

Figure 9.6: Log-log survival plot for checking the proportional-hazards assumption across strata

9.6.5 Integrated life table (restricted mean survival)

type = "life" is the integrated survivorship: the area under the survival curve up to time t, equal to the restricted mean survival time (RMST) at that horizon. It rises linearly while S(t) is near 1 and bends as events accumulate. RMST is the summary to reach for when the log-log check above shows hazards are not proportional and a single hazard ratio would mislead.

plot(km, type = "life") +
  scale_color_manual(values = c(All = "steelblue"), guide = "none") +
  scale_x_continuous(breaks = seq(0, 20, 5)) +
  labs(x = "Years after Operation",
       y = "Restricted Mean Survival (years)",
       title = "Integral of Survivorship") +
  theme_hv_manuscript()

Figure 9.7: Integrated survivorship, the restricted mean survival time accumulated up to each horizon

9.7 Pitfalls

Event coding. event must be 1 for the event and 0 for censored. Flip them and you estimate freedom from being censored, which is meaningless. Check the coding before you trust the curve.
Reading the tail. The right end of the curve often rests on very few patients; the widening ribbon is the warning. Quote a landmark estimate (with its at-risk count) rather than the visual endpoint.
Non-proportional hazards. If the log-log curves cross, a single hazard ratio is the wrong summary – report RMST from the integrated panel instead.