10  Hazard and nonparametric curves

10.1 When to use it

The survival chapter built its curves from raw patient times. This chapter is for the figures where the modelling is already done and you hold a table of fitted values: a parametric model has produced a smooth survival, hazard, or cumulative-hazard estimate at a grid of time points, complete with confidence bounds. Three related plotting functions from hvtiPlotR (Ehrlinger 2026) turn those tables into figures.

Reach for hv_hazard() when you have a parametric prediction grid and want to show the smooth curve, often with the empirical Kaplan-Meier points overlaid so a reviewer can judge how well the model fits. Reach for the nonparametric temporal-trend curve (hv_nonparametric()) when your outcome is a prevalence or a continuous measurement tracked over follow-up with a bootstrap confidence band rather than a survival function. Reach for the ordinal curve (hv_ordinal()) when the outcome has graded levels (regurgitation grade, severity class) and you want one probability curve per grade. All three take pre-computed data and hand back a bare ggplot you decorate the usual way.

10.2 The data it needs

hv_hazard() takes a parametric prediction grid: one row per time point, with an estimate column (survival, hazard, or cumhaz) and explicit lower/upper CI columns that you name in the call. Two helpers generate the demonstration data. sample_hazard_data() builds the prediction grid; sample_hazard_empirical() builds the binned Kaplan-Meier overlay (the discrete empirical estimate the smooth curve is meant to track). Using the same cohort size for both keeps the overlay aligned with the curve.

dat_hp <- sample_hazard_data(n = 500, time_max = 10)
emp_hp <- sample_hazard_empirical(n = 500, time_max = 10, n_bins = 6)
head(dat_hp)
        time survival surv_lower surv_upper    hazard haz_lower haz_upper
1 0.01000000 99.99558   99.93731        100 0.6629126 0.3996474  1.099602
2 0.03002004 99.97702   99.84413        100 1.1485818 0.6924408  1.905203
3 0.05004008 99.95054   99.75561        100 1.4829116 0.8939968  2.459770
4 0.07006012 99.91808   99.66720        100 1.7546549 1.0578216  2.910523
5 0.09008016 99.88059   99.57770        100 1.9896233 1.1994760  3.300275
6 0.11010020 99.83868   99.48662        100 2.1996335 1.3260840  3.648628
       cumhaz cumhaz_lower cumhaz_upper
1 0.004419417            0    0.5871202
2 0.022986980            0    1.3519229
3 0.049470012            0    1.9990187
4 0.081954223            0    2.5912321
5 0.119483723            0    3.1493081
6 0.161453396            0    3.6834317

10.3 Build it

The most common figure is a smooth parametric survival curve with its confidence band plus discrete Kaplan-Meier points and error bars. You name the estimate column and its two CI columns, then pass the empirical data frame and its CI columns to draw the overlay. Start from the decorated version since the constructor has no separate bare mode here; the call below is the whole figure.

plot(hv_hazard(
  dat_hp,
  estimate_col  = "survival",
  lower_col     = "surv_lower",
  upper_col     = "surv_upper",
  empirical     = emp_hp,
  emp_lower_col = "lower",
  emp_upper_col = "upper"
)) +
  scale_colour_manual(values = c("steelblue"), guide = "none") +
  scale_fill_manual(values = c("steelblue"), guide = "none") +
  scale_x_continuous(limits = c(0, 10), breaks = 0:10) +
  scale_y_continuous(limits = c(0, 100), breaks = seq(0, 100, 20),
                     labels = function(x) paste0(x, "%")) +
  labs(x = "Years", y = "Survival (%)") +
  theme_hv_manuscript()
Figure 10.1: Smooth parametric survival curve with its confidence band and overlaid binned Kaplan-Meier points with error bars

10.4 Read it

The point of the overlay is the comparison between the smooth curve and the discrete points. Look for:

  • Points sitting on the curve. If the empirical Kaplan-Meier estimates fall on or very near the parametric line, the model is describing the data well. If the points wander systematically off the curve (above it early, below it late), the parametric form is the wrong shape and you should reconsider it.
  • The error bars against the band. The empirical error bars and the smooth confidence ribbon should tell a consistent story. Where the cohort thins, both widen; a point with a huge error bar is a region the figure cannot really support.
  • The curve shape itself. A survival curve falls and flattens; a hazard curve often peaks early then settles; a cumulative hazard only rises. A shape that contradicts the biology is a signal to check the model before the figure.

10.5 Variations

10.5.1 Hazard rate

Switch estimate_col to "hazard" and the matching CI columns for the instantaneous hazard rate, expressed as percent per year. This is the curve to show when the question is when the risk is highest rather than how many survive.

plot(hv_hazard(
  dat_hp,
  estimate_col = "hazard",
  lower_col    = "haz_lower",
  upper_col    = "haz_upper"
)) +
  scale_colour_manual(values = c("firebrick"), guide = "none") +
  scale_fill_manual(values = c("firebrick"), guide = "none") +
  scale_x_continuous(limits = c(0, 10), breaks = 0:10) +
  scale_y_continuous(limits = c(0, 30),
                     labels = function(x) paste0(x, "%/yr")) +
  labs(x = "Years", y = "Hazard (%/year)") +
  theme_hv_manuscript()
Figure 10.2: Instantaneous hazard rate in percent per year, showing when post-operative risk is highest

10.5.2 Cumulative hazard

The "cumhaz" column equals -log(S) * 100. It rises monotonically and its slope at any point is the instantaneous hazard. We reach for it in readmission and repeated-event analyses, where counting accumulated risk is more natural than reporting a survival fraction.

plot(hv_hazard(
  dat_hp,
  estimate_col  = "cumhaz",
  lower_col     = "cumhaz_lower",
  upper_col     = "cumhaz_upper"
)) +
  scale_colour_manual(values = c("darkorange"), guide = "none") +
  scale_fill_manual(values = c("darkorange"), guide = "none") +
  scale_x_continuous(limits = c(0, 10), breaks = 0:10) +
  labs(x = "Years", y = "Cumulative Hazard (%)") +
  theme_hv_manuscript()
Figure 10.3: Cumulative hazard rising monotonically, with its slope giving the instantaneous hazard

10.5.3 Stratified by group

Pass group_col to compare two or more groups in one panel, each with its own curve, band, and empirical overlay. Look for curves whose bands stop overlapping in the windows where the treatment difference matters.

dat_strat <- sample_hazard_data(
  n = 400, time_max = 10,
  groups = c("No Takedown" = 1.0, "Takedown" = 0.65)
)
emp_strat <- sample_hazard_empirical(
  n = 400, time_max = 10, n_bins = 6,
  groups = c("No Takedown" = 1.0, "Takedown" = 0.65)
)

plot(hv_hazard(
  dat_strat,
  estimate_col  = "survival",
  lower_col     = "surv_lower",
  upper_col     = "surv_upper",
  group_col     = "group",
  empirical     = emp_strat,
  emp_lower_col = "lower",
  emp_upper_col = "upper"
)) +
  scale_colour_manual(
    values = c("No Takedown" = "steelblue", "Takedown" = "firebrick"),
    name   = NULL
  ) +
  scale_fill_manual(
    values = c("No Takedown" = "steelblue", "Takedown" = "firebrick"),
    guide  = "none"
  ) +
  scale_x_continuous(limits = c(0, 10), breaks = 0:10) +
  scale_y_continuous(limits = c(0, 100), breaks = seq(0, 100, 20),
                     labels = function(x) paste0(x, "%")) +
  labs(x = "Years after Surgery", y = "Survival (%)") +
  theme_hv_manuscript()
Figure 10.4: Survival curves, bands, and empirical overlays for two groups in a single panel

10.5.4 Population life-table overlay

For age-stratified survival you often want to set the study curves against what the general population would do. Pass a life-table data frame to reference and set ref_group_col to draw the population survival as dashed lines per age group. The gap between a study curve and its dashed reference is the excess mortality attributable to the condition rather than to ageing.

dat_age <- sample_hazard_data(
  n = 600, time_max = 12,
  groups = c("<65" = 0.5, "65–80" = 1.0, "≥80" = 1.8)
)
emp_age <- sample_hazard_empirical(
  n = 600, time_max = 12, n_bins = 6,
  groups = c("<65" = 0.5, "65–80" = 1.0, "≥80" = 1.8)
)
lt <- sample_life_table(
  age_groups = c("<65", "65–80", "≥80"),
  age_mids   = c(55, 72, 85),
  time_max   = 12
)

plot(hv_hazard(
  dat_age,
  estimate_col     = "survival",
  lower_col        = "surv_lower",
  upper_col        = "surv_upper",
  group_col        = "group",
  empirical        = emp_age,
  emp_lower_col    = "lower",
  emp_upper_col    = "upper",
  reference        = lt,
  ref_estimate_col = "survival",
  ref_group_col    = "group"
)) +
  scale_colour_manual(
    values = c("<65" = "steelblue", "65–80" = "forestgreen",
               "≥80" = "firebrick"),
    name   = "Age group"
  ) +
  scale_fill_manual(
    values = c("<65" = "steelblue", "65–80" = "forestgreen",
               "≥80" = "firebrick"),
    guide  = "none"
  ) +
  scale_x_continuous(limits = c(0, 12), breaks = seq(0, 12, 2)) +
  scale_y_continuous(limits = c(0, 100), breaks = seq(0, 100, 20),
                     labels = function(x) paste0(x, "%")) +
  labs(x = "Years", y = "Survival (%)",
       caption = "Dashed lines: US population life table") +
  theme_hv_manuscript()
Figure 10.5: Age-stratified study survival curves set against dashed population life-table references

10.5.5 Nonparametric temporal-trend curve

When the outcome is a prevalence or a continuous measurement tracked over follow-up rather than a survival function, hv_nonparametric() prepares the average curve with a bootstrap CI ribbon. sample_nonparametric_curve_data() generates the average curve and CI; sample_nonparametric_curve_points() generates the binned summary points that overlay it. Build the S3 object once with the matching lower_col/upper_col and optional data_points, then plot().

curve_dat <- sample_nonparametric_curve_data(
  n            = 500,
  time_max     = 12,
  outcome_type = "probability",
  ci_level     = 0.68
)
pts_dat <- sample_nonparametric_curve_points(n = 500, time_max = 12)

np <- hv_nonparametric(
  curve_data  = curve_dat,
  lower_col   = "lower",
  upper_col   = "upper",
  data_points = pts_dat
)

The bare panel shows the average curve, its CI ribbon, and the binned summary points in default colours. Read it the same way you read the empirical overlay in Figure 10.1: the curve should pass through the binned points, and the ribbon should widen at the extremes where the bootstrap sample thins. A flat curve where you expected a trend is a hint that the outcome_type or CI columns are misspecified.

plot(np)

The shaded ribbon here is the 68% bootstrap CI (one standard error); pass ci_level = 0.95 to sample_nonparametric_curve_data() for 95% bands.

plot(np) +
  scale_colour_manual(values = c("steelblue"), guide = "none") +
  scale_fill_manual(values   = c("steelblue"), guide = "none") +
  scale_x_continuous(limits = c(0, 12), breaks = seq(0, 12, 2),
                     labels = function(x) paste(x, "yr")) +
  scale_y_continuous(limits = c(0, 1), breaks = seq(0, 1, 0.2),
                     labels = scales::percent) +
  labs(x = "Follow-up (years)", y = "Prevalence (%)") +
  theme_hv_manuscript()
Figure 10.6: Nonparametric temporal-trend curve with its 68% bootstrap confidence ribbon and binned summary points

Pass group_col to the constructor to compare two average curves in a single panel; each group gets its own CI ribbon.

curve_grp <- sample_nonparametric_curve_data(
  n            = 400,
  time_max     = 7,
  groups       = c("Ozaki" = 0.7, "CE-Pericardial" = 1.3),
  outcome_type = "continuous",
  ci_level     = 0.68
)
pts_grp <- sample_nonparametric_curve_points(
  n = 400, time_max = 7,
  groups = c("Ozaki" = 0.7, "CE-Pericardial" = 1.3),
  outcome_type = "continuous"
)
np_grp <- hv_nonparametric(
  curve_data  = curve_grp,
  group_col   = "group",
  lower_col   = "lower",
  upper_col   = "upper",
  data_points = pts_grp
)

p_haz_grp <- plot(np_grp) +
  scale_colour_manual(
    values = c("Ozaki" = "steelblue", "CE-Pericardial" = "firebrick"),
    name   = NULL
  ) +
  scale_fill_manual(
    values = c("Ozaki" = "steelblue", "CE-Pericardial" = "firebrick"),
    guide  = "none"
  ) +
  scale_x_continuous(limits = c(0, 7), breaks = 0:7) +
  labs(x = "Follow-up (years)", y = "AV Peak Gradient (mmHg)") +
  theme_hv_manuscript()

# Rising curves leave the upper-left empty; pin the key there.
hv_legend_inside(p_haz_grp, prefer = "topleft")
Figure 10.7: Two nonparametric average curves compared in one panel, each with its own bootstrap confidence ribbon

10.5.6 Nonparametric ordinal-outcome curve

When the outcome has graded levels, hv_ordinal() prepares one probability curve per grade from a cumulative proportional-odds model (the prevalence of each regurgitation grade over time, say). The curve data is long format, one row per time by grade. sample_nonparametric_ordinal_data() generates the curves and sample_nonparametric_ordinal_points() the binned summary points.

ord_dat <- sample_nonparametric_ordinal_data(
  n = 800, time_max = 5,
  grade_labels = c("None", "Mild", "Moderate", "Severe")
)
ord_pts <- sample_nonparametric_ordinal_points(
  n = 800, time_max = 5,
  grade_labels = c("None", "Mild", "Moderate", "Severe")
)
head(ord_dat)
        time  estimate grade
1 0.01000000 0.6276258  None
2 0.01012532 0.6276898  None
3 0.01025221 0.6277545  None
4 0.01038069 0.6278200  None
5 0.01051078 0.6278864  None
6 0.01064250 0.6279535  None

Build the object with both curve data and summary points. Each line is one grade level; because every patient has exactly one grade at each time, the lines sum to roughly 1.0 at every time point. Read the figure as a stacked story: the "None" line starts high and declines as patients drift into worse grades, and the severe line creeps up. Colours run from grey for None through graduated severity to firebrick for Severe so the eye reads worsening as warming.

np_ord <- hv_ordinal(curve_data = ord_dat, data_points = ord_pts)

plot(np_ord) +
  scale_colour_manual(
    values = c("None" = "grey40", "Mild" = "steelblue",
               "Moderate" = "darkorange", "Severe" = "firebrick"),
    name = "AR Grade"
  ) +
  scale_x_continuous(limits = c(0, 5), breaks = 0:5) +
  scale_y_continuous(limits = c(0, 1), breaks = seq(0, 1, 0.2),
                     labels = scales::percent) +
  labs(x = "Follow-up (years)", y = "Grade prevalence (%)") +
  theme_hv_manuscript() +
  theme(legend.position = c(0.75, 0.6))
Figure 10.8: One probability curve per regurgitation grade over time, coloured from grey to firebrick as severity worsens

To collapse higher grades into a combined level (Moderate + Severe, say), subset and re-label before passing the data to the constructor. Here we show a two-grade version drawn from the same sample data.

ord_two <- subset(ord_dat, grade %in% c("None", "Severe"))

plot(hv_ordinal(curve_data = ord_two)) +
  scale_colour_manual(
    values = c("None" = "steelblue", "Severe" = "firebrick"),
    name   = NULL
  ) +
  scale_x_continuous(limits = c(0, 5), breaks = 0:5) +
  scale_y_continuous(limits = c(0, 1), breaks = seq(0, 1, 0.2),
                     labels = scales::percent) +
  labs(x = "Follow-up (years)", y = "Grade prevalence (%)") +
  theme_hv_manuscript()
Figure 10.9: Two-grade ordinal version with higher grades collapsed into a combined level

10.6 Pitfalls

  • Mismatched CI columns. Every estimate column has its own pair of CI columns. Plot hazard with the survival bounds and the band will be nonsense. Name the matching lower_col/upper_col for the column you are plotting.
  • Forgetting the empirical overlay. A smooth parametric curve always looks convincing. Without the Kaplan-Meier points a reviewer cannot tell whether the model fits, so include empirical whenever you have it.
  • Ordinal lines that do not sum to one. If your grade curves drift above 1.0 or cross oddly, the wide-format grade columns were probably reshaped to long format incorrectly. Each time point’s grades should add to about 1.0.
  • Over-reading the CI level. The nonparametric ribbon defaults to a 68% band (one standard error), which is narrower than the 95% band readers expect. State the level in the caption, or switch to ci_level = 0.95.