11 Parametric hazard modeling with TemporalHazard

TemporalHazard (Ehrlinger 2026) is the team’s R port of the SAS/C HAZARD module that Blackstone and colleagues wrote for cardiac-surgery outcomes research. It does two jobs. The first is the nonparametric pair you already know from the survival chapter: Kaplan-Meier survival and the Nelson-Aalen cumulative hazard, both built on the survival package (Therneau 2026). The second is the part that makes the package worth its own chapter: a parametric multi-phase hazard fit, the classic early/constant/late decomposition of risk over time.

11.1 When to use it

A Kaplan-Meier curve answers “how many are still event-free at time t.” A hazard curve answers a different and often more clinically interesting question: “how dangerous is right now.” After a cardiac operation that instantaneous risk is rarely flat. It spikes in the first weeks (the surgical insult, the early failures), settles into a long low plateau, then climbs again years later as the cohort ages. Reach for a parametric hazard model when you want to see those phases as a smooth curve rather than the noisy step function a raw event count gives you, and when you want each phase summarised by a parameter you can report and compare across cohorts.

The other reason to reach for this package is lineage. For decades the team described risk with the SAS-HAZARD module, and reviewers in this field expect that early-peak, constant-phase, late-rise vocabulary. TemporalHazard keeps the same decomposition in R, so a figure you build here reads the way the reviewers already think. You fit the model once with hazard(), then ask it for survival, cumulative hazard, or the instantaneous hazard over whatever time grid you like.

11.2 The data it needs

Everything in this chapter runs on data bundled with the package, so the recipe works before you point it at your own extract. For the survival and hazard sections we use cabgkul: 5,880 patients with a follow-up time int_dead (interval to death, in years) and an event indicator dead (1 = death, 0 = censored). That is the same long format the survival chapter used, one row per patient with a time and a 0/1 event. The competing-risks section switches to omc, which records thromboembolic events and death on the same patients.

data(cabgkul, package = "TemporalHazard")
str(cabgkul)

'data.frame':   5880 obs. of  2 variables:
 $ int_dead: num  201.83 195.06 7.13 126.36 187.57 ...
 $ dead    : int  0 0 1 1 0 0 1 1 0 1 ...

11.3 Build it

11.3.1 Nonparametric estimates first

Before fitting anything parametric, look at the empirical curves. They are the yardstick every parametric fit is later checked against, and they make no distributional assumption. hzr_kaplan() returns a data frame (class hzr_kaplan) with one row per event time and the columns you need to draw a product-limit curve: time, n_risk, n_event, survival, the logit-transformed band cl_lower/cl_upper, and the derived cumhaz, hazard, density, and life.

km <- hzr_kaplan(cabgkul$int_dead, cabgkul$dead)
head(km)

Kaplan-Meier estimate with logit confidence limits
Events: 69  | Time points: 6 
Survival range: 0.9883 to 0.9934 
RMST at last event: 0.1957 

    time n_risk n_event n_censor survival std_err cl_lower cl_upper cumhaz
 0.03285   5880      39        0   0.9934  0.0011   0.9909   0.9952 0.0067
 0.06571   5841       9        0   0.9918  0.0012   0.9892   0.9938 0.0082
 0.09856   5832       3        0   0.9913  0.0012   0.9886   0.9934 0.0087
 0.13142   5829       7        0   0.9901  0.0013   0.9873   0.9924 0.0099
 0.16427   5822       9        0   0.9886  0.0014   0.9855   0.9910 0.0115
 0.19713   5813       2        0   0.9883  0.0014   0.9852   0.9907 0.0118
 hazard density   life
 0.2026  0.2019 0.0328
 0.0469  0.0466 0.0655
 0.0157  0.0155 0.0981
 0.0366  0.0362 0.1306
 0.0471  0.0466 0.1632
 0.0105  0.0104 0.1957

ggplot(km, aes(x = time, y = survival)) +
  geom_ribbon(aes(ymin = cl_lower, ymax = cl_upper), alpha = 0.2, fill = "steelblue") +
  geom_step(colour = "steelblue", linewidth = 0.7) +
  labs(x = "Years since operation", y = "Survival probability",
       title = "Kaplan-Meier survival (cabgkul)") +
  theme_hv_manuscript()

Figure 11.1: Empirical Kaplan-Meier survival for the cabgkul cohort with its logit-transformed confidence band

hzr_nelson() gives the Nelson-Aalen cumulative hazard (class hzr_nelson): time, n_risk, n_event, weight_sum, cumhaz with its lognormal limits cl_lower/cl_upper, hazard, and a running cum_events. The cumulative hazard is the same information as the survival curve turned the other way up; its slope at any point is the instantaneous hazard, which is what the parametric fit below will try to summarise.

na <- hzr_nelson(cabgkul$int_dead, cabgkul$dead)
head(na)

Nelson cumulative hazard estimate with lognormal CL
Events: 69  | Time points: 6 

    time n_risk n_event weight_sum cumhaz std_err cl_lower cl_upper hazard
 0.03285   5880      39         39 0.0066   2e-04   0.0063   0.0070 0.2019
 0.06571   5841       9          9 0.0082   2e-04   0.0077   0.0087 0.0469
 0.09856   5832       3          3 0.0087   3e-04   0.0081   0.0093 0.0157
 0.13142   5829       7          7 0.0099   3e-04   0.0092   0.0106 0.0365
 0.16427   5822       9          9 0.0114   4e-04   0.0107   0.0122 0.0471
 0.19713   5813       2          2 0.0118   4e-04   0.0110   0.0126 0.0105
 cum_events
         39
         48
         51
         58
         67
         69

ggplot(na, aes(x = time, y = cumhaz)) +
  geom_ribbon(aes(ymin = cl_lower, ymax = cl_upper), alpha = 0.2, fill = "firebrick") +
  geom_step(colour = "firebrick", linewidth = 0.7) +
  labs(x = "Years since operation", y = "Cumulative hazard",
       title = "Nelson-Aalen cumulative hazard (cabgkul)") +
  theme_hv_manuscript()

Figure 11.2: Empirical Nelson-Aalen cumulative hazard for the cabgkul cohort with lognormal confidence limits

11.3.2 The parametric fit

hazard() fits the parametric model when you pass fit = TRUE. The one argument you must not skip is theta, the starting values for the parameter vector. The optimiser needs somewhere to begin, and there is no safe default it can guess for you, so leaving theta out makes the fit (and every later predict()) error. For a Weibull the two parameters are the scale mu and the shape nu; theta = c(0.01, 1.0) is a reasonable start for both. The returned object (class hazard) is a list with call, spec, data, fit, legacy_args, and engine; the fitted parameters live in fit$fit$theta.

fit <- hazard(time = cabgkul$int_dead, status = cabgkul$dead,
              theta = c(0.01, 1.0), dist = "weibull", fit = TRUE)
summary(fit)

hazard model summary
  observations: 5880 
  predictors:   0 
  dist:         weibull 
  engine:       native-r-m2 
  converged:    TRUE 
  log-lik:      -3935.72 
  evaluations: fn=37, gr=9

Coefficients:
       estimate    std_error    z_stat       p_value
mu 0.0003613699 6.016407e-05  6.006407  1.896792e-09
nu 0.5944429760 2.346410e-02 25.334149 1.343733e-141

predict.hazard() evaluates the fitted model over any time grid you hand it in newdata. For type = "survival" and type = "cumulative_hazard" it returns a plain numeric vector lined up with the time column. The instantaneous hazard takes one extra step. This model is intercept-only, so type = "hazard" returns the constant relative hazard exp(eta) = 1 rather than the curve we want; the honest way to recover the instantaneous rate is to difference the cumulative hazard, since the hazard is by definition its slope.

t_grid <- seq(0.5, max(cabgkul$int_dead), length.out = 120)
nd <- data.frame(time = t_grid)

pred <- data.frame(
  time     = t_grid,
  survival = predict(fit, newdata = nd, type = "survival"),
  cumhaz   = predict(fit, newdata = nd, type = "cumulative_hazard")
)
# Instantaneous hazard = d(cumhaz)/dt
pred$hazard <- c(NA, diff(pred$cumhaz) / diff(pred$time))
str(pred)

'data.frame':   120 obs. of  4 variables:
 $ time    : num  0.5 2.19 3.88 5.58 7.27 ...
 $ survival: num  0.994 0.986 0.98 0.975 0.971 ...
 $ cumhaz  : num  0.00596 0.01434 0.02015 0.02498 0.02924 ...
 $ hazard  : num  NA 0.00495 0.00343 0.00286 0.00252 ...

The fitted model gives us a smooth survival curve, its cumulative hazard, and the instantaneous hazard rate, one panel each.

ggplot(pred, aes(x = time, y = survival)) +
  geom_line(colour = "steelblue", linewidth = 0.8) +
  labs(x = "Years since operation", y = "Survival probability",
       title = "Parametric (Weibull) survival") +
  theme_hv_manuscript()

Figure 11.3: Parametric Weibull survival curve from the fitted model

ggplot(pred, aes(x = time, y = cumhaz)) +
  geom_line(colour = "firebrick", linewidth = 0.8) +
  labs(x = "Years since operation", y = "Cumulative hazard",
       title = "Parametric (Weibull) cumulative hazard") +
  theme_hv_manuscript()

Figure 11.4: Parametric Weibull cumulative hazard from the fitted model

ggplot(subset(pred, !is.na(hazard)), aes(x = time, y = hazard)) +
  geom_line(colour = "darkgreen", linewidth = 0.8) +
  labs(x = "Years since operation", y = "Instantaneous hazard rate",
       title = "Parametric (Weibull) hazard rate") +
  theme_hv_manuscript()

Figure 11.5: Instantaneous hazard rate derived by differencing the parametric Weibull cumulative hazard

11.3.3 Observed vs parametric

A parametric fit is only worth showing once you have checked it against the empirical curve it is meant to summarise. So overlay them: the observed Kaplan-Meier (or Nelson-Aalen) estimate is the yardstick, the parametric Weibull is the smooth model laid over it. Two series on one panel means a legend, and we place it inside the panel in the empty corner so it names the curves without covering them.

p_th_surv <- ggplot() +
  geom_ribbon(data = km,
              aes(x = time, ymin = cl_lower, ymax = cl_upper),
              alpha = 0.2, fill = "grey40") +
  geom_step(data = km,
            aes(x = time, y = survival, colour = "Observed (Kaplan-Meier)"),
            linewidth = 0.7) +
  geom_line(data = pred,
            aes(x = time, y = survival, colour = "Parametric (Weibull)"),
            linewidth = 0.9) +
  scale_colour_manual(
    name   = NULL,
    values = c("Observed (Kaplan-Meier)" = "grey40",
               "Parametric (Weibull)"    = "steelblue")
  ) +
  labs(x = "Years since operation", y = "Survival probability") +
  theme_hv_manuscript()

# Survival falls left-to-right, leaving the lower-left empty; pin the key there
# with prefer = "bottomleft" (it still auto-dodges if that corner fills).
hv_legend_inside(p_th_surv, prefer = "bottomleft") +
  theme(legend.background = element_rect(fill = "white", colour = "grey80",
                                         linewidth = 0.3))

Figure 11.6: Observed Kaplan-Meier survival overlaid with the parametric Weibull curve as a fit check

Read the overlay as a fit check: where the smooth Weibull tracks the step function, the parametric form is describing the data; where the observed Kaplan-Meier curve and its grey confidence band pull away from the Weibull line, the single Weibull shape is the wrong summary and a multi-phase fit is the next move. (The band is the observed estimate’s; this parametric predict() returns a point curve with no interval.) The same overlay works for cumulative hazard, with the Nelson-Aalen estimate as the observed series.

p_th_cumhaz <- ggplot() +
  geom_step(data = na,
            aes(x = time, y = cumhaz, colour = "Observed (Nelson-Aalen)"),
            linewidth = 0.7) +
  geom_line(data = pred,
            aes(x = time, y = cumhaz, colour = "Parametric (Weibull)"),
            linewidth = 0.9) +
  scale_colour_manual(
    name   = NULL,
    values = c("Observed (Nelson-Aalen)" = "grey40",
               "Parametric (Weibull)"    = "firebrick")
  ) +
  labs(x = "Years since operation", y = "Cumulative hazard") +
  theme_hv_manuscript()

# Cumulative hazard rises left-to-right, leaving the upper-left empty.
hv_legend_inside(p_th_cumhaz) +
  theme(legend.background = element_rect(fill = "white", colour = "grey80",
                                         linewidth = 0.3))

Figure 11.7: Observed Nelson-Aalen cumulative hazard overlaid with the parametric Weibull curve

11.3.4 Numbers at risk

A survival figure should carry its denominators. The empirical curve here is a plain data frame, with no $tables$risk to draw on, so we hand hv_atrisk() the subject-level data directly — the time and status columns — and compose the table under the curve, aligned to its ticks.

curve_km <- ggplot(km, aes(x = time, y = survival)) +
  geom_ribbon(aes(ymin = cl_lower, ymax = cl_upper),
              alpha = 0.2, fill = "steelblue") +
  geom_step(colour = "steelblue", linewidth = 0.7) +
  labs(x = "Years since operation", y = "Survival probability") +
  theme_hv_manuscript()

hv_atrisk_compose(
  curve_km,
  hv_atrisk(cabgkul, time = "int_dead", status = "dead")
)

Figure 11.8: Kaplan-Meier survival with a numbers-at-risk table rebuilt from the subject-level data and composed beneath

For an hv_survival() object the at-risk table comes for free from $tables$risk (see the survival chapter); for a curve built from raw columns like this one, the subject-level path rebuilds it.

11.4 Read it

A hazard curve is read by its shape, phase by phase. The Weibull above is the simplest case: its shape parameter nu < 1 produces a hazard that starts high and falls monotonically, which is exactly the early-risk story of post-operative mortality. Risk is greatest in the first months and then eases.

In the fuller multi-phase model the curve has three readable regions. An early phase is the steep descent right after the operation, the perioperative risk working itself out. A constant phase is the long flat middle, the background rate of a stable cohort; a horizontal hazard there means risk is no longer related to how long ago the surgery was. A late phase is an upturn years out, typically the cohort aging or a prosthesis wearing. When you describe a figure like this to a reviewer, name the phases: “an early peak that resolves by year one, a flat interval, and a late rise after year ten.” That is the language the SAS-HAZARD lineage trained everyone to expect. Read the cumulative hazard differently: it only ever rises, and it is its slope, not its height, that carries the hazard. A flattening cumulative-hazard curve means low current risk even though the total keeps climbing.

11.5 Variations

11.5.1 Competing risks

Death is not the only thing that can happen, and once a patient dies they can no longer have a thromboembolic event. Treating one outcome as plain censoring for the other overstates incidence; the cumulative incidence function is the honest summary. hzr_competing_risks() takes an integer-coded event vector (0 = censored, 1, 2, ... = competing event types) and returns a data frame (class hzr_competing_risks) with time, n_risk, per-type n_event_k, n_censor, the event-free surv, per-type cumulative incidence incid_k, and standard errors.

The omc dataset records thromboembolic events (te1, te2, te3) and death (dead). We collapse them into a two-cause outcome: type 1 is any thromboembolic event, type 2 is death.

data(omc, package = "TemporalHazard")
event <- with(omc, ifelse(te1 == 1 | te2 == 1 | te3 == 1, 1L,
                          ifelse(dead == 1, 2L, 0L)))
table(event)

event
  0   1   2 
257  33  49

cr <- hzr_competing_risks(omc$int_dead, event)
head(cr)

Competing risks cumulative incidence
Event types: 2  | Time points: 6 
Final survival: 0.982 
Final incid_1 : 0.0059 
Final incid_2 : 0.012 

     time n_risk n_event_1 n_event_2 n_censor   surv incid_1 incid_2 se_surv
  0.06571    339         1         0        0 0.9971  0.0029   0.000  0.0029
  6.93238    334         0         1        0 0.9941  0.0029   0.003  0.0042
  8.37800    333         0         1        0 0.9911  0.0029   0.006  0.0051
  8.87082    331         1         0        0 0.9881  0.0059   0.006  0.0059
 16.85456    326         0         1        0 0.9851  0.0059   0.009  0.0066
 16.98598    325         0         1        0 0.9820  0.0059   0.012  0.0073
   se_1   se_2
 0.0029 0.0000
 0.0029 0.0030
 0.0029 0.0042
 0.0042 0.0042
 0.0042 0.0052
 0.0042 0.0060

Reshape the two incidence columns to long form and plot both curves together. Read them as a pair: the two cumulative incidence curves climb toward a shared ceiling, and the gap between them is the relative burden of each cause.

cif <- data.frame(
  time      = rep(cr$time, 2),
  incidence = c(cr$incid_1, cr$incid_2),
  cause     = factor(rep(c("Thromboembolic", "Death"), each = nrow(cr)))
)

ggplot(cif, aes(x = time, y = incidence, colour = cause)) +
  geom_step(linewidth = 0.7) +
  labs(x = "Years since operation", y = "Cumulative incidence",
       colour = "Cause", title = "Competing-risks cumulative incidence (omc)") +
  theme_hv_manuscript()

Figure 11.9: Competing-risks cumulative incidence for thromboembolic events and death in the omc cohort

11.5.2 Diagnostics

A parametric fit is only worth reporting if it tracks the data. hzr_gof() compares the fitted model against the Kaplan-Meier estimate. Call it on the fit with no extra arguments: it populates the observed and expected event counts on its own default grid (the observed event times), and handing it a custom grid leaves those columns empty. The returned columns are time, n_risk, n_event, n_censor, km_surv, km_cumhaz, par_surv, par_cumhaz, cum_observed, and cum_expected, plus a residual (expected minus observed); a "summary" attribute carries the scalar conservation diagnostics.

gof <- hzr_gof(fit)
attr(gof, "summary")

$total_observed
[1] 545

$total_expected
[1] 544.97

$final_residual
[1] -0.03001163

$dist
[1] "weibull"

$n
[1] 5880

head(gof)

Goodness-of-fit: observed vs. expected events
Distribution: weibull  | n = 5880 

Total observed events: 545 
Total expected events: 544.97 
Final residual (E - O): -0.03 
Conservation ratio (E/O): 1 

Use plot columns: time, km_surv, par_surv, cum_observed, cum_expected, residual

Overlaying the parametric survival on the Kaplan-Meier survival is the quickest visual check. Where the two curves sit on top of each other the Weibull is faithful; where they peel apart it is missing structure in the data.

gof_long <- data.frame(
  time     = rep(gof$time, 2),
  survival = c(gof$km_surv, gof$par_surv),
  model    = factor(rep(c("Kaplan-Meier", "Parametric"), each = nrow(gof)))
)

ggplot(gof_long, aes(x = time, y = survival, colour = model)) +
  geom_line(linewidth = 0.6) +
  labs(x = "Years since operation", y = "Survival probability",
       colour = NULL, title = "Goodness of fit: observed vs. parametric") +
  theme_hv_manuscript()

Figure 11.10: Goodness-of-fit overlay of parametric and Kaplan-Meier survival on the same panel

The cumulative observed-versus-expected curve is the survival analogue of a calibration plot. When the two lines stay together (a conservation ratio near 1) the model is reproducing the actual event count over time, not just the shape of the curve.

oe <- data.frame(
  time   = rep(gof$time, 2),
  events = c(gof$cum_observed, gof$cum_expected),
  kind   = factor(rep(c("Observed", "Expected"), each = nrow(gof)))
)

ggplot(oe, aes(x = time, y = events, colour = kind)) +
  geom_line(linewidth = 0.6) +
  labs(x = "Years since operation", y = "Cumulative events",
       colour = NULL, title = "Cumulative observed vs. expected events") +
  theme_hv_manuscript()

Figure 11.11: Cumulative observed versus expected events, the survival analogue of a calibration plot

Finally, hzr_calibrate() bins a continuous covariate into quantile groups and reports the observed event probability per bin, returning group, n, events, mean, min, max, prob, and link_value (the event probability on the chosen link scale). Here we calibrate against follow-up time itself as a worked example; against a real risk score the same plot tells you whether predicted and observed risk agree across the range.

cal <- hzr_calibrate(cabgkul$int_dead, cabgkul$dead, groups = 10)
cal

Variable calibration (logit link, 10 groups)

 group   n events    mean     min     max  prob link_value
     1 588    178   5.459   0.033  10.251 0.303     -0.834
     2 588     33  14.284  10.251  18.629 0.056     -2.822
     3 588     35  23.803  18.694  28.682 0.060     -2.760
     4 588     37  34.431  28.682  40.083 0.063     -2.701
     5 588     49  46.693  40.182  52.404 0.083     -2.398
     6 588     29  57.231  52.404  61.504 0.049     -2.959
     7 588     37  67.096  61.537  73.726 0.063     -2.701
     8 588     44  82.006  73.726  91.107 0.075     -2.515
     9 588     37 101.465  91.107 114.171 0.063     -2.701
    10 588     66 136.711 114.171 201.828 0.112     -2.068

ggplot(cal, aes(x = mean, y = prob)) +
  geom_point(size = 2, colour = "steelblue") +
  geom_line(colour = "steelblue") +
  labs(x = "Mean follow-up time in bin (years)", y = "Observed event probability",
       title = "Calibration across follow-up-time deciles") +
  theme_hv_manuscript()

Figure 11.12: Observed event probability across follow-up-time deciles from hzr_calibrate()

11.6 Pitfalls

hazard() needs starting values. Pass theta explicitly, for example theta = c(0.01, 1). Without it the optimiser has no starting point and the fit (and every predict() that follows) errors out. This is the single most common reason a TemporalHazard script fails on first run.
The instantaneous hazard is not free. For an intercept-only model predict(type = "hazard") returns the constant relative hazard, not the curve you want. Derive the instantaneous rate by differencing the cumulative hazard, as in the predict chunk above; the hazard is its slope.
Let hzr_gof() choose its own grid. It fills the observed and expected event counts only on its default event-time grid. Hand it a custom grid and cum_observed/cum_expected come back empty, and the calibration plots have nothing to draw. Call hzr_gof(fit) with no grid argument.
A pretty fit is not a faithful one. Always run the goodness-of-fit overlay before you quote a parametric hazard. If the Weibull curve drifts off the Kaplan-Meier line, the smooth shape is hiding a misfit, and a richer multi-phase specification is the right next step.