Random forests & model visualization

A Kaplan-Meier curve answers one question at a time, and a Cox model asks you to commit up front to which covariates enter, in what functional form, and which ones interact (both live in the time-to-event part). Outcomes data after cardiac surgery rarely behave that cleanly. The effect of age on reoperation risk is not a straight line; the effect of an implant type depends on the patient it went into; some predictors are continuous (gradient, ejection fraction) and some are categorical (valve position, urgency), and the interactions among them are exactly the part you did not anticipate.

Reach for a random forest when you want a model to find that structure for you. A forest grows many decision trees on bootstrap resamples of the cohort and averages their predictions. Because each tree splits the data at every node on whichever variable helps most among a random subset of candidates, the ensemble picks up nonlinear effects and interactions without you specifying them, and it handles continuous and categorical predictors side by side. Every observation is left out of roughly a third of the trees (about 37%), so the forest can predict each patient using only the trees that never saw them. This out-of-bag (OOB) estimate behaves like cross-validation and comes as a by-product of fitting, at almost no extra cost.
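The "roughly a third" figure follows from the bootstrap itself: a given row is missed by one resample of size n with probability (1 - 1/n)^n, which tends to exp(-1), about 0.368. A minimal base-R sketch of that arithmetic is below; the cohort size and tree count are hypothetical values chosen only for illustration, and no forest is fitted.

```r
# Simulate how often an observation is out-of-bag across bootstrap resamples.
set.seed(42)
n       <- 500   # hypothetical cohort size
n_trees <- 2000  # hypothetical number of trees

# For each tree, draw n row indices with replacement and flag the rows
# that were never drawn (the out-of-bag rows for that tree).
oob <- replicate(
  n_trees,
  !(seq_len(n) %in% sample.int(n, n, replace = TRUE))
)

# oob is an n x n_trees logical matrix; each row mean is the share of
# trees that never saw that observation.
oob_share <- rowMeans(oob)

mean(oob_share)  # simulated average out-of-bag share
(1 - 1 / n)^n    # exact probability of being missed by one resample
exp(-1)          # large-n limit, about 0.368
```

The three printed values should agree closely, which is why each patient still has several hundred trees available for an OOB prediction in a forest of ordinary size.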

This whole part treats a forest analysis as two steps, always in the same order. First, fit a forest with randomForestSRC::rfsrc() (Ishwaran and Kogalur 2026): it grows the trees, computes the OOB predictions, and (optionally) computes variable importance. Second, visualize the fitted object with a gg_*() constructor from ggRandomForests (Ehrlinger 2026). Each constructor extracts a tidy slice of the forest (error rates, predicted survival, variable importance, partial dependence) and returns a lightweight object with plot() and autoplot() methods that produce ggplot graphics. Because every gg_*() plot is a standard ggplot, the house style applies exactly as everywhere else in the book: add + theme_hv_manuscript().
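The two steps can be sketched as follows. This is an illustration under stated assumptions, not the book's own analysis: it uses the veteran lung cancer data that ships with randomForestSRC as a stand-in cohort, the ntree and block.size values are arbitrary choices, and ggplot2::theme_bw() stands in for theme_hv_manuscript(), whose package is not named in this section.

```r
library(randomForestSRC)
library(ggRandomForests)
library(ggplot2)

# Stand-in data (an assumption): the veteran trial bundled with randomForestSRC.
data(veteran, package = "randomForestSRC")

# Step 1: fit the forest.
rf <- rfsrc(
  Surv(time, status) ~ .,
  data       = veteran,
  ntree      = 500,
  importance = TRUE,  # also compute variable importance (VIMP)
  block.size = 1      # record the cumulative OOB error after every tree
)

# Step 2: extract a tidy slice with a gg_*() constructor, then plot it.
gg_err <- gg_error(rf)  # OOB error vs. number of trees
gg_imp <- gg_vimp(rf)   # VIMP ranking of the predictors

# Each plot is an ordinary ggplot, so a theme is added with `+`.
# theme_bw() is a placeholder for the book's theme_hv_manuscript().
p_err <- plot(gg_err) + theme_bw()
p_imp <- plot(gg_imp) + theme_bw()

print(p_err)
print(p_imp)
```

Only the constructor call changes between the two plots; the fitted object rf is reused unchanged.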

The chapters that follow each take one fitted forest and read a different slice of it; the constructor is the only thing that changes.

| Chapter | Constructor | Shows |
|---|---|---|
| Error convergence | gg_error() | OOB error vs. number of trees |
| Predicted survival | gg_rfsrc(), gg_survival() | per-observation and grouped survival |
| Variable importance | gg_vimp() | VIMP ranking of predictors |
| Variable dependence | gg_variable(), gg_partial_rfsrc() | marginal and partial effects |
| ROC / Brier | gg_roc(), gg_brier() | classification performance |
| varPro | varPro constructors | rule-based variable selection |

The forest does not have to be a survival forest. Give rfsrc() a numeric outcome and you get a regression forest; give it a factor and you get a classification forest. The gg_*() constructors adapt to the family of the fit, so the same pattern works whether the error is 1 - C-index, mean-squared error, or misclassification rate. The first chapter fits a forest end to end and checks that it converged before reading anything downstream.
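As a small sketch of how the outcome type selects the forest family, the block below fits three forests and reads the family recorded on each fitted object. The datasets (mtcars, iris, and the veteran data bundled with randomForestSRC) and the ntree value are illustrative assumptions, not the cohorts used in the chapters.

```r
library(randomForestSRC)

# Numeric outcome: regression forest.
rf_regr <- rfsrc(mpg ~ ., data = mtcars, ntree = 200)

# Factor outcome: classification forest.
rf_class <- rfsrc(Species ~ ., data = iris, ntree = 200)

# Surv() outcome: survival forest.
data(veteran, package = "randomForestSRC")
rf_surv <- rfsrc(Surv(time, status) ~ ., data = veteran, ntree = 200)

# The fitted object records which family was grown.
families <- c(
  regression     = rf_regr$family,
  classification = rf_class$family,
  survival       = rf_surv$family
)
print(families)
```

The call is the same in all three cases; only the left-hand side of the formula differs, and the family stored on the fit is what lets downstream code adapt.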