data(PimaIndiansDiabetes, package = "mlbench")
levels(PimaIndiansDiabetes$diabetes)[1] "neg" "pos"
rf <- rfsrc(diabetes ~ ., data = PimaIndiansDiabetes, ntree = 100)
rf$family[1] "class"
A classification forest does not just label each patient, it returns a probability for each class, and a probability is only useful if you can say how trustworthy it is. Reach for these plots whenever you need to report how well a classification forest performs: whether it can tell recurrence from non-recurrence, how it compares against another model, or whether the predicted probabilities are calibrated well enough to act on. They are the performance half of any classification result, the figures a reviewer looks for right after the model description.
We judge a probabilistic classifier the same way we judge any other:
Both come from ggRandomForests (Ehrlinger 2026), and a useful detail comes for free: these are computed out-of-bag (from patients each tree never saw during fitting), so they are honest estimates that need no separate test set.
ROC reads from a fitted classification forest, one whose outcome is a factor so the forest predicts class probabilities. We use the Pima Indians diabetes data from mlbench (Leisch and Dimitriadou 2026) — 768 patients with eight clinical measurements and a two-level diabetes outcome — a forest that discriminates well enough to draw a ROC curve worth reading.
data(PimaIndiansDiabetes, package = "mlbench")
levels(PimaIndiansDiabetes$diabetes)[1] "neg" "pos"
rf <- rfsrc(diabetes ~ ., data = PimaIndiansDiabetes, ntree = 100)
rf$family[1] "class"
The family field confirms a classification forest, and the levels are neg (no diabetes) and pos (diabetes). The clinically interesting event is a positive diagnosis, so that is the class we will treat as positive.
gg_roc() builds the out-of-bag ROC curve for one outcome. The which_outcome argument names the class to treat as positive, and the important thing is that it takes a numeric index, not the level name: 2 selects the second level, pos. Pass the string "pos" and the call errors with “subscript out of bounds”.
plot(gg_roc(rf, which_outcome = 2)) + theme_hv_manuscript()
The curve bows toward the top-left corner. The further it pulls away from the diagonal (the line a random classifier would trace), the better the forest separates diabetic from non-diabetic patients, and the AUC printed on the panel quantifies that gap.
With more than two classes you draw one curve per class, each class treated as positive in turn against all the others (one-vs-rest). The per_class = TRUE flag does this in a single call. We illustrate on iris, where the three species give three curves.
rf_iris <- rfsrc(Species ~ ., data = iris, ntree = 100)
plot(gg_roc(rf_iris, which_outcome = 1, per_class = TRUE)) +
theme_hv_manuscript()
Each coloured curve is one species-versus-rest classifier. The well-separated species hug the top-left corner; any class whose curve sags toward the diagonal is the one the forest struggles to tell apart from the others.
The Brier score is a calibration measure, but in ggRandomForests gg_brier() only supports right-censored survival forests. It errors on a classification forest, so we cannot run it on the Pima diabetes model above; instead we refit pbc as a survival forest and plot the integrated Brier score over follow-up time, the survival analogue of the calibration check we want for classification.
data(pbc, package = "randomForestSRC")
rf_surv <- rfsrc(Surv(days, status) ~ ., data = pbc, ntree = 100)
plot(gg_brier(rf_surv)) + theme_hv_manuscript()
Lower curves mean better-calibrated survival predictions across time. Read it alongside the error curve and AUC: discrimination (ROC and AUC) tells you whether the ranking is right, while the Brier score tells you whether the predicted probabilities themselves can be trusted.
which_outcome is a number, not a name. It is the index of the class to treat as positive. Use 2 to select the second level; passing the level name like "pos" throws “subscript out of bounds”. Check levels() to confirm which index is the class you mean.gg_brier() is survival-only. It works on right-censored survival forests and errors on a classification forest, which is why the Brier figure here is built on a pbc survival fit rather than the Pima diabetes classification fit used for ROC. Do not expect a Brier curve straight off your classifier.gg_brier() here targets survival. Picking the right tool for the forest family saves you a confusing error message.