ROC (Receiver Operating Characteristic) curve data from a classification forest.
Source: R/gg_roc.R, gg_roc.rfsrc.Rd
A classifier does not just hand you a class; it hands you a predicted probability, and you pick the threshold that turns that probability into a class. Slide the threshold from 0 to 1 and the trade-off between catching the positives and crying wolf shifts the whole way. The ROC curve traces that trade-off. For one class of a classification rfsrc or randomForest forest, gg_roc walks every unique predicted-probability threshold and records sensitivity (the true positive rate) against specificity (1 minus the false positive rate).
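The threshold walk can be sketched in base R. This is a minimal illustration under stated assumptions, not the ggRandomForests source: `roc_sweep` is a hypothetical helper, and the rule "call an observation positive when its probability is at or above the threshold" is an assumed convention.

```r
# Sketch only: `roc_sweep` is a hypothetical helper written for illustration;
# it is not the ggRandomForests implementation of gg_roc().
# prob:  predicted probability of the scored class, one value per observation
# truth: logical vector, TRUE where the observation belongs to that class
roc_sweep <- function(prob, truth) {
  stopifnot(length(prob) == length(truth), any(truth), any(!truth))
  thresholds <- sort(unique(prob))
  n_pos <- sum(truth)
  n_neg <- sum(!truth)
  # Assumed convention: an observation is called positive when prob >= threshold
  sens <- vapply(thresholds, function(t) sum(prob >= t & truth) / n_pos, numeric(1))
  spec <- vapply(thresholds, function(t) sum(prob < t & !truth) / n_neg, numeric(1))
  data.frame(sens = sens, spec = spec, pct = thresholds)
}

# Four toy observations: two positives (0.35, 0.8) and two negatives (0.1, 0.4)
roc_toy <- roc_sweep(prob  = c(0.1, 0.4, 0.35, 0.8),
                     truth = c(FALSE, FALSE, TRUE, TRUE))
roc_toy
```

At the lowest threshold every observation is called positive, so sensitivity is 1 and specificity is 0; at the highest threshold (0.8) only one positive is caught (sensitivity 0.5) and no negative is (specificity 1).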
Usage
# S3 method for class 'rfsrc'
gg_roc(object, which_outcome, oob = TRUE, per_class = FALSE, ...)
Arguments
- object
A classification rfsrc or randomForest object. Only forests with family == "class" (rfsrc) or type == "classification" (randomForest) are supported.
- which_outcome
Integer index or character name of the class to score. For binary forests this is usually 1 or 2; for multi-class forests, any valid class index or level name. which_outcome = "all" or 0 behaves differently by engine:
  - randomForest method: returns a macro-averaged one-vs-rest ROC computed over the per-class probabilities.
  - rfsrc method: warns and falls back to class 1. The macro-average and per-class faceting for the rfsrc path are tracked separately under issue #72.
- oob
Logical; if TRUE (default), build the curve from out-of-bag predicted probabilities; otherwise, from full in-bag predictions. For randomForest, TRUE uses the out-of-bag vote probabilities in object$votes; FALSE uses in-bag predict(type = "prob").
- per_class
Logical; if TRUE and the forest has more than two classes, return one ROC curve per class, each class scored against all the others. The result is a long-format data.frame with a class factor column and a named AUC vector attribute, ordered by descending AUC. Binary forests treat per_class = TRUE as a no-op. Honoured by the randomForest method only.
- ...
Extra arguments (currently unused).
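The per_class = TRUE output shape described above can be sketched in base R. This is a hypothetical illustration, not the package's code path: it assumes the per-class probabilities arrive as a matrix with one column per class level, and it ranks classes by the Mann-Whitney rank formula for AUC, which may differ from how calc_auc computes the area. The helper names `roc_one_class`, `auc_rank` and `roc_per_class` are invented for this sketch.

```r
# Sketch only: hypothetical helpers, not the ggRandomForests implementation.
# One ROC curve for one class, scored against all others (one-vs-rest)
roc_one_class <- function(prob, truth) {
  thr <- sort(unique(prob))
  data.frame(
    sens = vapply(thr, function(t) mean(prob[truth] >= t), numeric(1)),
    spec = vapply(thr, function(t) mean(prob[!truth] < t), numeric(1)),
    pct  = thr
  )
}

# Rank-based (Mann-Whitney) AUC; rank() averages ties, so a tie counts as half a win
auc_rank <- function(prob, truth) {
  n_pos <- sum(truth)
  n_neg <- sum(!truth)
  (sum(rank(prob)[truth]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# prob_mat: n x K probability matrix, columns named by class level
# y:        factor of observed classes
roc_per_class <- function(prob_mat, y) {
  classes <- colnames(prob_mat)
  auc <- vapply(classes, function(k) auc_rank(prob_mat[, k], y == k), numeric(1))
  auc <- sort(auc, decreasing = TRUE)  # named vector, best class first
  curves <- lapply(names(auc), function(k) {
    d <- roc_one_class(prob_mat[, k], y == k)
    d$class <- k
    d
  })
  out <- do.call(rbind, curves)  # long format: one block of rows per class
  out$class <- factor(out$class, levels = names(auc))
  attr(out, "auc") <- auc
  out
}

# Toy three-class example (made-up probabilities, each row sums to 1)
y <- factor(c("a", "a", "b", "b", "c", "c"))
prob_mat <- rbind(c(0.8, 0.1, 0.1), c(0.6, 0.3, 0.1), c(0.2, 0.7, 0.1),
                  c(0.3, 0.4, 0.3), c(0.1, 0.2, 0.7), c(0.4, 0.5, 0.1))
colnames(prob_mat) <- levels(y)
res <- roc_per_class(prob_mat, y)
attr(res, "auc")
```

Here class "a" separates perfectly (AUC 1), while "b" and "c" each lose some pairwise comparisons, so the factor levels of res$class come out in the order a, b, c.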
Value
A gg_roc data.frame, one row per unique prediction
threshold, with columns:
- sens
Sensitivity (true positive rate) at the threshold.
- spec
Specificity (true negative rate) at the threshold.
- pct
The probability threshold used for that row.
Pass the returned object to calc_auc to compute the area under the curve (AUC).
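One common way to turn a data.frame with sens, spec and pct columns into an AUC is the trapezoidal rule over (1 - spec, sens). The sketch below assumes that technique; `trapezoid_auc` is a hypothetical stand-in and is not claimed to be how calc_auc is implemented.

```r
# Sketch only: `trapezoid_auc` is a hypothetical stand-in for illustration,
# not the source of ggRandomForests::calc_auc().
# roc: data.frame with columns sens, spec, pct (one row per threshold)
trapezoid_auc <- function(roc) {
  roc <- roc[order(roc$pct, decreasing = TRUE), ]  # strictest threshold first
  # x = false positive rate (1 - spec), y = true positive rate (sens);
  # pad with the (0, 0) and (1, 1) corners of the ROC square
  fpr <- c(0, 1 - roc$spec, 1)
  tpr <- c(0, roc$sens, 1)
  # Sum of trapezoid areas between consecutive points
  sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
}

# A toy four-threshold curve in the same column layout
roc_toy <- data.frame(sens = c(1, 1, 0.5, 0.5),
                      spec = c(0, 0.5, 0.5, 1),
                      pct  = c(0.1, 0.35, 0.4, 0.8))
trapezoid_auc(roc_toy)
```

Sorting by decreasing threshold makes the false positive rate non-decreasing, which the trapezoidal rule needs. For this toy curve the result, 0.75, matches the fraction of positive/negative pairs in which the positive has the higher score.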
Examples
## ------------------------------------------------------------
## classification example
## ------------------------------------------------------------
## -------- iris data (randomForestSRC::rfsrc)
rfsrc_iris <- randomForestSRC::rfsrc(Species ~ ., data = iris)
# ROC for setosa
gg_dta <- gg_roc(rfsrc_iris, which_outcome = 1)
plot(gg_dta)
# ROC for versicolor
gg_dta <- gg_roc(rfsrc_iris, which_outcome = 2)
plot(gg_dta)
# ROC for virginica
gg_dta <- gg_roc(rfsrc_iris, which_outcome = 3)
plot(gg_dta)
## -------- iris data (randomForest::randomForest)
rf_iris <- randomForest::randomForest(Species ~ ., data = iris)
# ROC for setosa
gg_dta <- gg_roc(rf_iris, which_outcome = 1)
plot(gg_dta)
# ROC for versicolor
gg_dta <- gg_roc(rf_iris, which_outcome = 2)
plot(gg_dta)
# ROC for virginica
gg_dta <- gg_roc(rf_iris, which_outcome = 3)
plot(gg_dta)