Split partial dependence data into continuous or categorical datasets
Source: R/gg_partial.R
A partial dependence curve answers a what-if question about a forest: sweep one predictor across its range while every other predictor stays at its observed values, and watch how the ensemble prediction moves. Because the prediction is averaged over the joint distribution of the other variables, the resulting curve isolates the average effect of the swept predictor alone.
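The averaging described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: `partial_dependence` is a hypothetical helper, not a function from this package or from randomForestSRC, and a linear model stands in for the forest so the snippet runs without extra packages.

```r
# Hypothetical helper (not part of any package): for each grid value, overwrite
# the swept predictor in every row, predict, and average the predictions.
partial_dependence <- function(fit, data, var, grid) {
  vapply(grid, function(v) {
    swept <- data
    swept[[var]] <- v                     # force the swept predictor to v
    mean(predict(fit, newdata = swept))   # average over the other predictors
  }, numeric(1))
}

airq <- na.omit(airquality)

# Stand-in model (assumption): any model with a predict() method would do here.
fit <- lm(Ozone ~ Temp + Wind + Solar.R, data = airq)

# Sweep Temp over a handful of its observed quantiles.
grid <- unname(quantile(airq$Temp, probs = seq(0.1, 0.9, by = 0.2)))

pd <- data.frame(
  x    = grid,
  yhat = partial_dependence(fit, airq, "Temp", grid),
  name = "Temp"
)
pd
```

The resulting data frame uses the same `x`, `yhat`, `name` column layout as the value documented below. For an additive linear model the curve is a straight line; a forest would generally give a non-linear one.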
Arguments
- part_dta
partial plot data from randomForestSRC::plot.variable
- nvars
how many of the partial plot variables to calculate
- cat_limit
Categorical features are built when there are fewer than cat_limit unique feature values.
- model
a label name applied to all features. Useful when combining multiple partial plot objects in figures.
Value
A named list with two elements:
- continuous
data.frame with columns x, yhat, name (and optionally model) for continuous variables
- categorical
data.frame with the same columns but with x as a factor, for low-cardinality / categorical variables
Details
gg_partial handles the bookkeeping step after you've already called
randomForestSRC::plot.variable(partial = TRUE): it takes the list that function
returns and separates the variables into two tidy data frames – one for
continuous predictors (plotted as lines) and one for categorical predictors
(plotted as bar charts). The split is controlled by cat_limit:
variables with more unique x-values than this threshold are treated as
continuous; all others are categorical.
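The split rule can be sketched in base R as follows. This is not the package's implementation: `split_by_cardinality` is a hypothetical helper, the default of 10 for `cat_limit` is an assumption, the `yhat` values are placeholders rather than real partial dependence estimates, and a variable with exactly `cat_limit` unique values is treated as categorical here, following the wording of the paragraph above.

```r
# Hypothetical helper (not the package's code): split a long data frame with
# columns x, yhat, name by how many unique x-values each variable has.
split_by_cardinality <- function(dta, cat_limit = 10) {
  # Count unique x-values per variable name.
  n_unique <- tapply(dta$x, dta$name, function(v) length(unique(v)))

  # Assumption: strictly more unique values than cat_limit means continuous.
  cont_names <- names(n_unique)[n_unique > cat_limit]
  is_cont <- dta$name %in% cont_names

  continuous  <- dta[is_cont, , drop = FALSE]
  categorical <- dta[!is_cont, , drop = FALSE]
  categorical$x <- factor(categorical$x)   # low-cardinality x becomes a factor

  list(continuous = continuous, categorical = categorical)
}

airq <- na.omit(airquality)

# Placeholder data: Ozone stands in for yhat purely to give the columns values.
long <- rbind(
  data.frame(x = airq$Temp,  yhat = airq$Ozone, name = "Temp"),
  data.frame(x = airq$Month, yhat = airq$Ozone, name = "Month")
)

parts <- split_by_cardinality(long, cat_limit = 10)
unique(parts$continuous$name)
unique(parts$categorical$name)
```

Temp has far more than 10 distinct values, so it lands in the continuous element, while Month has only five and becomes a factor in the categorical element.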
If you'd rather skip the plot.variable step and pass the fitted
forest directly, see gg_partial_rfsrc, which calls
partial.rfsrc for you.
Note
Partial-dependence extraction is randomForestSRC-only;
there is no randomForest method (the randomForest package
provides no comparable partial-dependence interface).
Examples
## Build a small regression forest on the airquality dataset
set.seed(42)
airq <- na.omit(airquality)
rf <- randomForestSRC::rfsrc(Ozone ~ ., data = airq, ntree = 50)
## Compute partial dependence via plot.variable. show.plots = FALSE
## suppresses the base-graphics output; we only want the data.
pv <- randomForestSRC::plot.variable(rf, partial = TRUE,
                                     show.plots = FALSE)
## Split into continuous and categorical data frames
result <- gg_partial(pv)
head(result$continuous)
#> # A tibble: 6 × 3
#>       x  yhat name
#>   <dbl> <dbl> <chr>
#> 1     7  30.3 Solar.R
#> 2    14  30.4 Solar.R
#> 3    25  30.8 Solar.R
#> 4    44  31.3 Solar.R
#> 5    65  32.7 Solar.R
#> 6    81  36.0 Solar.R
## Label this model for later comparison with a second forest
result_labelled <- gg_partial(pv, model = "airq_model")
unique(result_labelled$continuous$model)
#> [1] "airq_model"