Split varpro partial dependence data into continuous or categorical datasets
Source: R/gg_partialpro.R
Takes the list returned by varpro::partialpro and separates variables
into two data frames: one for continuous predictors (parametric,
nonparametric, and causal effect curves) and one for categorical predictors
(one row per observation per category level).
Arguments
- part_dta
partial plot data from varpro::partialpro. Each element of the list must contain the fields xvirtual, xorg, yhat.par, yhat.nonpar, and yhat.causal.
- nvars
how many variables (list elements) to process. Defaults to all variables in part_dta.
- cat_limit
Variables with length(xvirtual) <= cat_limit are treated as categorical. Default 10.
- model
a label applied to all rows. Useful when combining results from multiple models in a single figure.
Value
A named list with two elements:
- continuous
data.frame with columns variable, parametric, nonparametric, causal, name (and optionally model)
- categorical
data.frame with the same columns but one row per observation per category level
Details
The split is governed by cat_limit: a variable is treated as
continuous when length(xvirtual) > cat_limit; otherwise it is
treated as categorical and the per-category rows are stacked.
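The rule above can be sketched in a few lines of base R. This is only an illustration of the threshold test described in this section: `classify_partialpro()` and the `mock` list are hypothetical names introduced here, not part of varpro or of gg_partialpro itself, and the sketch does not build the output data frames.

```r
# Illustrative sketch only: classify_partialpro() is a hypothetical helper,
# not the implementation used by gg_partialpro().
classify_partialpro <- function(part_dta, cat_limit = 10) {
  vapply(
    part_dta,
    function(v) {
      # Continuous when the evaluation grid has more than cat_limit points;
      # otherwise (length <= cat_limit) categorical.
      if (length(v$xvirtual) > cat_limit) "continuous" else "categorical"
    },
    character(1)
  )
}

# Minimal mock input: only the field inspected by the rule is populated.
mock <- list(
  age   = list(xvirtual = seq(30, 80, length.out = 15)),  # 15 grid points
  score = list(xvirtual = 1:10),                          # exactly cat_limit
  sex   = list(xvirtual = c(0, 1))                        # 2 grid points
)

kinds <- classify_partialpro(mock)
print(kinds)

# Raising cat_limit above 15 moves "age" to the categorical group as well.
kinds_wide <- classify_partialpro(mock, cat_limit = 20)
print(kinds_wide)
```

Note the boundary case: a variable whose grid has exactly cat_limit points (`score` here) falls on the categorical side, because the continuous test is a strict inequality.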
Examples
## Construct mock varpro partialpro output:
## - "age": a continuous predictor (xvirtual has > 10 points)
## - "sex": a categorical predictor (xvirtual has 2 points)
set.seed(42)
n_obs <- 30 # number of observations (rows in yhat matrices)
n_pts <- 15 # number of evaluation points for continuous variables
mock_data <- list(
  age = list(
    # xvirtual: evaluation grid for the marginal effect curve
    xvirtual = seq(30, 80, length.out = n_pts),
    # xorg: original observed values (used only for categorical detection)
    xorg = sample(seq(30, 80, by = 5), n_obs, replace = TRUE),
    # yhat matrices: n_obs rows x n_pts columns (predictions at each grid pt)
    yhat.par = matrix(rnorm(n_obs * n_pts), nrow = n_obs),
    yhat.nonpar = matrix(rnorm(n_obs * n_pts), nrow = n_obs),
    yhat.causal = matrix(rnorm(n_obs * n_pts), nrow = n_obs)
  ),
  sex = list(
    # Two categories: the xvirtual grid has only 2 points
    xvirtual = c(0, 1),
    xorg = sample(c(0, 1), n_obs, replace = TRUE),
    # Two-column yhat matrices (one column per category)
    yhat.par = matrix(rnorm(n_obs * 2), nrow = n_obs),
    yhat.nonpar = matrix(rnorm(n_obs * 2), nrow = n_obs),
    yhat.causal = matrix(rnorm(n_obs * 2), nrow = n_obs)
  )
)
result <- gg_partialpro(mock_data)
## Continuous result: one row per evaluation grid point
head(result$continuous)
#> # A tibble: 6 × 5
#> variable parametric nonparametric causal name
#> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 30 -0.0695 -0.120 0.436 age
#> 2 33.6 0.149 -0.477 0.0438 age
#> 3 37.1 -0.237 -0.270 0.0531 age
#> 4 40.7 -0.0346 -0.0673 -0.0482 age
#> 5 44.3 -0.136 0.171 0.176 age
#> 6 47.9 0.0217 0.0159 0.119 age
## Categorical result: n_obs rows per category level
head(result$categorical)
#> # A tibble: 6 × 5
#> parametric nonparametric causal variable name
#> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 0.850 -0.0338 0.751 0 sex
#> 2 1.76 -0.901 -0.829 0 sex
#> 3 0.846 -1.18 0.710 0 sex
#> 4 -0.545 -0.448 0.950 0 sex
#> 5 0.255 0.559 -0.579 0 sex
#> 6 0.299 0.0931 -1.23 0 sex