Split varpro partial dependence data into continuous or categorical datasets

Takes the list returned by varpro::partialpro and separates variables into two data frames: one for continuous predictors (parametric, non- parametric, and causal effect curves) and one for categorical predictors (one row per observation per category level).

Usage

gg_partialpro(part_dta, nvars = NULL, cat_limit = 10, model = NULL)

Arguments

part_dta: partial plot data from varpro::partialpro. Each element of the list must contain fields xvirtual, xorg, yhat.par, yhat.nonpar, and yhat.causal.
nvars: how many variables (list elements) to process. Defaults to all variables in part_dta.
cat_limit: Variables with length(xvirtual) \(\le\) cat_limit are treated as categorical. Default 10.
model: a label applied to all rows. Useful when combining results from multiple models in a single figure.

Value

A named list with two elements:

continuous: data.frame with columns variable, parametric, nonparametric, causal, name (and optionally model)
categorical: data.frame with the same columns but one row per observation per category level

Details

The split is governed by cat_limit: a variable is treated as continuous when length(xvirtual) > cat_limit; otherwise it is treated as categorical and the per-category rows are stacked.

Examples

## Construct mock varpro partialpro output:
##   - "age": a continuous predictor (xvirtual has > 10 points)
##   - "sex": a categorical predictor (xvirtual has 2 points)
set.seed(42)
n_obs <- 30   # number of observations (rows in yhat matrices)
n_pts <- 15   # number of evaluation points for continuous variables

mock_data <- list(
  age = list(
    # xvirtual: evaluation grid for the marginal effect curve
    xvirtual   = seq(30, 80, length.out = n_pts),
    # xorg: original observed values (used only for categorical detection)
    xorg       = sample(seq(30, 80, by = 5), n_obs, replace = TRUE),
    # yhat matrices: n_obs rows x n_pts columns (predictions at each grid pt)
    yhat.par   = matrix(rnorm(n_obs * n_pts), nrow = n_obs),
    yhat.nonpar = matrix(rnorm(n_obs * n_pts), nrow = n_obs),
    yhat.causal = matrix(rnorm(n_obs * n_pts), nrow = n_obs)
  ),
  sex = list(
    # Two categories: the xvirtual grid has only 2 points
    xvirtual   = c(0, 1),
    xorg       = sample(c(0, 1), n_obs, replace = TRUE),
    # Two-column yhat matrices (one column per category)
    yhat.par   = matrix(rnorm(n_obs * 2), nrow = n_obs),
    yhat.nonpar = matrix(rnorm(n_obs * 2), nrow = n_obs),
    yhat.causal = matrix(rnorm(n_obs * 2), nrow = n_obs)
  )
)

result <- gg_partialpro(mock_data)

## Continuous result: one row per evaluation grid point
head(result$continuous)
#> # A tibble: 6 × 5
#>   variable parametric nonparametric  causal name 
#>      <dbl>      <dbl>         <dbl>   <dbl> <chr>
#> 1     30      -0.0695       -0.120   0.436  age  
#> 2     33.6     0.149        -0.477   0.0438 age  
#> 3     37.1    -0.237        -0.270   0.0531 age  
#> 4     40.7    -0.0346       -0.0673 -0.0482 age  
#> 5     44.3    -0.136         0.171   0.176  age  
#> 6     47.9     0.0217        0.0159  0.119  age  

## Categorical result: n_obs rows per category level
head(result$categorical)
#> # A tibble: 6 × 5
#>   parametric nonparametric causal variable name 
#>        <dbl>         <dbl>  <dbl>    <dbl> <chr>
#> 1      0.850       -0.0338  0.751        0 sex  
#> 2      1.76        -0.901  -0.829        0 sex  
#> 3      0.846       -1.18    0.710        0 sex  
#> 4     -0.545       -0.448   0.950        0 sex  
#> 5      0.255        0.559  -0.579        0 sex  
#> 6      0.299        0.0931 -1.23         0 sex