A uvarpro fit records how strongly each variable depends on the
others. This function pulls those cross-variable dependency scores from
the fit with varPro::get.beta.entropy() and varPro::sdependent(), and
returns them as a tidy list that plot.gg_udependent can draw as a
network.
Usage
gg_udependent(
object,
threshold = 0.25,
q.signal = 0.75,
directed = TRUE,
min.degree = NULL,
...
)
Arguments
- object
A fitted uvarpro object (required).
- threshold
Numeric; the positive dependency threshold passed on to sdependent(). An edge \(i \to j\) is drawn when I[i, j] >= threshold. Default 0.25.
- q.signal
Quantile threshold (0–1) for picking out the signal variables; passed on to sdependent(). Default 0.75.
- directed
Logical; TRUE (default) builds a directed igraph, FALSE an undirected one.
- min.degree
Integer or NULL. When set, only nodes with degree \(\ge\) min.degree are kept in $nodes, $edges, and $graph.
- ...
Additional arguments forwarded to varPro::sdependent().
Value
A named list of class "gg_udependent" with elements:
- $edges
Data frame: variable_from, variable_to, weight (raw cross-importance value).
- $nodes
Data frame: variable (factor, levels by descending degree), degree (integer; out-degree when directed = TRUE, total degree when directed = FALSE), selected (logical, TRUE if in sdependent's signal set).
- $graph
igraph object.
NULL if no dependencies are detected.
A "provenance" attribute carries threshold, q.signal, directed, min.degree, xvar.names, and n.
What cross-variable dependency is doing
UVarPro (Zhou, Lu and Ishwaran, 2026) extends the varpro framework to
the unsupervised setting: grow a forest without a response, then use
the same region-release contrasts varpro uses for supervised
importance to ask, "which variables explain the structure in the
data?" The lasso-driven variant frames each region-release contrast
as a classification task (does an observation belong to the region
or to its release?) and fits a lasso logistic regression with the
other variables as predictors. The coefficient on variable \(j\)
in the model for variable \(i\)'s region-release contrast is the
entry \(I[i, j]\) of the matrix varPro::get.beta.entropy()
returns.
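To make "classify membership using the other variables" concrete, here is a toy illustration in base R. It is not UVarPro's procedure: the forest's region/release split is replaced by a simple sign split on x1, and the lasso logistic regression by an unpenalized glm() fit. The variables x1, x2, x3 and the label member are hypothetical. The point is only that a variable which depends on x1 (x2) gets a much larger coefficient than an unrelated one (x3).

```r
# Toy illustration only, NOT UVarPro's procedure: the region/release split is
# replaced by a sign split on x1, and the lasso by an unpenalized glm().
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- x1 + rnorm(n)   # x2 carries information about x1
x3 <- rnorm(n)        # x3 is independent noise
dat <- data.frame(x1, x2, x3)

# Hypothetical binary "membership" label for variable i = x1
dat$member <- as.integer(dat$x1 > 0)

# Separate the two groups using only the *other* variables as predictors
fit <- glm(member ~ x2 + x3, data = dat, family = binomial())
b <- coef(fit)[c("x2", "x3")]
print(round(b, 2))  # the x2 coefficient should dominate the x3 coefficient
```

In the real method the analogue of b["x2"] would be one entry I[x1, x2] of the cross-importance matrix.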
Read that entry as "how much does knowing \(j\) help separate
\(i\)'s region from its release?" A large \(I[i, j]\) says
\(j\) carries information about the structure varpro picked up in
\(i\). varPro::sdependent thresholds that matrix at a
user-chosen cut and returns the set of "signal" variables: the
nodes with high enough out-degree to be worth keeping. We pass the
threshold through to sdependent and use the same matrix to
weight the edges of the resulting graph.
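The edge rule from the threshold argument (draw \(i \to j\) when I[i, j] >= threshold) can be sketched in base R as below. The helper edges_from_matrix() and the 3 x 3 matrix are hypothetical illustrations, not functions or output from ggRandomForests or varPro; the sketch assumes the diagonal (self-dependence) is ignored.

```r
# Hypothetical helper (not a package function): keep i -> j when
# I[i, j] >= threshold and use I[i, j] as the edge weight.
edges_from_matrix <- function(I, threshold = 0.25) {
  diag(I) <- NA                                   # ignore self-dependence
  hit <- which(!is.na(I) & I >= threshold, arr.ind = TRUE)
  data.frame(
    variable_from    = rownames(I)[hit[, "row"]],
    variable_to      = colnames(I)[hit[, "col"]],
    weight           = I[hit],                    # raw cross-importance value
    stringsAsFactors = FALSE
  )
}

# Made-up matrix for illustration (rows = i, columns = j)
v <- c("a", "b", "c")
I <- matrix(c(0.00, 0.40, 0.10,
              0.30, 0.00, 0.05,
              0.00, 0.60, 0.00),
            nrow = 3, byrow = TRUE, dimnames = list(v, v))
e <- edges_from_matrix(I, threshold = 0.25)
print(e)
```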
The graph is directed by default because \(I[i, j]\) and
\(I[j, i]\) are separate lasso coefficients and need not agree;
setting directed = FALSE collapses each pair by taking the
larger of the two, which is appropriate when you only want to see
that two variables are dependent, not which way the dependency
reads.
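The undirected collapse described above (keep the larger of \(I[i, j]\) and \(I[j, i]\)) amounts to symmetrizing with an elementwise maximum and reading each unordered pair once. This is a minimal base-R sketch under that assumption; collapse_undirected() and the example matrix are hypothetical.

```r
# Hypothetical helper: collapse a directed dependency matrix to undirected
# edges by taking max(I[i, j], I[j, i]) for each pair.
collapse_undirected <- function(I, threshold = 0.25) {
  S <- pmax(I, t(I))                       # larger of the two directions
  keep <- upper.tri(S) & S >= threshold    # each unordered pair once
  hit <- which(keep, arr.ind = TRUE)
  data.frame(
    variable_from    = rownames(S)[hit[, "row"]],
    variable_to      = colnames(S)[hit[, "col"]],
    weight           = S[hit],
    stringsAsFactors = FALSE
  )
}

v <- c("a", "b", "c")
I <- matrix(c(0.00, 0.40, 0.10,
              0.30, 0.00, 0.05,
              0.00, 0.60, 0.00),
            nrow = 3, byrow = TRUE, dimnames = list(v, v))
u <- collapse_undirected(I)

# Total degree counts both ends of every undirected edge
deg <- table(factor(c(u$variable_from, u$variable_to), levels = v))
print(u)
print(deg)
```

Note that the pair (b, c) survives here only because \(I[c, b] = 0.60\) is large, even though \(I[b, c] = 0.05\) is not; that asymmetry is exactly what the directed graph keeps visible.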
What's in the output
$edges has one row per surviving edge with the raw weight
I[i, j] (or, for undirected graphs, the max of the two
directions). $nodes has one row per surviving variable with
its degree (out-degree for directed, total degree for undirected)
and a selected flag for membership in the sdependent
signal set. $graph is the same information packaged as an
igraph object, with weight, degree, and
selected attached so plot.gg_udependent can render it
without recomputing anything.
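A $nodes-style table can be derived from an edge list roughly as follows: count degree (out-degree if directed, both ends otherwise), flag membership in a signal set, optionally drop nodes below min.degree (and the edges touching them), and order the factor levels by descending degree. This is a hypothetical sketch of that bookkeeping, not the package's internals; build_nodes(), its signal argument, and the example data are assumptions, and the sketch keeps zero-degree nodes unless min.degree removes them.

```r
# Hypothetical sketch (not ggRandomForests internals).
build_nodes <- function(edges, vars, signal, directed = TRUE, min.degree = NULL) {
  # Out-degree counts only the "from" end; total degree counts both ends
  ends <- if (directed) edges$variable_from else c(edges$variable_from, edges$variable_to)
  deg <- table(factor(ends, levels = vars))
  nodes <- data.frame(variable = vars,
                      degree   = as.integer(deg[vars]),
                      selected = vars %in% signal,    # membership in the signal set
                      stringsAsFactors = FALSE)
  if (!is.null(min.degree)) {
    nodes <- nodes[nodes$degree >= min.degree, , drop = FALSE]
    keep  <- nodes$variable
    # Drop any edge that touches a removed node
    edges <- edges[edges$variable_from %in% keep & edges$variable_to %in% keep, , drop = FALSE]
  }
  # Factor levels by descending degree; order() leaves ties in input order
  nodes$variable <- factor(nodes$variable,
                           levels = nodes$variable[order(-nodes$degree)])
  list(edges = edges, nodes = nodes)
}

edges <- data.frame(variable_from = c("a", "a", "b", "c"),
                    variable_to   = c("b", "c", "a", "b"),
                    weight        = c(0.4, 0.3, 0.3, 0.6))
res <- build_nodes(edges, vars = c("a", "b", "c", "d"), signal = c("a", "c"),
                   directed = TRUE, min.degree = 1)
print(res$nodes)
```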
What you use this for
- screen a wide unsupervised dataset for the small set of variables UVarPro thinks are carrying the signal: the nodes with high degree, or those flagged selected = TRUE;
- spot clusters of mutually dependent variables (hubs and the spokes around them) that may be measuring the same underlying construct;
- compare two datasets, or two preprocessing pipelines, by looking at how their dependency graphs change.
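For the comparison use case, one simple approach is to compare the directed edge sets of two $edges tables. The sketch below assumes only the variable_from and variable_to columns documented under Value; compare_edges(), edge_keys(), and the two example runs are hypothetical.

```r
# Hypothetical helpers: compare two $edges-style tables by edge membership.
edge_keys <- function(edges) paste(edges$variable_from, edges$variable_to, sep = " -> ")

compare_edges <- function(edges_a, edges_b) {
  ka <- edge_keys(edges_a)
  kb <- edge_keys(edges_b)
  list(shared = intersect(ka, kb),   # present in both graphs
       only_a = setdiff(ka, kb),     # lost going from a to b
       only_b = setdiff(kb, ka))     # gained going from a to b
}

run1 <- data.frame(variable_from = c("a", "b", "c"), variable_to = c("b", "a", "b"))
run2 <- data.frame(variable_from = c("a", "c", "d"), variable_to = c("b", "b", "a"))
cmp <- compare_edges(run1, run2)
str(cmp)
```

Weights could be compared the same way by merging the two tables on the edge key, but set membership is usually the first thing to check.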
An edge in this graph is a statistical dependency in the unsupervised decomposition of the data. It is not a causal arrow. A high \(I[i, j]\) says \(j\) predicts \(i\)'s region membership, not that \(j\) causes \(i\).
References
Zhou, L., Lu, M. and Ishwaran, H. (2026). Variable priority for unsupervised variable selection. Pattern Recognition, 172:112727.