* `final_point_estimate = "average"`
* Predictiveness measures are now implemented using an S3 class, which makes the internal code cleaner and makes it simpler to add new predictiveness measures.
* The output of `extract_sampled_split_predictions` is now a vector, not a list. This facilitates proper use in the new version of the package.
* `truncate = FALSE` in `vimp_ci`
* Added `measure_avg_value` (computes the average value and its efficient influence function) and corresponding updates to `vim`, `cv_vim`, and `sp_vim`.
* `method` and `family` are now used for weighted EIF estimation within the outer functions (`vim`, `cv_vim`, `sp_vim`) rather than within the `measure*` functions. This ensures compatibility with binary outcomes.
* `sp_vim` now returns the objects that are necessary to compute the test statistics.
* Allow the `parallel` argument to be specified for calls to `CV.SuperLearner` but not for calls to `SuperLearner`.
* Improved creation of `Z` in coarsened-data settings; covariate names/positions can now be specified case-insensitively when creating `Z`.
* `V` now defaults to 5 if no cross-fitting folds are specified externally.
* `cross_fitted_f1` and `cross_fitted_f2` in `cv_vim`
* `cv_vim` now handles an odd number of outer folds being passed with pre-computed regression function estimates: you can use an odd number of folds (e.g., 5) to estimate the full and reduced regression functions and still obtain cross-validated variable importance estimates.
* Added the `vrc01` data as an exported object.
* `vrc01` data
* Allow `C` to not be specified in `make_folds`.
None
* Updated `measure_auc` to hew more closely to `ROCR` and `cvAUC`, using computational tricks to speed up weighted AUC and EIF computation.
* Added `cross_fitted_se` to `cv_vim` and `sp_vim`; this logical option allows the standard error to be estimated using cross-fitting. This can improve performance in cases where flexible algorithms are used to estimate the full and reduced regressions.
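As an illustration of the `cross_fitted_se` option, here is a minimal sketch; the simulated data, learner library, and remaining argument values are illustrative assumptions, not part of this release note:

```r
# Sketch: request cross-fitted standard errors in cv_vim.
# Requires the vimp and SuperLearner packages; data are simulated for illustration.
library(vimp)
library(SuperLearner)

set.seed(1234)
n <- 500
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- 1 + 0.5 * x$x1 + rnorm(n)

est <- cv_vim(
  Y = y, X = x, indx = 2, type = "r_squared",
  V = 5, run_regression = TRUE,
  SL.library = c("SL.glm", "SL.mean"),
  cross_fitted_se = TRUE  # estimate the standard error using cross-fitting
)
est
```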
* For `vim` and `cv_vim`, this option is currently only available for non-sample-split calls (i.e., with `sample_splitting = FALSE`).
* Estimates returned by `vim` are based on the entire dataset, while the full and reduced predictiveness (`predictiveness_full` and `predictiveness_reduced`, along with the corresponding confidence intervals) are evaluated using separate portions of the data for the full and reduced regressions.
* Added `sample_splitting` to `vim`, `cv_vim`, and `sp_vim`; if `FALSE`, sample splitting is not used to estimate predictiveness. Note that we recommend using the default, `TRUE`, in all cases, since inference using `sample_splitting = FALSE` will be invalid for variables with truly null variable importance.
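A minimal sketch of the `sample_splitting` argument described above; the data and learner library are illustrative, and, as noted, `sample_splitting = FALSE` yields invalid inference for variables with truly null importance:

```r
library(vimp)
library(SuperLearner)

set.seed(5678)
n <- 500
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- 1 + x$x1 + rnorm(n)
learners <- c("SL.glm", "SL.mean")

# Recommended default: use sample splitting for valid inference
est_ss <- vim(Y = y, X = x, indx = 2, type = "r_squared",
              run_regression = TRUE, SL.library = learners,
              sample_splitting = TRUE)

# Estimation without sample splitting; do not rely on the resulting
# inference if the importance of x2 may truly be zero
est_no_ss <- vim(Y = y, X = x, indx = 2, type = "r_squared",
                 run_regression = TRUE, SL.library = learners,
                 sample_splitting = FALSE)
```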
* Changed the cross-fitting scheme used when `sample_splitting = TRUE` to match more closely with theoretical results (and improve power!). In this case, we first split the data into \(2K\) cross-fitting folds, and split these folds equally into two sample-splitting folds. For the nuisance regression using all covariates, for each \(k \in \{1, \ldots, K\}\) we set aside the data in sample-splitting fold 1 and cross-fitting fold \(k\) [this comprises \(1 / (2K)\) of the data]. We train using the remaining observations [comprising \((2K-1)/(2K)\) of the data] not in this testing fold, and we test on the originally withheld data. We repeat for the nuisance regression using the reduced set of covariates, but withhold the data in sample-splitting fold 2. This update affects both `cv_vim` and `sp_vim`. If `sample_splitting = FALSE`, then standard cross-fitting is used.
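The fold construction described in the previous entry can be sketched directly in base R; this is an illustration of the scheme, not the package's internal implementation:

```r
# Sketch of the fold scheme used when sample_splitting = TRUE.
set.seed(20)
n <- 100
K <- 5

# (1) split the data into 2K cross-fitting folds
cross_fit_fold <- sample(rep(seq_len(2 * K), length.out = n))

# (2) split these folds equally into two sample-splitting folds
sample_split_fold <- ifelse(cross_fit_fold <= K, 1, 2)

# (3) nuisance regression using all covariates: for each k in 1, ..., K,
#     hold out the data in sample-splitting fold 1 and cross-fitting fold k
#     (1 / (2K) of the data) and train on the rest ((2K - 1) / (2K) of the data)
k <- 1
test_full <- which(sample_split_fold == 1 & cross_fit_fold == k)
train_full <- setdiff(seq_len(n), test_full)

# (4) nuisance regression using the reduced covariate set: same idea, but
#     hold out data in sample-splitting fold 2 (here, cross-fitting fold K + k)
test_reduced <- which(sample_split_fold == 2 & cross_fit_fold == K + k)
train_reduced <- setdiff(seq_len(n), test_reduced)
```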
* Use `>=` in computing the numerator of the AUC with inverse probability weights.
* Updated `roxygen2` documentation for the wrappers (`vimp_*`) to inherit parameters and details from `cv_vim` (reduces the potential for documentation mismatches).
None
* Guess the `family` if it isn't specified: use `stats::binomial()` if there are only two unique outcome values, otherwise use `stats::gaussian()`.
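A short sketch of this guessing rule; the helper name `guess_family` is hypothetical, not an exported function:

```r
# Illustrative version of the rule: binomial for two unique outcome values,
# gaussian otherwise. Not the package's internal code.
guess_family <- function(y) {
  if (length(unique(y)) == 2) {
    stats::binomial()
  } else {
    stats::gaussian()
  }
}

guess_family(c(0, 1, 1, 0))$family  # "binomial"
guess_family(rnorm(10))$family      # "gaussian"
```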
None
* `cvAUC`
None
* Added `ipc_est_type` (available in `vim`, `cv_vim`, and `sp_vim`; also in the corresponding wrapper functions for each VIM and the corresponding internal estimation functions).
None
None
* Updated the tests in `testthat/` to use `glm` rather than `xgboost` (increases speed).
* Use `glm` rather than `xgboost` or `ranger` (increases speed, even though the regression is now misspecified for the truth).
* Removed `forcats` from the vignette.
None
* Updated `measure_accuracy` and `measure_auc` for project-wide consistency.
* Updated the tests in `testthat/` to not explicitly load `xgboost`.
None
None
* Use `stats::qlogis` and `stats::plogis` rather than bespoke functions.
None
None
* `vimp` will handle the rest.
* "`vimp`"
* `run_regression = TRUE` for simplicity.
* Added `verbose` to `sp_vim`; if `TRUE`, messages that display progress are printed throughout fitting, and `verbose` is passed to `SuperLearner`.
* Renamed `cv_predictiveness_point_est` and `predictiveness_point_est` to `est_predictiveness_cv` and `est_predictiveness`, respectively.
* Removed `cv_predictiveness_update`, `cv_vimp_point_est`, `cv_vimp_update`, `predictiveness_update`, `vimp_point_est`, and `vimp_update`; this functionality is now in `est_predictiveness_cv` and `est_predictiveness` (for the *update* functions) or directly in `vim` or `cv_vim` (for the *vimp* functions).
* Removed `predictiveness_se` and `predictiveness_ci` (functionality is now in `vimp_se` and `vimp_ci`, respectively).
* Renamed the `weights` argument to `ipc_weights`, clarifying that these weights are meant to be used as inverse probability of coarsening (e.g., censoring) weights.
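A sketch of the intended use of `ipc_weights`; the two-phase sampling setup, variable names, and weight construction here are illustrative assumptions, and the exact requirements on `C`, `Z`, and the weights are described in the package documentation:

```r
library(vimp)
library(SuperLearner)

set.seed(42)
n <- 1000
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- rbinom(n, 1, plogis(0.5 * x$x1))

# Suppose x2 is only measured when cc == 1, with probability depending on y
obs_prob <- plogis(1 + 0.5 * y)
cc <- rbinom(n, 1, obs_prob)
x$x2[cc == 0] <- NA

# Inverse probability of coarsening (here, censoring) weights
ipc_w <- 1 / obs_prob

est <- vim(
  Y = y, X = x, indx = 2, type = "auc",
  run_regression = TRUE, SL.library = c("SL.glm", "SL.mean"),
  C = cc, Z = c("Y", "x1"), ipc_weights = ipc_w
)
```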
* Added functions `sp_vim`, `sample_subsets`, `spvim_ics`, and `spvim_se`; these allow computation of the Shapley Population Variable Importance Measure (SPVIM).
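A minimal sketch of estimating SPVIMs with `sp_vim`; the data and learner library are illustrative, and `sp_vim` can be computationally intensive since it fits regressions on sampled subsets of the covariates:

```r
library(vimp)
library(SuperLearner)

set.seed(91011)
n <- 300
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y <- 1 + x$x1 - 0.5 * x$x3 + rnorm(n)

# SPVIM values for all covariates
spvim_est <- sp_vim(
  Y = y, X = x, V = 5, type = "r_squared",
  SL.library = c("SL.glm", "SL.mean"),
  verbose = TRUE  # print progress; also passed to SuperLearner
)
spvim_est
```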
None
* Removed `sp_vim` and the helper functions `run_sl`, `sample_subsets`, `spvim_ics`, and `spvim_se`; these will be added in a future release.
* Removed `cv_vim_nodonsker`, since `cv_vim` supersedes this function.
* Added `sp_vim` and the helper functions `run_sl`, `sample_subsets`, `spvim_ics`, and `spvim_se`; these functions allow computation of the Shapley Population Variable Importance Measure (SPVIM).
* `cv_vim` and `vim` now use an outer layer of sample splitting for hypothesis testing.
* Added `vimp_auc`, `vimp_accuracy`, `vimp_deviance`, and `vimp_rsquared`.
* `vimp_regression` is now deprecated; use `vimp_anova` instead.
* Added `vim`; each variable importance function is now a wrapper function around `vim` with the `type` argument filled in.
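As an illustration of the wrapper relationship described in the preceding entry, the following two calls are intended to be equivalent, up to the randomness in cross-validation fold assignment; the data and learner library are illustrative:

```r
library(vimp)
library(SuperLearner)

set.seed(1213)
n <- 400
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- 1 + x$x1 + rnorm(n)
learners <- c("SL.glm", "SL.mean")

# Wrapper: R-squared-based variable importance of x2
est_wrapper <- vimp_rsquared(Y = y, X = x, indx = 2,
                             run_regression = TRUE, SL.library = learners)

# The same request, made directly through vim with the type argument filled in
est_vim <- vim(Y = y, X = x, indx = 2, type = "r_squared",
               run_regression = TRUE, SL.library = learners)
```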
* `cv_vim_nodonsker` is now deprecated; use `cv_vim` instead.
* `vimp_anova`
None
* Addressed a `gam` package update by switching the library to `SL.xgboost`, `SL.step`, and `SL.mean`.
None
* Addressed the `gam` package update in the unit tests.
None
* `cv_vim` and `cv_vim_nodonsker` now return the cross-validation folds used within the function.
None
* Use the specified `family` for the top-level SuperLearner if `run_regression = TRUE`; in all cases, the second-stage SuperLearner uses a `gaussian` family.
* If `SL.mean` is selected as the best-fitting algorithm, the second-stage regression is now run using the original outcome, rather than the first-stage fitted values.
* Added `cv_vim_nodonsker`, which computes the cross-validated naive estimator and the update on the same, single validation fold. This does not allow for relaxation of the Donsker class conditions.
None
* Added `two_validation_set_cv`, which sets up folds for V-fold cross-validation with two validation sets per fold.
* Updated `cv_vim`: now, the cross-validated naive estimator is computed on a first validation set, while the update for the corrected estimator is computed using the second validation set (both created from `two_validation_set_cv`); this allows for relaxation of the Donsker class conditions necessary for asymptotic convergence of the corrected estimator, while making sure that the initial CV naive estimator is not biased high (due to a higher R^2 on the training data).
None
None
* Updated `cv_vim`: now, the cross-validated naive estimator is computed on the training data for each fold, while the update for the corrected cross-validated estimator is computed using the test data; this allows for relaxation of the Donsker class conditions necessary for asymptotic convergence of the corrected estimator.
* Removed `vim`; replaced with individual-parameter functions.
* `vimp_regression` to match the Python package.
* `cv_vim` can now compute regression estimators.
* Added `vimp_ci`, `vimp_se`, `vimp_update`, and `onestep_based_estimator`.
None
* Bugfixes etc.