bolasso()
gains a fast
argument which
optimizes computation by using a single cross-validated regression on
the entire dataset to determine the optimal regularization parameter
(lambda
). This approach bypasses the need for
cross-validation within each bootstrap replicate, drastically reducing
computation time, especially beneficial for large datasets or when using
a high number of bootstrap replicates.
# Fast mode reduces computation time by using a single cross-validated lambda
<- bolasso(
model_fast ~ .,
diabetes data = train,
n.boot = 1000,
progress = FALSE,
family = "binomial",
fast = TRUE
)
selected_vars()
is now a shorthand for
selected_variables()
.
selected_variables()
/selected_vars()
supports two variable selection algorithms via the method
argument: the Variable Inclusion Probability (VIP) method and the
Quantile (QNT) method. The VIP method selects variables that appear in a
high percentage of bootstrap models, while the QNT method selects
variables based on bootstrap confidence intervals. Set
method = "vip"
or method = "qnt"
,
respectively.
# Select variables using the VIP method with a 95% threshold
<- selected_variables(model, threshold = 0.95, method = "vip")
selected_vars_vip
# Select variables using the QNT method
<- selected_variables(model, threshold = 0.95, method = "qnt") selected_vars_qnt
tidy()
extracts a tidy tibble summarizing
bootstrap-level coefficients for each covariate. This method provides a
clean and organized way to inspect model coefficients.
# Extract a tidy tibble of coefficients
<- tidy(model, select = "lambda.min") tidy_coefs
plot_selection_thresholds()
provides a visual
representation of the selection thresholds for each variable. This
visualization helps users understand the stability and robustness of
variable selection across different thresholds and methods.
# Visualize selection thresholds for variables
plot_selection_thresholds(model, select = "lambda.min")
plot_selected_variables()
visualizes the coefficient
distributions for only the selected variables. This function provides a
focused view of the most relevant variables in the model.
# Plot coefficient distributions for selected variables
plot_selected_variables(
model,threshold = 0.95,
method = "vip",
select = "lambda.min"
)
plot
visualizes the coefficient distributions for
all model covariates.
# Plot coefficient distributions for selected variables
plot(model, select = "lambda.min")
NEWS.md
file to track changes to the
package.bolasso()
argument form
has been renamed
to formula
to reflect common naming conventions in R
statistical modeling packages.predict()
and coef()
methods are now
implemented using future.apply::future_lapply
allowing for
computing predictions and extracting coefficients in parallel. This may
result in slightly worse performance (due to memory overhead) when the
model/prediction data is small but will be significantly faster when
e.g. generating predictions on a very large data-set.bolasso()
argument
formula
. The user-supplied value of formula
is
handled via deparse()
which has a default
width.cutoff
value of 60. This was causing issues with
formulas by splitting them into multi-element character vectors. It has
now been set to the maximum value of 500L
which will
correctly parse all lengths of formulas.predict()
now forces evaluation of the
formula
argument in the bolasso()
call. This
resolves an issue where, if a user passes a formula via a variable,
predict()
would pass the variable name to the underlying
prediction function as opposed to the actual formula.