Elastic Net SearcheR Examples

ensr package version 0.1.0

Peter E. DeWitt

2019-01-14

The primary purpose of the ensr package is to provide methods for simultaneously searching for preferable values of \(\lambda\) and \(\alpha\) in elastic net regression. ensr is a wrapper around the glmnet package. This vignette starts with a summary of elastic net regression, its use, and its limitations. Examples of data set preparation follow, and the vignette concludes with elastic net regression results.

library(ensr)
## Loading required package: glmnet
## Loading required package: Matrix
## Loading required package: foreach
## Loaded glmnet 2.0-16
## 
## Attaching package: 'glmnet'
## The following object is masked from 'package:qwraps2':
## 
##     auc
library(data.table)
library(ggplot2)
## Registered S3 methods overwritten by 'ggplot2':
##   method         from 
##   [.quosures     rlang
##   c.quosures     rlang
##   print.quosures rlang
library(ggforce)
library(doMC)
## Loading required package: iterators
## Loading required package: parallel
registerDoMC(cores = max(c(detectCores() - 2L, 1L)))
options(datatable.print.topn  = 3L,
        datatable.print.nrows = 3L)

1 Elastic Net Regression

Elastic Net Regression (Friedman, Hastie, and Tibshirani 2010) is a penalized linear modeling approach that is a mixture of ridge regression (Hoerl and Kennard 1970), and least absolute shrinkage and selection operator (LASSO) regression (Tibshirani 1996). Ridge regression reduces the impact of collinearity on model parameters and LASSO reduces the dimensionality of the support by shrinking some of the regression coefficients to zero. Elastic net does both of these by solving the following equation (for Gaussian responses): \[\min_{\beta_0, \beta \in \mathbb{R}^{p+1}} \frac{1}{2N} \sum_{i = 1}^{N} \left( y_i - \beta_0 - x_i^T \beta \right)^2 + \lambda \left[ \left(1 - \alpha \right) \frac{\left \lVert \beta \right \rVert_{2}^{2}}{2} + \alpha \left \lVert \beta \right \rVert_{1} \right],\] where \(\lambda \geq 0\) is the complexity parameter and \(0 \leq \alpha \leq 1\) is the compromise between ridge \(\left(\alpha = 0\right)\) and LASSO \(\left( \alpha = 1 \right).\)
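To make the roles of \(\lambda\) and \(\alpha\) concrete, the following base R sketch evaluates the objective above for a fixed coefficient vector. The helper name enet_objective and the toy data are assumptions made for illustration; they are not part of ensr or glmnet.

```r
# Elastic net objective for a Gaussian response.
# Illustrative helper only; not an ensr or glmnet function.
enet_objective <- function(beta0, beta, x, y, lambda, alpha) {
  n        <- length(y)
  residual <- y - beta0 - as.vector(x %*% beta)
  loss     <- sum(residual^2) / (2 * n)
  penalty  <- lambda * ((1 - alpha) * sum(beta^2) / 2 + alpha * sum(abs(beta)))
  loss + penalty
}

# Toy data: 5 observations, 2 predictors (matrix is filled column by column)
x    <- matrix(c(1, 2, 3, 4, 5,
                 2, 1, 0, 1, 2), ncol = 2)
y    <- c(1, 2, 3, 4, 5)
beta <- c(1, -0.5)

ridge <- enet_objective(0, beta, x, y, lambda = 0.1, alpha = 0)    # ridge penalty only
lasso <- enet_objective(0, beta, x, y, lambda = 0.1, alpha = 1)    # LASSO penalty only
mixed <- enet_objective(0, beta, x, y, lambda = 0.1, alpha = 0.5)  # equal mixture
```

Because the penalty is linear in \(\alpha\), the value of mixed lies exactly halfway between ridge and lasso for the same coefficients.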

Ridge regression rarely shrinks coefficients exactly to zero and therefore contributes little to the parsimony of models. One potential benefit of elastic net regression is that, like the LASSO, it can be used to perform variable selection by shrinking coefficients to zero. Compared to LASSO, one potential benefit of elastic net regression is that it will reproducibly return the same set of non-zero coefficients when some predictors are highly correlated. LASSO, \(\alpha = 1,\) may return different sets of non-zero coefficients when highly correlated predictors are in the model.

Compared to other machine learning approaches, a potential benefit of elastic net regression is that the \(\beta\) vector is easily interpretable and can be implemented in almost any downstream computational pipeline. More flexible machine learning models such as gradient boosting machines may be able to fit data more accurately, but they are extremely difficult to export to other tools.

The cv.glmnet call from the glmnet package is widely used to fit elastic net regression models. However, the current implementation of cv.glmnet requires that the value(s) of \(\alpha\) be specified by the user (see “Details” in help("cv.glmnet")). We designed the ensr package to fill this gap by simultaneously searching for a preferable set of \(\lambda\) and \(\alpha\) values. ensr also provides additional plotting methods to facilitate visual identification of the best choice for a given project.
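For reference, the manual search that ensr is meant to replace can be sketched as a loop over a user-chosen \(\alpha\) sequence, reusing one foldid vector so that the cross-validation errors are comparable across \(\alpha\) values. The simulated data and all variable names are assumptions for illustration, and the block only runs when glmnet is installed.

```r
# Manual alpha search with cv.glmnet on simulated data (illustration only).
if (requireNamespace("glmnet", quietly = TRUE)) {
  set.seed(42)
  n <- 100
  p <- 10
  x <- matrix(rnorm(n * p), nrow = n)
  y <- as.vector(x[, 1:3] %*% c(2, -1, 0.5) + rnorm(n))

  # One shared fold assignment so every alpha is scored on the same splits
  foldid <- sample(rep(seq_len(10), length.out = n))
  alphas <- c(0, 0.5, 1)

  fits <- lapply(alphas, function(a) {
    glmnet::cv.glmnet(x, y, alpha = a, foldid = foldid)
  })

  # Lowest mean cross-validation error and the matching lambda for each alpha
  results <- data.frame(
    alpha  = alphas,
    lambda = sapply(fits, function(f) f$lambda.min),
    cvm    = sapply(fits, function(f) min(f$cvm))
  )
  print(results[which.min(results$cvm), ])
}
```

Each \(\alpha\) here gets its own default \(\lambda\) sequence, so the three fits are not evaluated on a common set of \(\lambda\) values.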

2 Data Sets

Two data sets are provided in the ensr package for use in examples.

  1. tbi is a synthetic data set with which traumatic brain injury can be classified into three different types using a set of predictors.

  2. landfill is a synthetic data set similar to those generated by computer models of water percolating through landfill.

More information about each of these data sets is provided in the “ensr-datasets” vignette:

vignette("ensr-datasets", package = "ensr")
data(tbi, package = "ensr")
data(landfill, package = "ensr")

3 Searching for \(\lambda\) and \(\alpha\)

The optimal pair of \(\lambda\) and \(\alpha\) is likely project-specific. Some defaults are provided, but the user is encouraged to consider carefully the balance between parsimony and model error that suits their project. For example, model error that is lower by 0.1% at the expense of 3 additional parameters may or may not be desirable.

3.1 Univariate Response

A call to ensr produces a search for the combination of \(\lambda\) and \(\alpha\) that results in the lowest cross-validation error. The arguments to ensr are the same as those made to cv.glmnet with the addition of alphas, a sequence of \(\alpha\) values to use. Please note that ensr will add length(alphas) - 1 additional values, the midpoints between the given set, in the construction of a \(\lambda\)-\(\alpha\) grid to search. For the initial example we will fit an elastic net for modeling the evaporation in the landfill data loaded above.

ensr_obj is an ensr object, which is a list of cv.glmnet objects. The length of the list is determined by the length of the alphas argument. The default for alphas is seq(0, 1, length = 10).
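The \(\alpha\) values that enter the search under the default can be reproduced in base R. This is a sketch of the arithmetic only, assuming the added midpoints are simple averages of neighbouring values; it does not call ensr.

```r
# Default alpha sequence plus the midpoints between neighbouring values
alphas     <- seq(0, 1, length = 10)
midpoints  <- (head(alphas, -1) + tail(alphas, -1)) / 2
all_alphas <- sort(c(alphas, midpoints))

length(all_alphas)    # 2 * length(alphas) - 1 values
round(all_alphas, 7)
```

Both 0.6666667 (one of the ten default values, 6/9) and 0.8333333 (the midpoint between 7/9 and 8/9) appear in this sequence.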

The summary method for ensr objects returns a data.table with values of \(\lambda\), \(\alpha\), the mean cross-validation error cvm, and the number of non-zero coefficients. The l_index is the list index of the ensr object associated with the noted \(\alpha\) value.

The preferable model may be the one with the minimum cross-validation error.

A quick way to get the preferable model is to call the preferable method.

preferable returns an elnet glmnet object with one additional list element, the ensr_summary used to select this preferable model.
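The fit, summarize, and select steps described above can be sketched on simulated data as follows. The data, the shortened alphas sequence, and the variable names are assumptions for illustration; the summary columns are addressed by the names used in this vignette (cvm in particular), and the block only runs when ensr is installed.

```r
# Sketch of the ensr workflow on simulated data (not the landfill example).
if (requireNamespace("ensr", quietly = TRUE)) {
  library(ensr)
  set.seed(42)
  n <- 100
  p <- 10
  x_matrix <- matrix(rnorm(n * p), nrow = n,
                     dimnames = list(NULL, paste0("x", seq_len(p))))
  y_matrix <- x_matrix[, 1:3] %*% c(2, -1, 0.5) + rnorm(n)

  # Fit: one cv.glmnet object per alpha value considered
  ensr_obj <- ensr(x = x_matrix, y = y_matrix, alphas = seq(0, 1, length = 5))

  # Summarize: one row per (lambda, alpha) pair that was evaluated
  ensr_summary <- summary(ensr_obj)
  print(ensr_summary[ensr_summary$cvm == min(ensr_summary$cvm), ])

  # Select: glmnet fit for the preferable model, with the summary attached
  pref <- preferable(ensr_obj)
  print(class(pref))
  print(pref$ensr_summary)
}
```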

Because the output of preferable inherits the same class as an object returned from a call to glmnet::glmnet the same methods can be used. Plotting methods are one example:

Another graphical way to look at the results of ensr is to use the ensr-provided plotting method. In the plot below, each of the \(\lambda\) (y-axis, \(\log_{10}\) scale) and \(\alpha\) (x-axis) values considered in the ensr_obj is plotted. The coloring is denoted as log10(z) where z = (cvm - min(cvm)) / sd(cvm). The color scale is set so that low values (values near the minimum mean cross-validation error) are dark green. Values moving further from the minimum are lighter green, then white, then purple. A red cross identifies the minimum mean cross-validation error.
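The quantity mapped to the color scale can be computed directly. The cvm values below are hypothetical and stand in for the cvm column of an ensr summary.

```r
# Hypothetical mean cross-validation errors for five (lambda, alpha) pairs
cvm <- c(1.20, 1.05, 1.00, 1.02, 1.50)

z       <- (cvm - min(cvm)) / sd(cvm)  # standardized distance from the minimum
log10_z <- log10(z)                    # value mapped to the color scale

data.frame(cvm, z = round(z, 3), log10_z = round(log10_z, 3))
```

The minimum itself has z = 0, so log10(z) is -Inf there; every other point gets a finite value that increases with its distance from the minimum.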

The ensr plot method produces a ggplot object and thus can be customized. In this example, we add the black-and-white theme and use ggforce::facet_zoom to zoom in on a section of the graphic:

In this figure we see the minimum mean cross validation error occurs within the models with \(\alpha\) = 0.6666667.

Inspection of the plot suggests there is another minimum worth considering, for \(\alpha\) = 0.8333333.

The difference in the mean cross validation error between these two results is very small and may not be meaningful. However, the number of non-zero (nzero) coefficients is quite different. With a very small increase in the mean cross validation error, one more variable has its regression coefficient shrunk to zero. If parsimony is your primary objective, the second model might be preferable.

We can also look at the mean cross validation errors by nzero.

The plot method for ensr objects has a type argument. The default is type = 1 as plotted above. type = 2 plots the mean cross-validation error against the number of non-zero coefficients:

Some customization to the plot:

Based on the figure above, if the objective is both low cross-validation error and parsimony, the model with 14 non-zero coefficients may be the preferable model. The additional non-zero coefficients in models with 15 or more non-zero coefficients do not meaningfully reduce the mean cross-validation error. Further examination shows that the model with only four non-zero coefficients might also be a reasonable choice.
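One way to make this trade-off explicit is to fix a tolerance on the cross-validation error and take the smallest model within it. The sketch below uses hypothetical numbers, not the landfill results, and the 1% tolerance is an arbitrary choice for illustration.

```r
# Hypothetical best (lowest-cvm) result for several model sizes
best_by_size <- data.frame(
  nzero = c(4, 9, 14, 15, 18),
  cvm   = c(0.1050, 0.1040, 0.1002, 0.1001, 0.1000)
)

# Accept any model whose error is within 1% of the overall minimum ...
tolerance  <- 0.01
threshold  <- min(best_by_size$cvm) * (1 + tolerance)
acceptable <- best_by_size[best_by_size$cvm <= threshold, ]

# ... and keep the one with the fewest non-zero coefficients
chosen <- acceptable[which.min(acceptable$nzero), ]
chosen
```

With these numbers the models with 14, 15, and 18 non-zero coefficients are within tolerance, and the 14-coefficient model is chosen. A looser tolerance would admit the smaller models as well.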

A quick side note: you can get both types of plots in one call:

To obtain the coefficients from the above models:

The table below shows the variables for each model in descending order of the absolute value of the regression coefficients. Because the predictors were standardized, the relative magnitude of the coefficients can serve as a sensitivity/influence/importance metric.

rank  variable          value
nzero = 4
   1  weather_temp       1.183977e+00
   2  wind               7.230358e-01
   3  weather_solrad     5.533914e-01
   4  rh                -2.613265e-01
nzero = 14
   1  weather_temp       1.188508e+00
   2  wind               7.276644e-01
   3  weather_solrad     5.559003e-01
   4  rh                -2.680920e-01
   5  weather_precip     4.449118e-03
   6  topsoil_ks        -1.976958e-03
   7  wlt_thetar         1.794380e-03
   8  cn                 1.299218e-03
   9  liner_pinholes     1.249752e-03
  10  clay_porosity     -9.975358e-04
  11  topsoil_n          7.837412e-04
  12  ult_thetar        -5.023029e-04
  13  lai                1.783396e-04
  14  rmw               -9.395882e-05
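The ordering used in the table can be reproduced with a sort on the absolute value of the coefficients. The values below are copied from the nzero = 4 rows above and entered in arbitrary order; the variable names coefs and ranked are illustrative.

```r
# Coefficients of the four-variable model, in arbitrary order
coefs <- c(rh             = -0.2613265,
           wind           =  0.7230358,
           weather_solrad =  0.5533914,
           weather_temp   =  1.183977)

# Sort by magnitude, largest first; the sign is kept for interpretation
ranked <- coefs[order(abs(coefs), decreasing = TRUE)]
ranked
```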

The variables most important for modeling evaporation are weather_temp (average temperature over the last 100 years), wind (average wind speed), weather_solrad (average solar radiation), and rh (relative humidity). The fifth and subsequent coefficients are considerably smaller, and the corresponding variables are less important than the first four.

3.2 Cross Validation Issues

Cross-validation results may depend on the foldid randomly assigned to each record. This is because some records may always be considered together in either the training or validation data sets. For example, three random foldid vectors for 10-fold cross-validation are generated below, followed by the results of the corresponding calls to ensr.
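A minimal way to generate such fold assignments in base R is sketched here. The number of records and the helper name make_foldid are assumptions for illustration.

```r
set.seed(42)
n_obs   <- 1000  # hypothetical number of records
n_folds <- 10

# Assign every record to one of k folds, with balanced fold sizes,
# in a random order that differs from call to call
make_foldid <- function(n, k) sample(rep(seq_len(k), length.out = n))

foldid1 <- make_foldid(n_obs, n_folds)
foldid2 <- make_foldid(n_obs, n_folds)
foldid3 <- make_foldid(n_obs, n_folds)

table(foldid1)            # 100 records in each fold
mean(foldid1 == foldid2)  # share of records given the same fold label twice
```

Each vector would then be supplied through the foldid argument, which ensr shares with cv.glmnet, so that the three searches differ only in how the records are split.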

There are small differences in the cross-validation errors and in the \(\lambda\) values. There is a large difference, however, in the \(\alpha\) values. It is notable that the differences in the regression coefficients are minor. In this case, the number of non-zero coefficients does not change and the coefficient magnitudes are similar.

One could argue that there is a major difference in the results across the foldid vectors. Based on the cvm, foldid3 leads to 18 non-zero coefficients, whereas foldid1 and foldid2 lead to only 14 non-zero coefficients.

Because of these issues, we recommend multiple cross-validation runs or bootstrapping to select a final model.

3.3 Multivariate Response

There are three outcomes, injuries, in the tbi data set. It would be reasonable to assume that there should be common variables with non-zero coefficients for models of each injury. The end user could fit three univariate models or fit one multinomial model using the tools provided by glmnet.

To illustrate these options we will run ensr five times: three univariate models, one multinomial model with type.multinomial set to the default “ungrouped,” and one grouped multinomial model.

Plots of the results show that for injury2 and injury3, an \(\alpha\) of 1 would be preferable. For injury1, a slightly lower \(\alpha\) is preferable. When fitting the multinomial responses, grouped or ungrouped, it appears that the preferable \(\alpha\) is similar to the univariate fits.

The summary model output:

The models with the lowest mean cross validation error are:

It appears that either an ungrouped model with 9 non-zero coefficients or a grouped model with 14 non-zero coefficients would be preferable.

Here are the models with the lowest mean cross validation error and eight non-zero coefficients:

Let’s look at the coefficients for these models, starting with the three univariate models:

The following are the non-zero coefficients for the ungrouped models.

The grouped results are:

ensr can also analyze multivariate Gaussian responses. See the documentation help("glmnet", package = "glmnet") for details.

4 Alternative Approaches

Our ensr package is not the only approach to searching for \(\lambda\) and \(\alpha\). The glmnetUtils package (Microsoft and Ooi 2017) is the most notable comparable package. We encourage readers to explore the glmnetUtils package and determine if it meets their needs.

There are two major differences in the implementation of ensr and glmnetUtils. The first major difference is that glmnetUtils allows users to specify glmnet::glmnet models with a formula, e.g., y ~ x1 + x2 + x3, whereas ensr maintains the glmnet requirement of the user providing y and x matrices. We opted for the simplicity of staying with glmnet::glmnet arguments for programming efficiency and as a check on reasonable models. It is likely that a model with a factor on the right-hand side of the formula should not be evaluated with elastic net. There does not appear to be a check or warning in glmnetUtils for factor or character (which will be coerced to a factor) variables on the right-hand side of the formula statement. ensr and glmnet, by requiring the user to specify the response and support matrices, force the user to be aware of and explicitly handle possible character/factor predictor variables. Binary factors are non-trivial in this context as well, but are considerably less difficult to deal with than factors with three or more possible values.
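Handling a factor explicitly before building the x matrix might look like the following base R sketch. The data frame and its column names are hypothetical, and treatment contrasts (the base R default) are only one of several possible codings.

```r
# Hypothetical data with one numeric and one three-level factor predictor
dat <- data.frame(
  y    = c(1.2, 0.7, 3.1, 2.4, 1.9, 2.8),
  age  = c(34, 51, 29, 45, 62, 38),
  site = factor(c("a", "b", "c", "a", "b", "c"))
)

# Expand the factor into indicator columns and drop the intercept column,
# so the coding of `site` is a visible, deliberate choice
x_matrix <- model.matrix(~ age + site, data = dat)[, -1]
y_matrix <- as.matrix(dat$y)

colnames(x_matrix)  # "age" "siteb" "sitec"
```

The two indicator columns for site are penalized separately, which is one reason a multi-level factor needs more thought than a binary one.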

The second major difference between ensr and glmnetUtils is that ensr builds a \(\lambda\)-\(\alpha\) grid and evaluates glmnet::cv.glmnet at least twice for each value of \(\lambda\); that is, at least two values of \(\alpha\) are considered for every \(\lambda\). glmnetUtils only uses the default \(\lambda\) values for each specific \(\alpha\) value. By constructing a \(\lambda\)-\(\alpha\) grid, ensr provides minimally sufficient support for estimating a contour plot for a (x = \(\alpha,\) y = \(\lambda,\) z = cross-validation mean error) surface. The glmnetUtils results require imputation before such a surface could be estimated. We feel that ensr’s use of a \(\lambda\)-\(\alpha\) grid is a significant advantage relative to glmnetUtils.
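The idea of a grid in which every \(\lambda\) is paired with at least two \(\alpha\) values can be illustrated with a simplified base R sketch. This is not the ensr implementation: the \(\lambda\) sequences are hypothetical, and the rule that a midpoint \(\alpha\) reuses the \(\lambda\) values of both neighbours is an assumption made for illustration.

```r
# Simplified illustration of a lambda-alpha grid; not the ensr implementation.
alphas <- c(0.25, 0.50, 0.75, 1.00)

# Hypothetical per-alpha lambda sequences: log-spaced, largest value
# proportional to 1 / alpha
lambda_seq <- function(alpha, n = 5) {
  10^seq(log10(1 / alpha), log10(0.01 / alpha), length.out = n)
}
own <- lapply(alphas, lambda_seq)

# Midpoint alphas reuse the lambda values of both neighbouring alphas
mids <- (head(alphas, -1) + tail(alphas, -1)) / 2
grid <- rbind(
  do.call(rbind, Map(function(a, l) data.frame(alpha = a, lambda = l),
                     alphas, own)),
  do.call(rbind, lapply(seq_along(mids), function(i) {
    data.frame(alpha = mids[i], lambda = c(own[[i]], own[[i + 1]]))
  }))
)

# Count the distinct alpha values paired with each lambda value
alphas_per_lambda <- tapply(grid$alpha, signif(grid$lambda, 10),
                            function(a) length(unique(a)))
min(alphas_per_lambda)
```

In this toy grid every \(\lambda\) is evaluated at its own \(\alpha\) and at one or two neighbouring midpoints, which gives the overlap needed to interpolate a surface without imputation.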

See the example script compare-to-glmnetUtils.R in the examples directory on the ensr GitHub page: https://github.com/dewittpe/ensr.

5 Session Info

print(sessionInfo(), locale = FALSE)
## R Under development (unstable) (2019-01-13 r75986)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Debian GNU/Linux 9 (stretch)
## 
## Matrix products: default
## BLAS/LAPACK: /usr/lib/libopenblasp-r0.2.19.so
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] doMC_1.3.5        iterators_1.0.10  ggforce_0.1.3    
##  [4] ggplot2_3.1.0     data.table_1.12.0 ensr_0.1.0       
##  [7] glmnet_2.0-16     foreach_1.4.4     Matrix_1.2-15    
## [10] qwraps2_0.4.0     knitr_1.21       
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.0       highr_0.7        bindr_0.1.1      compiler_3.6.0  
##  [5] pillar_1.3.1     plyr_1.8.4       tools_3.6.0      digest_0.6.18   
##  [9] evaluate_0.12    tibble_2.0.1     gtable_0.2.0     lattice_0.20-38 
## [13] pkgconfig_2.0.2  rlang_0.3.1      yaml_2.2.0       xfun_0.4        
## [17] bindrcpp_0.2.2   gridExtra_2.3    withr_2.1.2      stringr_1.3.1   
## [21] dplyr_0.7.8      tidyselect_0.2.5 grid_3.6.0       glue_1.3.0      
## [25] R6_2.3.0         rmarkdown_1.11   farver_1.1.0     tweenr_1.0.1    
## [29] purrr_0.2.5      magrittr_1.5     units_0.6-2      MASS_7.3-51.1   
## [33] scales_1.0.0     codetools_0.2-16 htmltools_0.3.6  assertthat_0.2.0
## [37] colorspace_1.4-0 labeling_0.3     stringi_1.2.4    lazyeval_0.2.1  
## [41] munsell_0.5.0    crayon_1.3.4

References

Friedman, Jerome, Trevor Hastie, and Rob Tibshirani. 2010. “Regularization Paths for Generalized Linear Models via Coordinate Descent.” Journal of Statistical Software 33 (1): 1–22.

Hoerl, Arthur E, and Robert W Kennard. 1970. “Ridge Regression: Biased Estimation for Nonorthogonal Problems.” Technometrics 12 (1): 55–67.

Microsoft, and Hong Ooi. 2017. GlmnetUtils: Utilities for ’Glmnet’. https://CRAN.R-project.org/package=glmnetUtils.

Tibshirani, Robert. 1996. “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society. Series B (Methodological) 58 (1): 267–88.