The futurize package allows you to easily turn sequential code
into parallel code by piping the sequential code to the futurize()
function. Easy!
library(futurize)
plan(multisession)
library(pls)
data(yarn)
m <- plsr(density ~ NIR, ncomp = 10, data = yarn, validation = "CV") |> futurize()
This vignette demonstrates how to use this approach to parallelize pls
functions such as mvr(), plsr(), pcr(), and crossval().
The pls package provides Partial Least Squares Regression (PLSR) and Principal Component Regression (PCR) methods. These methods often use cross-validation (CV) to determine the number of components to use. Cross-validation refits the model once per segment, which can be computationally intensive. Because the per-segment fits are independent of each other, it is an ideal candidate for parallelization.
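As a minimal sketch of what the cross-validation is used for, the cross-validated prediction error can be extracted with RMSEP() from pls and compared across component counts. The variable names cv_rmsep and best are illustrative, and taking the minimum is only one simple selection rule. Because the default CV segments are drawn at random, the values differ between runs.

```r
library(pls)
data(yarn)

## Fit with cross-validation (sequentially)
m <- plsr(density ~ NIR, ncomp = 10, data = yarn, validation = "CV")

## Cross-validated RMSEP for 0 (intercept only) up to 10 components
cv_rmsep <- drop(RMSEP(m, estimate = "CV")$val)

## Number of components with the lowest CV error;
## subtract 1 because the first entry is the intercept-only model
best <- which.min(cv_rmsep) - 1L

print(cv_rmsep)
print(best)
```

Each additional component and each CV segment adds model fits, which is where the computational cost comes from.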
The plsr() function is used to perform PLS regression. When
validation = "CV" is specified, it performs cross-validation.
library(pls)
data(yarn)
## Sequential evaluation
m <- plsr(density ~ NIR, ncomp = 10, data = yarn, validation = "CV")
To make it evaluate in parallel, simply pipe the call to futurize():
library(futurize)
library(pls)
data(yarn)
## Parallel evaluation
m <- plsr(density ~ NIR, ncomp = 10, data = yarn, validation = "CV") |> futurize()
This will automatically use the parallel backend set by plan(), e.g.
plan(multisession)
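The backend can be changed without touching the modeling code. The sketch below assumes the future package is attached directly, and the choice of two workers is purely illustrative. The variable names n_parallel and n_sequential are not part of any package.

```r
library(future)

## Use two background R sessions on the local machine
plan(multisession, workers = 2)
n_parallel <- nbrOfWorkers()

## Switch back to sequential processing when done
plan(sequential)
n_sequential <- nbrOfWorkers()

print(c(parallel = n_parallel, sequential = n_sequential))
```

Any futurize() call made while the multisession plan is active would use those two workers; after plan(sequential), the same call runs in the current R session.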
The crossval() function can be used to perform cross-validation on
an already fitted model:
library(futurize)
plan(multisession)
library(pls)
data(yarn)
m1 <- plsr(density ~ NIR, ncomp = 10, data = yarn)
## Parallel cross-validation
m_cv <- crossval(m1, segments = 10) |> futurize()
The following pls functions are supported by futurize():
* mvr()
* plsr()
* pcr()
* cppls()
* crossval() with seed = TRUE as the default

For comparison, here is what it takes to parallelize pls functions
using the parallel package directly, without futurize:
library(pls)
library(parallel)
## Set up a cluster
ncpus <- 4L
cl <- makeCluster(ncpus)
## Configure pls to use the cluster
old_opts <- pls.options(parallel = cl)
## Run regression with cross-validation
data(yarn)
m <- plsr(density ~ NIR, ncomp = 10, data = yarn, validation = "CV")
## Restore original options and stop the cluster
pls.options(old_opts)
stopCluster(cl)
This requires you to manually manage the cluster lifecycle and the
global pls.options(). With futurize, the cluster setup and
option management are handled automatically and localized to the
function call.
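To illustrate the bookkeeping involved, here is a sketch of how the manual approach might be wrapped so that cleanup also happens if the fit fails. The helper name plsr_cv_cluster() and its ncpus argument are hypothetical and not part of pls or futurize. It snapshots the options with pls.options() before changing them and relies on on.exit() for teardown.

```r
library(pls)
library(parallel)

## Hypothetical helper: fit with CV on a temporary cluster
plsr_cv_cluster <- function(formula, data, ncomp, ncpus = 2L) {
  ## Start the cluster and make sure it is stopped on exit
  cl <- makeCluster(ncpus)
  on.exit(stopCluster(cl), add = TRUE)

  ## Snapshot the current options; restore them before the cluster stops
  old_opts <- pls.options()
  on.exit(pls.options(old_opts), add = TRUE, after = FALSE)

  ## Point pls at the cluster and run the cross-validated fit
  pls.options(parallel = cl)
  plsr(formula, ncomp = ncomp, data = data, validation = "CV")
}

data(yarn)
m <- plsr_cv_cluster(density ~ NIR, data = yarn, ncomp = 10)
```

Even in this wrapped form, the cluster size and the global option remain the caller's responsibility, which is the overhead that the futurize() pipe is meant to remove.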