| Type: | Package |
| Title: | Individual Conditional Expectation Plot Toolbox |
| Version: | 1.2 |
| Date: | 2026-01-11 |
| Author: | Alex Goldstein [aut], Adam Kapelner |
| Maintainer: | Adam Kapelner <kapelner@qc.cuny.edu> |
| Description: | Implements Individual Conditional Expectation (ICE) plots, a tool for visualizing the model estimated by any supervised learning algorithm. ICE plots refine Friedman's partial dependence plot by graphing the functional relationship between the predicted response and a covariate of interest for individual observations. Specifically, ICE plots highlight the variation in the fitted values across the range of a covariate of interest, suggesting where and to what extent heterogeneities may exist. |
| License: | GPL-2 | GPL-3 |
| URL: | https://github.com/kapelner/ICEbox |
| BugReports: | https://github.com/kapelner/ICEbox/issues |
| Imports: | ggplot2, checkmate, data.table, Rcpp |
| LinkingTo: | Rcpp |
| Suggests: | randomForest, MASS, testthat, rpart |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| NeedsCompilation: | yes |
| Packaged: | 2026-01-12 02:23:42 UTC; kapelner |
| Repository: | CRAN |
| Date/Publication: | 2026-01-12 06:00:02 UTC |
Data concerning white wine.
Description
The WhiteWine data frame has 4898 rows and 12 columns and concerns white wines from a region in Portugal. The response variable, quality, is a wine quality metric, taken to be the median preference score of three blind tasters on a scale of 1-10. The 11 covariates are physicochemical metrics of wine quality such as citric acid content, sulphates, etc.
Usage
data(WhiteWine)
Format
A data frame of 4898 cases on 12 variables.
Source
K Bache and M Lichman. UCI machine learning repository, 2013. http://archive.ics.uci.edu/ml
Lineup plot for additivity
Description
This function creates a lineup plot to assess the additivity of a predictor's effect. It uses a nonparametric bootstrap approach to generate null plots.
Usage
additivityLineup(
backfit_obj,
fitMethod,
realICE,
figs = 10,
colorvecfcn,
usecolorvecfcn_inreal = FALSE,
null_predictfcn,
...
)
Arguments
backfit_obj |
An object of class |
fitMethod |
A function that accepts |
realICE |
The |
figs |
The total number of plots in the lineup (including the real one). Default is 10. |
colorvecfcn |
Optional function to generate a color vector for the curves. |
usecolorvecfcn_inreal |
If |
null_predictfcn |
Optional prediction function for the null models. |
... |
Additional arguments passed to |
Value
An object of class additivityLineup (invisibly).
Backfitting for Additive Models
Description
Fits a model of the form \hat{f}(x) = \hat{g}_{1}(x_S) + \hat{g}_{2}(x_C) using backfitting.
Usage
backfitter(
X,
y,
predictor,
fitMethod,
predictfcn,
eps = 0.01,
iter.max = 10,
verbose = TRUE,
...
)
Arguments
X |
The design matrix. |
y |
The response vector. |
predictor |
The name or index of the predictor of interest ( |
fitMethod |
A function that accepts |
predictfcn |
A function that accepts |
eps |
Convergence threshold. |
iter.max |
Maximum number of iterations. |
verbose |
If |
... |
Additional arguments passed to |
Value
An object of class backfitter.
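The backfitting idea described above can be illustrated with a minimal base-R sketch. This is not the package's implementation: `backfit_sketch`, the cubic polynomial used for g1, the linear model used for g2, and the stopping rule (mean squared change in the fitted values) are all hypothetical choices made for illustration.

```r
# Backfitting sketch: alternate between fitting g1 on x_S and g2 on x_C,
# each time using the partial residuals left by the other component.
backfit_sketch <- function(X, y, predictor, eps = 0.01, iter_max = 10) {
  xs <- X[[predictor]]
  XC <- X[, setdiff(names(X), predictor), drop = FALSE]
  g1 <- rep(0, length(y))
  g2 <- rep(mean(y), length(y))
  for (iter in seq_len(iter_max)) {
    old_fit <- g1 + g2
    # g1: cubic polynomial in the predictor of interest, fit to y - g2
    r1 <- y - g2
    g1 <- fitted(lm(r1 ~ poly(xs, 3)))
    # g2: linear model in the remaining covariates, fit to y - g1
    r2 <- y - g1
    g2 <- fitted(lm(r2 ~ ., data = XC))
    # stop once the fitted values barely change (hypothetical criterion)
    if (mean((g1 + g2 - old_fit)^2) < eps) break
  }
  list(g1 = g1, g2 = g2, fitted = g1 + g2, iterations = iter)
}

# Simulated data whose true structure is additive in x1 and (x2, x3)
set.seed(1)
n <- 200
X <- data.frame(x1 = runif(n, -1, 1), x2 = rnorm(n), x3 = rnorm(n))
y <- X$x1^2 + 2 * X$x2 - X$x3 + rnorm(n, sd = 0.1)
fit <- backfit_sketch(X, y, predictor = "x1")
fit$iterations
```

In the package, the two component fits are supplied by the user through fitMethod and predictfcn rather than being hard-coded as they are here.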
Clustering of ICE and d-ICE curves by kmeans.
Description
Clustering of ICE and d-ICE curves by kmeans. All curves are centered to have mean 0 and then kmeans is applied to the curves with the specified number of clusters.
Usage
clusterICE(
ice_obj,
nClusters,
plot = TRUE,
plot_margin = 0.05,
colorvec,
plot_pdp = FALSE,
x_quantile = FALSE,
avg_lwd = 3,
centered = FALSE,
plot_legend = FALSE,
main = NULL,
num_cores = 1,
...
)
Arguments
ice_obj |
Object of class |
nClusters |
Number of clusters to find. |
plot |
If |
plot_margin |
Extra margin to pass to |
colorvec |
Optional vector of colors to use for each cluster. |
plot_pdp |
If |
x_quantile |
If |
avg_lwd |
Average line width to use when plotting the cluster means. Line width is proportional to the cluster's size. |
centered |
If |
plot_legend |
If |
main |
Optional title for the plot. |
num_cores |
Integer number of cores to use for parallel operations. Default is 1. |
... |
Additional arguments for plotting. |
Value
A list with the following elements:
cl |
The output of the |
plot |
The ggplot object used for plotting (if |
See Also
Examples
## Not run:
require(ICEbox)
require(randomForest)
require(MASS) #has Boston Housing data, Pima
data(Boston) #Boston Housing data
X = Boston
y = X$medv
X$medv = NULL
## build a RF:
bh_rf = randomForest(X, y)
## Create an 'ice' object for the predictor "age":
bh.ice = ice(object = bh_rf, X = X, y = y, predictor = "age",
frac_to_build = .1)
## cluster the curves into 2 groups.
clusterICE(bh.ice, nClusters = 2, plot_legend = TRUE)
## cluster the curves into 3 groups, start all at 0.
clusterICE(bh.ice, nClusters = 3, plot_legend = TRUE, centered = TRUE)
## End(Not run)
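The center-then-cluster step from the Description can be sketched in base R without an ice object. The synthetic curves, the noise level and the use of `nstart = 10` are assumptions of this sketch, not settings taken from clusterICE.

```r
# Sketch: center each curve to mean 0, then run kmeans on the centered curves.
set.seed(1)
gridpts <- seq(0, 1, length.out = 20)
make_curve <- function(slope) {
  # random vertical offset + linear trend + small noise
  rnorm(1, sd = 5) + slope * gridpts + rnorm(length(gridpts), sd = 0.1)
}
# 30 rising and 30 falling curves; rows are curves, columns are grid points
curves <- rbind(
  t(sapply(1:30, function(i) make_curve(2))),
  t(sapply(1:30, function(i) make_curve(-2)))
)
centered <- curves - rowMeans(curves)   # every row now has mean 0
cl <- kmeans(centered, centers = 2, nstart = 10)
table(cl$cluster)
```

Centering removes the vertical offsets, so the clusters reflect the shape of the curves rather than their level.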
Efficient Column Standard Deviations
Description
Efficient Column Standard Deviations
Usage
colSds_cpp(x, n_cores = 1L)
Arguments
x |
Numeric Matrix |
n_cores |
Number of cores to use |
Efficient Numerical Derivative for Matrix (Row-wise)
Description
Computes the first derivative using centered differences, mirroring sfsmisc::D1tr.
Usage
derivative_cpp(x, gridpts, n_cores = 1L)
Arguments
x |
Numeric Matrix (smoothed values) |
gridpts |
Grid points corresponding to columns of x |
n_cores |
Number of cores to use |
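A base-R sketch of row-wise centered differences follows, for readers who want to see the arithmetic. `row_deriv_ref` is a hypothetical helper, and the one-sided differences used at the two end columns are an assumption of this sketch rather than a statement about derivative_cpp.

```r
# Centered differences along each row; one-sided differences at the two ends
# (the end-point rule is an assumption, chosen to keep the output the same width).
row_deriv_ref <- function(x, gridpts) {
  x <- as.matrix(x)
  p <- ncol(x)
  stopifnot(p >= 3, length(gridpts) == p)
  d <- matrix(NA_real_, nrow(x), p)
  # interior columns: (f[j + 1] - f[j - 1]) / (g[j + 1] - g[j - 1])
  dx <- gridpts[3:p] - gridpts[1:(p - 2)]
  num <- x[, 3:p, drop = FALSE] - x[, 1:(p - 2), drop = FALSE]
  d[, 2:(p - 1)] <- sweep(num, 2, dx, "/")
  # end columns: forward and backward differences
  d[, 1] <- (x[, 2] - x[, 1]) / (gridpts[2] - gridpts[1])
  d[, p] <- (x[, p] - x[, p - 1]) / (gridpts[p] - gridpts[p - 1])
  d
}

gridpts <- seq(0, 2, by = 0.25)
x <- rbind(gridpts^2, 3 * gridpts)   # one quadratic row, one linear row
d <- row_deriv_ref(x, gridpts)
```

On a uniform grid the centered difference is exact for quadratics, so the interior of the first row equals 2 * gridpts.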
Creates an object of class dice.
Description
Estimates the partial derivative function for each curve in an ice object.
See Goldstein et al. (2014) for further details.
Usage
dice(
ice_obj,
DerivEstimator = NULL,
use_supsmu = FALSE,
verbose = TRUE,
num_cores = 1,
sg_poly_order = 2,
sg_window_size = NULL
)
Arguments
ice_obj |
Object of class |
DerivEstimator |
Optional function with a single argument |
use_supsmu |
If |
verbose |
If |
num_cores |
Integer number of cores to use for parallel derivative estimation. Defaults to 1. |
sg_poly_order |
Polynomial order for Savitzky-Golay filter. Default is 2. |
sg_window_size |
Window size for Savitzky-Golay filter. Default is 30% of the grid. |
Value
A list of class dice with the following elements. Most are passed directly through
from ice_obj and exist to enable various plotting facilities.
d_ice_curves |
Matrix of dimension |
xj |
The actual values of |
actual_deriv |
Vector of length |
sd_deriv |
Vector of length |
logodds |
Passed from |
gridpts |
Passed from |
predictor |
Passed from |
xlab |
Passed from |
nominal_axis |
Passed from |
range_y |
Passed from |
Xice |
Passed from |
dpdp |
The estimated partial derivative of the PDP. |
References
Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E., Peeking
Inside the Black Box: Visualizing Statistical Learning With Plots of
Individual Conditional Expectation. (2014) Journal of Computational
and Graphical Statistics, in press
See Also
Examples
## Not run:
# same examples as for 'ice', but now create a derivative estimate as well.
require(ICEbox)
require(randomForest)
require(MASS) #has Boston Housing data, Pima
######## regression example
data(Boston) #Boston Housing data
X = Boston
y = X$medv
X$medv = NULL
## build a RF:
bhd_rf_mod = randomForest(X, y)
## Create an 'ice' object for the predictor "age":
bhd.ice = ice(object = bhd_rf_mod, X = X, y = y, predictor = "age", frac_to_build = .1)
# make a dice object:
bhd.dice = dice(bhd.ice)
#### classification example
data(Pima.te) #Pima Indians diabetes classification
y = Pima.te$type
X = Pima.te
X$type = NULL
## build a RF:
pima_rf = randomForest(x = X, y = y)
## Create an 'ice' object for the predictor "skin":
# For classification we plot the centered log-odds. If we pass a predict
# function that returns fitted probabilities, setting logodds = TRUE instructs
# the function to set each ice curve to the centered log-odds of the fitted
# probability.
pima.ice = ice(object = pima_rf, X = X, predictor = "skin", logodds = TRUE,
predictfcn = function(object, newdata){
predict(object, newdata, type = "prob")[, 2]
}
)
# make a dice object:
pima.dice = dice(pima.ice)
## End(Not run)
Creates an object of class ice.
Description
Creates an ice object with individual conditional expectation curves
for the passed model object, X matrix, predictor, and response. See
Goldstein et al. (2014) for further details.
Usage
ice(
object,
X,
y,
predictor,
predictfcn,
verbose = TRUE,
frac_to_build = 1,
indices_to_build = NULL,
num_grid_pts,
logodds = FALSE,
probit = FALSE,
num_cores = 1,
...
)
Arguments
object |
The fitted model to estimate ICE curves for. |
X |
The design matrix we wish to estimate ICE curves for. Rows are observations, columns are
predictors. Typically this is taken to be |
y |
Optional vector of the response values |
predictor |
The column number or variable name in |
predictfcn |
Optional function that accepts two arguments, |
verbose |
If |
frac_to_build |
Number between 0 and 1, with 1 as default. For large |
indices_to_build |
Vector of indices, |
num_grid_pts |
Optional number of values in the range of |
logodds |
If |
probit |
If |
num_cores |
Integer number of cores to use for parallel prediction. Defaults to 1. |
... |
Other arguments to be passed to |
Value
A list of class ice with the following elements:
gridpts |
Sorted values of |
ice_curves |
Matrix of dimension |
xj |
The actual values of |
actual_prediction |
Vector of length |
xlab |
String with the predictor name corresponding to |
nominal_axis |
If |
range_y |
If |
sd_y |
If |
Xice |
A matrix containing the subset of |
pdp |
A vector of size |
predictor |
Same as the argument, see argument description. |
logodds |
Same as the argument, see argument description. |
indices_to_build |
Same as the argument, see argument description. |
frac_to_build |
Same as the argument, see argument description. |
predictfcn |
Same as the argument, see argument description. |
References
Jerome Friedman. Greedy Function Approximation: A Gradient Boosting Machine. The Annals of Statistics, 29(5): 1189-1232, 2001.
Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E., Peeking Inside the Black Box: Visualizing Statistical Learning With Plots of Individual Conditional Expectation. (2014) Journal of Computational and Graphical Statistics, in press
See Also
plot.ice, print.ice, summary.ice
Examples
## Not run:
require(ICEbox)
require(randomForest)
require(MASS) #has Boston Housing data, Pima
######## regression example
data(Boston) #Boston Housing data
X = Boston
y = X$medv
X$medv = NULL
## build a RF:
bhd_rf_mod = randomForest(X, y)
## Create an 'ice' object for the predictor "age":
bhd.ice = ice(object = bhd_rf_mod, X = X, y = y, predictor = "age", frac_to_build = .1)
## End(Not run)
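The construction of ICE curves can be sketched in a few lines of base R. `ice_sketch` is a hypothetical helper, and the equally spaced grid is an assumption of this sketch; the package's own grid and options may differ.

```r
# ICE sketch: for each grid value, overwrite the predictor for every row while
# keeping the other covariates as observed, and record the predictions.
ice_sketch <- function(model, X, predictor, num_grid_pts = 10) {
  gridpts <- seq(min(X[[predictor]]), max(X[[predictor]]),
                 length.out = num_grid_pts)
  curves <- sapply(gridpts, function(g) {
    X_mod <- X
    X_mod[[predictor]] <- g
    predict(model, newdata = X_mod)
  })
  # rows = observations, columns = grid points; the PDP is the column average
  list(gridpts = gridpts, ice_curves = curves, pdp = colMeans(curves))
}

set.seed(1)
X <- data.frame(x1 = runif(50), x2 = rnorm(50))
y <- 1 + 2 * X$x1 + X$x1 * X$x2 + rnorm(50, sd = 0.1)
mod <- lm(y ~ x1 * x2, data = X)   # interaction makes the curves non-parallel
res <- ice_sketch(mod, X, "x1")
dim(res$ice_curves)
```

Because the model contains an x1:x2 interaction, each observation's curve has its own slope, which is exactly the heterogeneity that a partial dependence plot averages away.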
Melt Matrix to Long Format Vector
Description
Efficiently converts a matrix to a long-format vector (row-major order) for plotting.
Usage
melt_ice_curves_cpp(x, n_cores = 1L)
Arguments
x |
Numeric Matrix |
n_cores |
Number of cores to use |
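R stores matrices column-major, so row-major flattening needs a transpose first. The one-liner below is a base-R illustration of that ordering; `melt_row_major` is a hypothetical name and makes no claim about how melt_ice_curves_cpp is written.

```r
# Row-major flattening: all of row 1, then all of row 2, and so on.
melt_row_major <- function(x) as.vector(t(x))

x <- matrix(1:6, nrow = 2, byrow = TRUE)   # rows: (1, 2, 3) and (4, 5, 6)
melt_row_major(x)   # 1 2 3 4 5 6
as.vector(x)        # column-major for comparison: 1 4 2 5 3 6
```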
Create a plot of a dice object.
Description
Plotting of dice objects.
Usage
## S3 method for class 'dice'
plot(
x,
plot_margin = 0.05,
frac_to_plot = 1,
plot_sd = TRUE,
plot_orig_pts_deriv = TRUE,
pts_preds_size = 1.5,
colorvec,
color_by = NULL,
x_quantile = TRUE,
plot_dpdp = TRUE,
rug_quantile = seq(from = 0, to = 1, by = 0.1),
verbose = TRUE,
...
)
Arguments
x |
Object of class |
plot_margin |
Extra margin to pass to |
frac_to_plot |
If |
plot_sd |
If |
plot_orig_pts_deriv |
If |
pts_preds_size |
Size of points to make if |
colorvec |
Optional vector of colors to use for each curve. |
color_by |
Optional variable name (or column number) in |
x_quantile |
If |
plot_dpdp |
If |
rug_quantile |
If not null, tick marks are drawn on the x-axis corresponding to the vector of quantiles specified by this parameter.
Forced to |
verbose |
If |
... |
Additional plotting arguments. |
Value
A list with the following elements.
plot_points_indices |
Row numbers of |
legend_text |
If the |
plot |
The ggplot object used for plotting. |
See Also
Examples
## Not run:
require(ICEbox)
require(randomForest)
require(MASS) #has Boston Housing data, Pima
data(Boston) #Boston Housing data
X = Boston
y = X$medv
X$medv = NULL
## build a RF:
bhd_rf_mod = randomForest(X, y)
## Create an 'ice' object for the predictor "age":
bhd.ice = ice(object = bhd_rf_mod, X = X, y = y, predictor = "age", frac_to_build = .1)
# estimate derivatives, then plot.
bhd.dice = dice(bhd.ice)
plot(bhd.dice)
## End(Not run)
Plotting of ice objects.
Description
Plotting of ice objects.
Usage
## S3 method for class 'ice'
plot(
x,
plot_margin = 0.05,
frac_to_plot = 1,
plot_points_indices = NULL,
plot_orig_pts_preds = TRUE,
pts_preds_size = 1.5,
colorvec,
color_by = NULL,
x_quantile = TRUE,
plot_pdp = TRUE,
centered = FALSE,
prop_range_y = TRUE,
rug_quantile = seq(from = 0, to = 1, by = 0.1),
centered_percentile = 0,
point_labels = NULL,
point_labels_size = NULL,
prop_type = "sd",
verbose = TRUE,
num_cores = 1,
...
)
Arguments
x |
Object of class |
plot_margin |
Extra margin to pass to |
frac_to_plot |
If |
plot_points_indices |
If not |
plot_orig_pts_preds |
If |
pts_preds_size |
Size of points to make if |
colorvec |
Optional vector of colors to use for each curve. |
color_by |
Optional variable name in |
x_quantile |
If |
plot_pdp |
If |
centered |
If |
prop_range_y |
When |
rug_quantile |
If not |
centered_percentile |
The percentile of |
point_labels |
If not |
point_labels_size |
If not |
prop_type |
Scaling factor for the right vertical axis in centered plots if |
verbose |
If |
num_cores |
Used for parallel plotting speedup. Default is 1. |
... |
Other arguments to be passed to the |
Value
A list with the following elements.
plot_points_indices |
Row numbers of |
legend_text |
If the |
See Also
Examples
## Not run:
require(ICEbox)
require(randomForest)
require(MASS) #has Boston Housing data, Pima
data(Boston) #Boston Housing data
X = Boston
y = X$medv
X$medv = NULL
## build a RF:
bhd_rf_mod = randomForest(X, y)
## Create an 'ice' object for the predictor "age":
bhd.ice = ice(object = bhd_rf_mod, X = X, y = y, predictor = "age",
frac_to_build = .1)
## plot
plot(bhd.ice, x_quantile = TRUE, plot_pdp = TRUE, frac_to_plot = 1)
## centered plot
plot(bhd.ice, x_quantile = TRUE, plot_pdp = TRUE, frac_to_plot = 1,
centered = TRUE)
## color the curves by high and low values of 'rm'.
# First create an indicator variable which is 1 if the number of
# rooms is greater than the median:
median_rm = median(X$rm)
bhd.ice$Xice$I_rm = ifelse(bhd.ice$Xice$rm > median_rm, 1, 0)
plot(bhd.ice, frac_to_plot = 1, centered = TRUE, prop_range_y = TRUE,
x_quantile = TRUE, plot_orig_pts_preds = TRUE, color_by = "I_rm")
bhd.ice = ice(object = bhd_rf_mod, X = X, y = y, predictor = "age",
frac_to_build = 1)
plot(bhd.ice, frac_to_plot = 1, centered = TRUE, prop_range_y = TRUE,
x_quantile = TRUE, plot_orig_pts_preds = TRUE, color_by = y)
## End(Not run)
Print method for dice objects.
Description
Prints a summary of a dice object.
Usage
## S3 method for class 'dice'
print(x, ...)
Arguments
x |
Object of class |
... |
Ignored for now. |
Print method for ice objects.
Description
Prints a summary of an ice object.
Usage
## S3 method for class 'ice'
print(x, ...)
Arguments
x |
Object of class |
... |
Ignored for now. |
Row-wise Centering
Description
Centers each row of a matrix by subtracting the row mean.
Usage
rowCenter_cpp(x, n_cores = 1L)
Arguments
x |
Numeric Matrix |
n_cores |
Number of cores to use |
Savitzky-Golay Filter for Matrix (Row-wise)
Description
Smooths each row of a matrix using a Savitzky-Golay filter.
Usage
sg_smooth_cpp(x, window_size, order, deriv, n_cores = 1L)
Arguments
x |
Matrix to smooth row-wise |
window_size |
Size of the filter window (must be odd) |
order |
Polynomial order |
deriv |
Derivative order (0=smooth, 1=first deriv, etc.) |
n_cores |
Number of cores to use |
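For readers unfamiliar with the Savitzky-Golay filter, the base-R sketch below derives the filter weights from a local least-squares polynomial fit and applies them to each row. `sg_coefs` and `sg_rows` are hypothetical helpers; leaving the edge positions as NA and assuming unit grid spacing are choices of this sketch, not a description of sg_smooth_cpp.

```r
# Savitzky-Golay sketch: fit a local polynomial by least squares in each window
# and read off the requested derivative at the window centre.
sg_coefs <- function(window_size, order, deriv = 0) {
  stopifnot(window_size %% 2 == 1, order < window_size, deriv <= order)
  m <- (window_size - 1) / 2
  A <- outer(-m:m, 0:order, `^`)      # Vandermonde design matrix
  H <- solve(crossprod(A), t(A))      # (A'A)^{-1} A'
  factorial(deriv) * H[deriv + 1, ]   # weights for the deriv-th derivative
}

sg_rows <- function(x, window_size, order, deriv = 0) {
  w <- sg_coefs(window_size, order, deriv)
  # stats::filter applies its weights in reversed (convolution) order, so pass
  # rev(w); windows that run off either end are left as NA.
  t(apply(x, 1, function(r) as.numeric(stats::filter(r, rev(w), sides = 2))))
}

g <- 0:10
x <- matrix(c(g^2, g), nrow = 2, byrow = TRUE)
s0 <- sg_rows(x, window_size = 5, order = 2, deriv = 0)   # smoothed values
s1 <- sg_rows(x, window_size = 5, order = 2, deriv = 1)   # first derivative
```

With a grid spacing h other than 1, the deriv-th derivative from this sketch would need to be divided by h^deriv.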
Summary function for dice objects.
Description
Alias of print method.
Usage
## S3 method for class 'dice'
summary(object, ...)
Arguments
object |
Object of class |
... |
Ignored for now. |
Summary function for ice objects.
Description
Alias of print method.
Usage
## S3 method for class 'ice'
summary(object, ...)
Arguments
object |
Object of class |
... |
Ignored for now. |
Probability Transformation
Description
Efficiently applies logodds or probit transformation to a matrix.
Usage
transform_ice_curves_cpp(x, method, n_cores = 1L)
Arguments
x |
Numeric Matrix (probabilities) |
method |
1 for centered logodds, 2 for probit |
n_cores |
Number of cores to use |
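The two transformations can be written in base R as follows. `transform_ref` is a hypothetical helper, and the formula used for the centered log-odds (log(p) minus the average of log(p) and log(1 - p), which equals half the usual logit) is an assumption of this sketch for the two-class case.

```r
# method 1: centered log-odds (assumed two-class form, see above)
# method 2: probit, i.e. the standard normal quantile function
transform_ref <- function(x, method) {
  stopifnot(all(x > 0 & x < 1))   # both transforms are infinite at 0 and 1
  if (method == 1) {
    log(x) - 0.5 * (log(x) + log(1 - x))
  } else if (method == 2) {
    qnorm(x)
  } else {
    stop("method must be 1 (centered logodds) or 2 (probit)")
  }
}

p <- matrix(c(0.1, 0.5, 0.9, 0.25, 0.5, 0.75), nrow = 2, byrow = TRUE)
transform_ref(p, 1)
transform_ref(p, 2)
```

Both transforms map a probability of 0.5 to 0 and preserve the matrix shape, so the transformed curves can be plotted on the same grid.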