Introduction to ensembleR

Saurav Kaushik

2016-09-25

ensembleR is a package that performs ensemble modelling in R using the caret package.

This package lets you create millions of unique ensembles from the 200+ models available in the caret package with a single line of code. Install the ensembleR package using:

install.packages("ensembleR", dependencies = c("Imports", "Suggests"))

This package creates two layers of machine learning models to form an ensemble. The first layer, the base models layer, consists of any number of models (from 1 up to 230) available in the caret package. The second layer consists of a single model, also chosen from the 230 models available in caret, which forms the top model.
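The two-layer idea can be sketched in base R. This is only an illustration of stacking in general, not ensembleR's internal code: the `mtcars` data, the three linear base models and the linear top model are all assumptions chosen so that the example needs no extra packages.

```r
# Illustrative two-layer stack in base R (NOT ensembleR's implementation).
set.seed(1)
idx   <- sample(nrow(mtcars), 22)
train <- mtcars[idx, ]
test  <- mtcars[-idx, ]

# Layer 1: base models, each a different linear model for mpg (assumed choice).
base_formulas <- list(mpg ~ wt, mpg ~ hp, mpg ~ disp + cyl)
base_fits <- lapply(base_formulas, function(f) lm(f, data = train))

# The predictions of the base models become the features of the top layer.
base_preds <- function(newdata) {
  p <- sapply(base_fits, predict, newdata = newdata)
  setNames(as.data.frame(p), paste0("base", seq_along(base_fits)))
}

# Layer 2: a single top model fitted on the base model predictions.
top_train <- cbind(base_preds(train), mpg = train$mpg)
top_fit   <- lm(mpg ~ ., data = top_train)

# Final ensemble predictions for the held-out rows.
final <- predict(top_fit, newdata = base_preds(test))
```

For brevity the top model here is trained on in-sample base predictions; stacking implementations commonly use out-of-fold predictions instead to limit overfitting.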

Note that the training and testing datasets must contain the same columns. Preferably, create the outcome variable in the testing set and initialise it with some arbitrary value.
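A minimal sketch of preparing such a pair of datasets from `iris` is shown here. It assumes that a constant placeholder level is an acceptable filler for the outcome column in the testing set; the names `training`, `testing` and `placeholder` are chosen only for illustration.

```r
# Training rows keep the known outcome.
training <- iris[1:125, ]

# Pretend the outcome is unknown for the testing rows by dropping it.
testing <- iris[126:150, setdiff(names(iris), "Species")]

# Re-create the outcome column with an arbitrary placeholder level (assumption).
placeholder <- levels(training$Species)[1]
testing$Species <- factor(rep(placeholder, nrow(testing)),
                          levels = levels(training$Species))

# Put the columns in the same order as in the training set and verify.
testing <- testing[, names(training)]
stopifnot(identical(names(training), names(testing)))
```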

The function in the ensembleR package used for creating ensembles is:

predictions <- ensemble(training, testing, outcomeName, BaseModels, TopModel)

The arguments, followed by the return value, are explained in order:

  1. training : a dataset containing the outcome variable along with all the independent variables.

  2. testing : a dataset containing all the independent variable columns as well as the outcome variable column filled with placeholder values, i.e. it must have the same columns as the training dataset.

  3. outcomeName : the name of the outcome variable column in the training dataset.

  4. BaseModels : a character vector containing the names, as used in the caret package, of all base models to be used for ensembling.

  5. TopModel : the name, as used in the caret package, of the model to be used on top.

  6. predictions : the return value, i.e. the estimated outcome variable column for the testing dataset.

Let us consider an example for demonstration purposes using the iris dataset:

str(iris)

We will predict Species from all other columns for the last 25 entries by building an ensemble of models on the first 125 entries.

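library(ensembleR)

# Base models: rpart, treebag, gbm, knn and svmLinear; top model: rf
preds <- ensemble(iris[1:125, ], iris[126:150, ], 'Species',
                  c('rpart', 'treebag', 'gbm', 'knn', 'svmLinear'), 'rf')

# Count the predictions per class
table(preds)

# Plot the predictions
plot(preds)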
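Because the true species of the held-out rows is known in this example, the predictions can also be compared with it. The helper `evaluate_predictions` below is a hypothetical name and is not part of ensembleR. It assumes the predictions are a vector of class labels of the same length as the truth. A stand-in prediction vector is used so that the sketch runs without fitting any models.

```r
# Hypothetical helper (not part of ensembleR): compare predictions with truth.
evaluate_predictions <- function(preds, truth) {
  stopifnot(length(preds) == length(truth))
  # Align the predicted labels with the levels of the true outcome.
  preds <- factor(as.character(preds), levels = levels(truth))
  list(
    confusion = table(predicted = preds, actual = truth),
    accuracy  = mean(preds == truth)
  )
}

# True outcome of the held-out rows.
truth <- iris$Species[126:150]

# Stand-in predictions for illustration only: the truth with one label changed.
demo_preds <- as.character(truth)
demo_preds[1] <- "versicolor"

result <- evaluate_predictions(demo_preds, truth)
result$accuracy  # 0.96 (24 of 25 stand-in labels match)
```

With a fitted ensemble, the call would presumably be `evaluate_predictions(preds, iris$Species[126:150])` instead of using the stand-in vector.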