Tuning Hyperparameters

Samuel Wilson

February 9, 2020


Package Process

Machine learning projects commonly require a user to “tune” a model’s hyperparameters to find a good balance between bias and variance. Several tools are available in a data scientist’s toolbox to handle this task, the bluntest of which is a grid search. A grid search gauges model performance over a pre-defined set of hyperparameters without regard for past performance. As models increase in complexity and training time, grid searches become unwieldy.
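
To give a sense of how quickly a grid grows, here is a minimal sketch using base R's expand.grid. The grid resolution is an assumption chosen only for illustration; the parameter names match the ones tuned later in this document.

```r
# Hypothetical grid: the step sizes below are illustrative, not a recommendation.
grid <- expand.grid(
    max_depth = 2:10
  , min_child_weight = seq(1, 25, by = 4)
  , subsample = seq(0.25, 1, by = 0.25)
)

# Every row is one full model evaluation: 9 * 7 * 4 combinations.
nrow(grid)
#> [1] 252
```

Each additional hyperparameter multiplies the number of rows, and none of the evaluations informs where the next one is placed.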

Ideally, we would use the information from prior model evaluations to guide our future parameter searches. This is precisely the idea behind Bayesian optimization, in which our prior response distribution is iteratively updated based on our best guess of where the best parameters are. The ParBayesianOptimization package does exactly this in the following process:

  1. Initial parameter-score pairs are found
  2. Gaussian Process is fit/updated
  3. Numerical methods are used to estimate the best parameter set
  4. New parameter-score pairs are found
  5. Repeat steps 2-4 until some stopping criterion is met
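
The steps above can be sketched in a few lines of base R. This is a toy illustration under stated assumptions, not the package's implementation: the objective scoreFun, the fixed kernel length scale, the upper confidence bound acquisition, and the candidate grid used in place of a numerical optimizer are all choices made here for brevity.

```r
# Toy 1-D objective standing in for an expensive model evaluation (hypothetical).
scoreFun <- function(x) -(x - 0.3)^2

# Squared-exponential kernel with a fixed, hand-picked length scale.
sqExpKernel <- function(a, b, lengthScale = 0.2) {
  exp(-outer(a, b, "-")^2 / (2 * lengthScale^2))
}

# Zero-mean Gaussian process posterior mean and sd at the points xNew.
gpPredict <- function(x, y, xNew, noise = 1e-6) {
  K    <- sqExpKernel(x, x) + diag(noise, length(x))
  Ks   <- sqExpKernel(xNew, x)
  Kinv <- solve(K)
  mu   <- as.vector(Ks %*% Kinv %*% y)
  v    <- 1 - rowSums((Ks %*% Kinv) * Ks)
  list(mean = mu, sd = sqrt(pmax(v, 0)))
}

set.seed(1)
x <- runif(3)                          # step 1: initial parameters ...
y <- scoreFun(x)                       # ... and their scores
candidates <- seq(0, 1, length.out = 201)

for (i in 1:5) {                       # step 5: stop after a fixed number of iterations
  fit   <- gpPredict(x, y, candidates) # step 2: fit/update the Gaussian process
  ucb   <- fit$mean + 2 * fit$sd       # acquisition: upper confidence bound
  xNext <- candidates[which.max(ucb)]  # step 3: most promising candidate on the grid
  x     <- c(x, xNext)                 # step 4: score it and extend the history
  y     <- c(y, scoreFun(xNext))
}

bestX <- x[which.max(y)]               # best parameter value seen so far
```

The acquisition trades off the predicted score (fit$mean) against uncertainty (fit$sd), so each new evaluation is placed using everything learned from the previous ones.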

Practical Example

In this example, we will use the agaricus.train dataset provided in the xgboost package. Here, we load the packages and data, then create a folds object to be used in the scoring function. Each fold holds every third row index, so the three folds are of nearly equal size.

library("xgboost")
library("ParBayesianOptimization")

data(agaricus.train, package = "xgboost")

Folds <- list(
    Fold1 = as.integer(seq(1,nrow(agaricus.train$data),by = 3))
  , Fold2 = as.integer(seq(2,nrow(agaricus.train$data),by = 3))
  , Fold3 = as.integer(seq(3,nrow(agaricus.train$data),by = 3))
)
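
As a quick check of what this construction produces, the sketch below applies the same pattern to a small, hypothetical row count (n is an assumption chosen so the result is easy to read) and confirms that the folds are disjoint and cover every row.

```r
# Hypothetical row count, small enough to inspect by eye.
n <- 10
toyFolds <- list(
    Fold1 = as.integer(seq(1, n, by = 3))
  , Fold2 = as.integer(seq(2, n, by = 3))
  , Fold3 = as.integer(seq(3, n, by = 3))
)
toyFolds$Fold1
#> [1]  1  4  7 10

# Every row index appears in exactly one fold.
allIdx <- sort(unlist(toyFolds, use.names = FALSE))
identical(allIdx, seq_len(n))
#> [1] TRUE
```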

Now we need to define the scoring function. This function should, at a minimum, return a list with a Score element, which is the model evaluation metric we want to maximize. We can also retain other pieces of information created by the scoring function by including them as named elements of the returned list. In this case, we want to retain the optimal number of rounds determined by xgb.cv:

scoringFunction <- function(max_depth, min_child_weight, subsample) {

  dtrain <- xgb.DMatrix(agaricus.train$data,label = agaricus.train$label)
  
  Pars <- list( 
      booster = "gbtree"
    , eta = 0.01
    , max_depth = max_depth
    , min_child_weight = min_child_weight
    , subsample = subsample
    , objective = "binary:logistic"
    , eval_metric = "auc"
  )

  xgbcv <- xgb.cv(
      params = Pars
    , data = dtrain
    , nrounds = 100
    , folds = Folds
    , prediction = TRUE
    , showsd = TRUE
    , early_stopping_rounds = 5
    , maximize = TRUE
    , verbose = 0
  )

  return(
    list( 
        Score = max(xgbcv$evaluation_log$test_auc_mean)
      , nrounds = xgbcv$best_iteration
    )
  )
}
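
Because Score is maximized, a metric where lower is better needs its sign flipped before it is returned. The sketch below shows that return contract with a toy function; toyScoringFunction and its quadratic loss are hypothetical stand-ins, not part of the example above.

```r
# Hypothetical scoring function for a metric where lower is better.
toyScoringFunction <- function(x) {
  loss <- (x - 2)^2   # stand-in for something like a cross-validated error
  list(
      Score = -loss   # negated, so maximizing Score minimizes the loss
    , loss = loss     # extra named element retained alongside the score
  )
}

res <- toyScoringFunction(x = 3)
res$Score
#> [1] -1
```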

We also need to define the bounds, the GP kernel, and the acquisition function. In this example, the kernel and acquisition function are left at their defaults.

bounds <- list( 
    max_depth = c(2L, 10L)
  , min_child_weight = c(1, 25)
  , subsample = c(0.25, 1)
)
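
Note that max_depth is declared with the L suffix while the other two bounds are plain doubles. As far as we understand the package documentation, integer bounds mark a parameter as integer-valued, which is consistent with the whole-number max_depth values in the results further down. A minimal base R check of the bounds list:

```r
bounds <- list(
    max_depth = c(2L, 10L)
  , min_child_weight = c(1, 25)
  , subsample = c(0.25, 1)
)

# Storage type of each bound: "integer" for max_depth, "numeric" for the others.
boundTypes <- sapply(bounds, class)

# Each entry should be a (lower, upper) pair with lower < upper.
boundsOk <- all(sapply(bounds, function(b) length(b) == 2 && b[1] < b[2]))
```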

We are now ready to put this all into the bayesOpt function.

set.seed(1234)
optObj <- bayesOpt(
    FUN = scoringFunction
  , bounds = bounds
  , initPoints = 4
  , iters.n = 3
)

The console informs us that the process initialized by running scoringFunction 4 times. It then fit a Gaussian process to the parameter-score pairs, found the global optimum of the acquisition function, and ran scoringFunction again. This process continued until we had 7 parameter-score pairs. You can interrogate the bayesOpt object to see the results:

optObj$scoreSummary
#>    Epoch Iteration max_depth min_child_weight subsample gpUtility acqOptimum inBounds Elapsed     Score nrounds errorMessage
#> 1:     0         1         9         5.863591 0.2585819        NA      FALSE     TRUE    0.38 0.9984373      11           NA
#> 2:     0         2         4        10.154185 0.5230172        NA      FALSE     TRUE    0.29 0.9977907       7           NA
#> 3:     0         3         6        24.487949 0.8622225        NA      FALSE     TRUE    1.26 0.9988230      52           NA
#> 4:     0         4         2        17.988070 0.6821260        NA      FALSE     TRUE    0.30 0.9876197      10           NA
#> 5:     1         5         2         7.652206 1.0000000 0.8147956       TRUE     TRUE    0.25 0.9871587       8           NA
#> 6:     2         6         9         7.992101 0.2843360 0.7111638       TRUE     TRUE    0.28 0.9977847       7           NA
#> 7:     3         7         9         1.000000 0.2500000 0.8122421       TRUE     TRUE    0.33 0.9999503       9           NA

The best parameter set found so far can be retrieved with getBestPars:

getBestPars(optObj)
#> $max_depth
#> [1] 9
#> 
#> $min_child_weight
#> [1] 1
#> 
#> $subsample
#> [1] 0.25
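
These values correspond to the row of the score summary with the highest Score. The sketch below rebuilds the relevant columns of the printed table as a plain data.frame (an assumption made so the snippet runs without the fitted object) and picks that row by hand, which also recovers the nrounds value retained by the scoring function.

```r
# Columns copied from the scoreSummary output shown above.
scoreSummary <- data.frame(
    Iteration = 1:7
  , max_depth = c(9L, 4L, 6L, 2L, 2L, 9L, 9L)
  , min_child_weight = c(5.863591, 10.154185, 24.487949, 17.988070, 7.652206, 7.992101, 1.000000)
  , subsample = c(0.2585819, 0.5230172, 0.8622225, 0.6821260, 1.0000000, 0.2843360, 0.2500000)
  , Score = c(0.9984373, 0.9977907, 0.9988230, 0.9876197, 0.9871587, 0.9977847, 0.9999503)
  , nrounds = c(11L, 7L, 52L, 10L, 8L, 7L, 9L)
)

# Row with the highest score, and the number of rounds stored with it.
best <- scoreSummary[which.max(scoreSummary$Score), ]
best$Iteration
#> [1] 7
best$nrounds
#> [1] 9
```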