Custom Estimators

The tfestimators framework makes it easy to construct and build machine learning models via its high-level Estimator API. Estimator offers classes you can instantiate to quickly configure common model types such as regressors and classifiers.

But what if none of the predefined model types meets your needs? Perhaps you need more granular control over model configuration, such as the ability to customize the loss function used for optimization, or specify different activation functions for each neural network layer. Or maybe you’re implementing a ranking or recommendation system, and neither a classifier nor a regressor is appropriate for generating predictions. The figure on the right illustrates the basic components of an estimator. Users can implement custom behaviors and or architecture inside the model_fn of the estimator.

This tutorial covers how to create your own Estimator using the building blocks provided in tfestimators package, which will predict the ages of abalones based on their physical measurements. You’ll learn how to do the following:

The complete code for this tutorial can be found here.

An Abalone Age Predictor

It’s possible to estimate the age of an abalone (sea snail) by the number of rings on its shell. However, because this task requires cutting, staining, and viewing the shell under a microscope, it’s desirable to find other measurements that can predict age.

The Abalone Data Set contains the following feature data for abalone:

Feature Description
Length Length of abalone (in longest direction; in mm)
Diameter Diameter of abalone (measurement perpendicular to length; in mm)
Height Height of abalone (with its meat inside shell; in mm)
Whole Weight Weight of entire abalone (in grams)
Shucked Weight Weight of abalone meat only (in grams)
Viscera Weight Gut weight of abalone (in grams), after bleeding
Shell Weight Weight of dried abalone shell (in grams)

Setup

This tutorial uses three data sets. abalone_train.csv contains labeled training data comprising 3,320 examples. abalone_test.csv contains labeled test data for 850 examples. abalone_predict contains 7 examples on which to make predictions.

The following sections walk through writing the Estimator code step by step; the full, final code is available here.

Downloading and Loading Abalone CSV Data

We first write a function that downloads the training, testing, and evaluation data from TensorFlow website if we haven’t downloaded them before.

library(tfestimators)

maybe_download_abalone <- function(train_data_path, test_data_path, predict_data_path, column_names_to_assign) {
  if (!file.exists(train_data_path) || !file.exists(test_data_path) || !file.exists(predict_data_path)) {
    cat("Downloading abalone data ...")
    train_data <- read.csv("http://download.tensorflow.org/data/abalone_train.csv", header = FALSE)
    test_data <- read.csv("http://download.tensorflow.org/data/abalone_test.csv", header = FALSE)
    predict_data <- read.csv("http://download.tensorflow.org/data/abalone_predict.csv", header = FALSE)
    colnames(train_data) <- column_names_to_assign
    colnames(test_data) <- column_names_to_assign
    colnames(predict_data) <- column_names_to_assign
    write.csv(train_data, train_data_path, row.names = FALSE)
    write.csv(test_data, test_data_path, row.names = FALSE)
    write.csv(predict_data, predict_data_path, row.names = FALSE)
  } else {
    train_data <- read.csv(train_data_path, header = TRUE)
    test_data <- read.csv(test_data_path, header = TRUE)
    predict_data <- read.csv(predict_data_path, header = TRUE)
  }
  return(list(train_data = train_data, test_data = test_data, predict_data = predict_data))
}

COLNAMES <- c("length", "diameter", "height", "whole_weight", "shucked_weight", "viscera_weight", "shell_weight", "num_rings")

downloaded_data <- maybe_download_abalone(
  file.path(getwd(), "train_abalone.csv"),
  file.path(getwd(), "test_abalone.csv"),
  file.path(getwd(), "predict_abalone.csv"),
  COLNAMES
)
train_data <- downloaded_data$train_data
test_data <- downloaded_data$test_data
predict_data <- downloaded_data$predict_data

Next, we construct the input function as follows:


constructed_input_fn <- function(dataset) {
  input_fn(dataset, features = -num_rings, response = num_rings, num_epochs = NULL)
}
train_input_fn <- constructed_input_fn(train_data)
test_input_fn <- constructed_input_fn(test_data)
predict_input_fn <- constructed_input_fn(predict_data)

Instantiating an Estimator

When defining a model using one of tf.estimator’s provided classes, such as linear_dnn_combined_classifier, you supply all the configuration parameters right in the constructor, e.g.:

diameter <- column_numeric("diameter")
height <- column_numeric("height")

model <- dnn_linear_combined_classifier(
  linear_feature_columns = feature_columns(diameter),
  dnn_feature_columns = feature_columns(height),
  dnn_hidden_units = c(100L, 50L)
)

You don’t need to write any further code to instruct TensorFlow how to train the model, calculate loss, or return predictions; that logic is already baked into the linear_dnn_combined_classifier.

By contrast, when you’re creating your own estimator from scratch, the constructor accepts just two high-level parameters for model configuration, model_fn and params:

model <- estimator(model_fn, params = model_params)

Note: Just like tfestimators’ predefined regressors and classifiers, the estimator initializer also accepts the general configuration arguments model_dir and config.

For the abalone age predictor, the model will accept one hyperparameter: learning rate. Here, learning_rate is set to 0.001, but you can tune this value as needed to achieve the best results during model training.

The following code creates the list model_params containing the learning rate and instantiates the Estimator:

# Set model params
model_params <- list(learning_rate = 0.001)

# Instantiate Estimator
model <- estimator(model_fn, params = model_params)

Constructing the model_fn

The basic skeleton for an Estimator API model function looks like this:

model_fn <- function(features, labels, mode, params, config) {
  # Logic to do the following:
  # 1. Configure the model via TensorFlow operations
  # 2. Define the loss function for training/evaluation
  # 3. Define the training operation/optimizer
  # 4. Generate predictions
  # 5. Return predictions/loss/train_op/eval_metric_ops in estimator_spec object
}

The model_fn must accept three arguments:

model_fn may also accept a params argument containing a dict of hyperparameters used for training (as shown in the skeleton above) and a config that represents the configurations used in a model, including GPU percentage, cluster information, etc.

The body of the function performs the following tasks (described in detail in the sections that follow):

The model_fn must return an estimator_spec object, which contains the following values:

predictions <- list(results = tensor_of_predictions)

In infer mode, the dict that you return in estimator_spec will then be returned by predict(), so you can construct it in the format in which you’d like to consume it.

eval_metric_ops <- list(accuracy = tf$metrics$accuracy(labels, predictions))

If you do not specify eval_metric_ops, only loss will be calculated during evaluation.

Configuring a neural network with feature_column and layers

Constructing a neural network entails creating and connecting the input layer, the hidden layers, and the output layer.

The input layer is a series of nodes (one for each feature in the model) that will accept the feature data that is passed to the model_fn in the features argument. If features contains an n-dimensional Tensor with all your feature data, then it can serve as the input layer. If features contains a dict of feature columns passed to the model via an input function, you can convert it to an input-layer Tensor with the input_layer function:

input_layer <- input_layer(
    features = features, feature_columns = c(age, height, weight))

As shown above, input_layer() takes two required arguments:

The input layer of the neural network then must be connected to one or more hidden layers via an activation function that performs a nonlinear transformation on the data from the previous layer. The last hidden layer is then connected to the output layer, the final layer in the model. tf$layers provides the tf$layers$dense function for constructing fully connected layers. The activation is controlled by the activation argument. Some options to pass to the activation argument are:

hidden_layer <- tf$layers$dense(inputs = input_layer, units = 10L, activation = tf$nn$relu)
second_hidden_layer <- tf$layers$dense(
inputs = hidden_layer, units = 20L, activation = tf$nn$relu)
output_layer <- tf$layers$dense(inputs = second_hidden_layer,
units = 3L, activation = NULL)

Other activation functions are possible, e.g.:

output_layer <- tf$layers$dense(inputs = second_hidden_layer,
units = 10L, activation_fn = tf$sigmoid)

The above code creates the neural network layer output_layer, which is fully connected to second_hidden_layer with a sigmoid activation function tf$sigmoid.

The network contains two hidden layers, each with 10 nodes and a ReLU activation function. The output layer contains no activation function, and is tf$reshape to a one-dimensional tensor to capture the model’s predictions, which are stored in predictions_dict.

Defining loss for the model

The estimator_spec returned by the model_fn must contain loss: a Tensor representing the loss value, which quantifies how well the model’s predictions reflect the label values during training and evaluation runs. The tf$losses module provides convenience functions for calculating loss using a variety of metrics, including:

The following example adds a definition for loss to the abalone model_fn using mean_squared_error():

loss <- tf$losses$mean_squared_error(labels, predictions)

Supplementary metrics for evaluation can be added to an eval_metric_ops dict. The following code defines an rmse metric, which calculates the root mean squared error for the model predictions. Note that the labels tensor is cast to a float64 type to match the data type of the predictions tensor, which will contain real values:

eval_metric_ops <- list(rmse = tf$metrics$root_mean_squared_error(
tf$cast(labels, tf$float64), predictions
))

Defining the training op for the model

The training op defines the optimization algorithm TensorFlow will use when fitting the model to the training data. Typically when training, the goal is to minimize loss. A simple way to create the training op is to instantiate a tf$train$Optimizer subclass and call the minimize method.

The following code defines a training op for the abalone model_fn using the loss value calculated in Defining Loss for the Model, the learning rate passed to the function in params, and the gradient descent optimizer. For global_step, the convenience function tf$train$get_global_step takes care of generating an integer variable:

optimizer <- tf$train$GradientDescentOptimizer(learning_rate = params$learning_rate)
train_op <- optimizer$minimize(loss = loss, global_step = tf$train$get_global_step())

The complete abalone model_fn

Here’s the final, complete model_fn for the abalone age predictor. The following code configures the neural network; defines loss and the training op; and returns a estimator_spec object containing mode, predictions_dict, loss, and train_op:

model_fn <- function(features, labels, mode, params, config) {
  # Connect the first hidden layer to input layer
  first_hidden_layer <- tf$layers$dense(features, 10L, activation = tf$nn$relu)
  
  # Connect the second hidden layer to first hidden layer with relu
  second_hidden_layer <- tf$layers$dense(first_hidden_layer, 10L, activation = tf$nn$relu)
  
  # Connect the output layer to second hidden layer (no activation fn)
  output_layer <- tf$layers$dense(second_hidden_layer, 1L)
  
  # Reshape output layer to 1-dim Tensor to return predictions
  predictions <- tf$reshape(output_layer, list(-1L))
  predictions_list <- list(ages = predictions)
  
  # Calculate loss using mean squared error
  loss <- tf$losses$mean_squared_error(labels, predictions)
  
  eval_metric_ops <- list(rmse = tf$metrics$root_mean_squared_error(
    tf$cast(labels, tf$float64), predictions
  ))
  
  optimizer <- tf$train$GradientDescentOptimizer(learning_rate = params$learning_rate)
  train_op <- optimizer$minimize(loss = loss, global_step = tf$train$get_global_step())
  
  return(estimator_spec(
    mode = mode,
    predictions = predictions_list,
    loss = loss,
    train_op = train_op,
    eval_metric_ops = eval_metric_ops
  ))
}

model_params <- list(learning_rate = 0.001)
model <- estimator(model_fn, params = model_params)

Running the Abalone Model

You’ve instantiated an Estimator for the abalone predictor and defined its behavior in model_fn; all that’s left to do is train, evaluate, and make predictions.

The following code fits the neural network to the training data and evaluates the model performance based on the eval_metric_ops that we have defined:

train(model, input_fn = train_input_fn, steps = 2)

evaluate(model, input_fn = test_input_fn, steps = 2)