IMABC

Installation

You can install the development version from GitHub with:

# install.packages("devtools")
# devtools::install_github("carolyner/imabc")
# library(imabc)

If working on the code, with the repo as your working directory, you can also use:

# If working directory is the imabc directory you can use:
# devtools::load_all()
# or if working somewhere else you can put the path to the imabc directory
# devtools::load_all("imabc directory path")

There is no need to point load_all() to the R folder within imabc as R will recognize it as a package.

Other packages needed are: * MASS * data.table * tidyverse * parallel * foreach * doParallel * lhs * truncnorm

Running the Code

In the dev directory is an R script simple_setup.R that goes through a basic example of the code but below will be a brief explanation of the components as well.

The primary function is imabc(). The inputs and their descriptions are as follows:

Some additional notes:

priors

There are two functions to help the user define the appropriate priors list.

add_prior(): Using the function the user defines all the information associated with a specific parameter that is being calibrated. For the current version of the code the user should specify at least the following things:

define_priors(): this is a wrapper function that takes one or more priors generated by calls to add_prior() and moves some of the information returned by add_prior() around so that the code is a little bit more efficient. The only inputs into this function are objects created by add_prior() calls. This is required to set up the prior distribution functions and every parameter in the model must have a prior defined with this function. If the user doesn’t provide a name to add_prior(), they can still name the parameter by setting the parameter name equal to the add_prior() call within the define_priors() call (e.g. define_priors(name_of_parm = add_prior())). Can also create a target object (without FUN defined for the targets) from the meta data returned by the imabce function.

If no name is provided between add_prior or define_priors, a unique one will be generated following the format V[0-9]+. In the target function(s), the user can call the parameters using the names that are set for each parameter or they can use the order they were created in. See below for details.

# If dist_base_name, density_fn, and quantile_fn are NULL, parameter will be fixed
# If not provided by user, min/max default to the min/max defaults of the density function if they exist. If they do not
#   exist as defaults in the density function they then default to -Inf/Inf respectively.
# Names can be specified via:
#   define_priors(name = add_prior(...))
#   define_priors(add_prior(parameter = name, ...))
# Names must be unique
# If both name specifications are used for a given prior (e.g. define_priors(name1 = add_prior(parameter = name2, ...)))
#   then the name specified with add_prior (in this case, name2) will be used over the other method
# define_priors will assign a unique name, of the format V[0-9]+, for any parameter that isn't given a name.
# The prior distribution information is returned in the same order they are input and can be called without their names
#   using their index (e.g. priors[[1]]$density_function)
priors <- define_priors(
  x1 = add_prior(
    density_fn = "unif",
    min = 0.1, max = 0.9 # Would otherwise default to 0/1
  ),
  add_prior(
    parameter_name = "x2",
    quantile_fn = "truncnorm",
    mean = 0.75, sd = 0.05, a = 0.6, b = 0.9, # Inputs into *truncnorm
    min = 0.6, max = 1.1 # Would otherwise default to -Inf/Inf
  ),
  add_prior(
    dist_base_name = "norm",
  ) # Given the name V3, min/max default to -Inf/Inf
)

targets

Just like the prior distribution information the user has the following functions for helping them specify the target values and structure of the targets.

add_target(): This lets the user define a target. Each target must be a list of the following values: target, starting_range, stopping_range. starting_range[1] <= stopping_range[1] <= target <= stopping_range[2] <= starting_range[2] must be satisfied by these values. Optionally the user can give the target a name using target_name and a target function using FUN. See below for details.

group_targets(): This lets the user group a set of targets that are to be calibrated together. As the calibration improves its estimation of the best parameter space, accepcted target values will be restricted to tighter ranges (up until the range is identical to the stopping_range). In a group of targets, the ranges are tightened together according to the improvement of the least improved target (e.g. if you have two targets in a group and a set of parameters generate values all within the stopping range of one target but whose values could only improve the acceptable range of the second target by 10%, then both targets will have their acceptable ranges reduced by 10%). Optionally the user can give the target group a name using group_name.

define_targets(): Takes the collection of individual targets and target groups and organizes the information so that imabc can use the information correctly and efficiently. Just like the prior functions, names can be assigned when inserting the lower target object into the higher target function (e.g. define_targets(name1 = group_targets(name2 = add_target()))). Can also create a target object (without FUN defined for the targets) from the meta data returned by the imabc function.

For tracking purposes, even if targets aren’t grouped they are implicitly given a group. This is important if the user wants to reference the values in the target function or in the results. If not defined by the user, group names will follow the format G[0-9]+ and target names will follow the format T[0-9]+. The final target ID is then returned as _. The user can also reference the names using the order that the targets are added in

as.targets(): This can convert a data frame of just target information to a target list (without FUN defined for the targets). The data frame must have columns for group names, target names, target value, start min, start max, stop min, stop max. There are default values for what these columns should be called but the user can also select which columns go with which value.

For example, the user can specify something like:

# If no name is provided, function assigns name based on group and target.
#   Groups are of the format G[0-9]+
#   sub-targets are of the format T[0-9]+
# So the below generates two targets with IDs G1_T1 and G2_T1.
#   if the group_targets had two targets in it the second one would be G1_T2 and so on.
# There are checks to make sure the name is completely unique
# If names are provided then the target ID is just equal to sprintf(%s_%s, group_target name, add_target name)
# FUN (optional) can be defined as a function with only single input or as a function whose inputs are named for parameters
#   defined by the priors object. In the former, a named vector is passed to the function and the user can either use the
#   parameter names (e.g. x["V3"]) or the index (e.g. x[3]) to manipulate the simulated draws into an expected target value.
#   The parameters are placed into the input vector in the same order they are defined by the user when building a priors
#   object. If the inputs are named for parameters created in the priors object the function will only pass parameters
#   that share a name with the input (i.e. if the priors object defines V1, V2, and V3 and a target function is defined
#   as function(V1, V4) {...} the function will produce an error, as function(V1, V2) {V3*V2} the function will produce
#   an error, but function(V1, V2) {V1*V2} will work just fine).
#   regardless of the inputs the user defines, the functions will also have access to the seed and iteration current lower
#   and upper bounds imabc has deemed as acceptable. these must be added as inputs to the target function as seed,
#   lower_bound, and upper_bound. Use build_target_function() to create the final target function.
targets <- define_targets(
  group_targets(
    group_name = "G1",
    T1 = add_target(
      target = 1.5,
      starting_range = c(1.0, 2.0),
      stopping_range = c(1.49, 1.51)
    )
  ),
  add_target(
    target = 0.5,
    starting_range = c(0.2, 0.9),
    stopping_range = c(0.49, 0.51)
  )
)
# df <- data.frame(
#   group = c("G1", "G2", "G1"), name = c("T1", "T1", "T2"), target = c(1.5, 0.5, -1.5),
#   lower_bounds_start = c(1, 0.2, -2), upper_bounds_start = c(2, 0.9, -1),
#   lower_bounds_stop = c(1.49, 0.49, -1.51), upper_bounds_stop = c(1.51, 0.51, -1.49)
# )
# ex <- as.targets(df)

target_fun

define_target_function(): A helper function that lets the user create a function that can be mapped to the appropriate simulated parameters. Requires the target object created by define_targets() and the prior object created by define_priors(). If the user defined the target functions using add_target() this can create a function that will apply the inputs to all the target functions and return the vector of results. If the user doesn’t use add_target() to create individual target functions the user can create their own target function and insert it into this function using the FUN option. The last option the user can set is use_seed. imabc generates a random set of seed values for each simulated set of parameters. This lets the user control the RNG within each analysis of the target function(s) and thus be able to reproduce any set of results after the run has finished. In order to set the seed during the execution of the final target function, set use_seed = TRUE.

One feature of imabc is that the user can use three pieces of information from the simulation outside of the parameter values themselves. The first is a unique seed for each evaluation of the target function that was just discussed, this can be used using the use_seed option. The second two values are the current min/max values targets can fall within at a particular iteration of the calibration. As the calibration gets better and picking parameters, it will tighten the bounds on the targets in order to make sure only “good” parameters are used for generating alternative parameters to try. When target functions require heavy computation it can be advantageous to use the target range values to fail out of a given test early. To utilize these, the user can add inputs for them in their target function(s). If the user is creating target functions individually using add_target(), the user should use the input names lower_bound and upper_bound (e.g. ). If the user is creating a single target function using define_target_function() they would use the plural form, lower_bounds and upper_bounds. When add_target() is used, imabc will map the appropriate bounds to the function when running the result. When the target function is defined with define_target_function(FUN = …) the user must take note of the target IDs or order in which they were defined.

When defining a single target function with define_target_function() the user can define the function anyway they prefer but the returned result must a vector whose length is equal to the number of targets defined by the define_targets() function. Just like the lower and upper bounds the user can either place the results into the vector in the order they are defined or they can use use the target IDs to assign values.

For example, the user can specify something like:

# Target functions
fn1 <- function(x1, x2) { x1 + x2 + rnorm(1, 0, 0.01) }
fn2 <- function(x1, x2, lower_bound) { max(lower_bound, x1 * x2 + rnorm(1, 0, 0.01)) }
fn <- function(x1, x2, lower_bounds) {
  res <- c()

  res["G2_T1"] <- fn2(x1, x2, lower_bound[2])
  # OR
  # res[2] <- fn2(x1, x2, lower_bound["G2_T1"])
  # lower/upper bounds are now accessible in target_fun.
  # Either using the name of sim parm or the order they are created in define_targets
  res["G1_T1"] <- fn1(x1, x2)

  return(res)
}

target_fun <- define_target_function(targets, priors, FUN = fn, use_seed = FALSE)

To do