Abstract

The BayesianTools (BT) package supports model analysis (including sensitivity analysis and uncertainty analysis), Bayesian model calibration, as well as model selection and multi-model inference techniques for system models.

Quick start

The purpose of this first section is to give you a quick overview of the most important functions of the BayesianTools (BT) package. For a more detailed description, see the later sections

Installing, loading and citing the package

If you haven’t installed the package yet, either run

install.packages("BayesianTools")

Or follow the instructions on https://github.com/florianhartig/BayesianTools to install a development or an older version.

Loading and citation

library(BayesianTools)
citation("BayesianTools")

## 
## To cite package 'BayesianTools' in publications use:
## 
##   Hartig F, Minunno F, Paul S (2023). _BayesianTools: General-Purpose
##   MCMC and SMC Samplers and Tools for Bayesian Statistics_. R package
##   version 0.1.8, <https://github.com/florianhartig/BayesianTools>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {BayesianTools: General-Purpose MCMC and SMC Samplers and Tools for Bayesian Statistics},
##     author = {Florian Hartig and Francesco Minunno and Stefan { Paul}},
##     year = {2023},
##     note = {R package version 0.1.8},
##     url = {https://github.com/florianhartig/BayesianTools},
##   }

Note: BayesianTools calls a number of secondary packages. Particular important is coda, which is used on a number of plots and summary statistics. If you make heavy use of the summary statistics and diagnostics plots, it would be nice to cite coda as well!

Pro-tip: if you are running a stochastic algorithms such as an MCMC, you should always set or record your random seed to make your results reproducible (otherwise, results will change slightly every time you run the code)

set.seed(123)

In a real application, to ensure reproducibility, it would also be useful to record the session info,

sessionInfo()

which lists the version number of R and all loaded packages.

The Bayesian Setup

The central object in the BT package is the BayesianSetup. This class contains the information about the model to be fit (likelihood), and the priors for the model parameters.

A BayesianSetup is created by the createBayesianSetup function. The function expects a log-likelihood and (optional) a log-prior. It then automatically creates the posterior and various convenience functions for the samplers.

Advantages of the BayesianSetup include 1. support for automatic parallelization 2. functions are wrapped in try-catch statements to avoid crashes during long MCMC evaluations 3. and the posterior checks if the parameter is outside the prior first, in which case the likelihood is not evaluated (makes the algorithms faster for slow likelihoods).

If no prior information is provided, an unbounded flat prior is created. If no explicit prior, but lower and upper values are provided, a standard uniform prior with the respective bounds is created, including the option to sample from this prior, which is useful for SMC and also for getting starting values. This option is used in the following example, which creates a multivariate normal likelihood density and a uniform prior for 3 parameters.

ll <- generateTestDensityMultiNormal(sigma = "no correlation")
bayesianSetup = createBayesianSetup(likelihood = ll, lower = rep(-10, 3), upper = rep(10, 3))

See later more detailed description about the BayesianSetup.

Hint: for an example how to run this steps for dynamic ecological model, see ?VSEM

Running MCMC and SMC functions

Once you have your setup, you may want to run a calibration. The runMCMC function is the main wrapper for all other implemented MCMC/SMC functions. It always takes the following arguments

A bayesianSetup (alternatively, the log target function)
The sampler name
A list with settings - if a parameter is not provided, the default will be used

As an example, choosing the sampler name “Metropolis” calls a versatile Metropolis-type MCMC with options for covariance adaptation, delayed rejection, tempering and Metropolis-within-Gibbs sampling. For details, see the the later reference on MCMC samplers. This is how we would call this sampler with default settings

iter = 10000
settings = list(iterations = iter, message = FALSE)
out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Metropolis", settings = settings)

Summarizing outputs

All samplers can be plotted and summarized via the console with the standard print, and summary commands

print(out)

## [1] "mcmcSampler - you can use the following methods to summarize, plot or reduce this class:"
## [1] getSample plot      print     summary  
## see '?methods' for accessing help and source code

summary(out)

## # # # # # # # # # # # # # # # # # # # # # # # # # 
## ## MCMC chain summary ## 
## # # # # # # # # # # # # # # # # # # # # # # # # # 
##  
## # MCMC sampler:  Metropolis 
## # Nr. Chains:  1 
## # Iterations per chain:  10000 
## # Rejection rate:  0.684 
## # Effective sample size:  1005 
## # Runtime:  2.883  sec. 
##  
## # Parameters
##       MAP   2.5% median 97.5%
## par 1   0 -1.991  0.016 2.034
## par 2   0 -1.931 -0.077 1.802
## par 3   0 -1.970 -0.007 1.850
## 
## ## DIC:  11.153 
## ## Convergence 
##  Gelman Rubin multivariate psrf:  Only one chain; convergence cannot be determined! 
##  
## ## Correlations 
##        par 1  par 2  par 3
## par 1  1.000 -0.019 -0.029
## par 2 -0.019  1.000  0.011
## par 3 -0.029  0.011  1.000

and plottted with several plot functions. The marginalPlot can either be plotted as histograms with density overlay, which is also the default, or as a violin plot (see “?marginalPlot”).

plot(out) # plot internally calls tracePlot(out)

correlationPlot(out)

marginalPlot(out, prior = TRUE)

Other Functions that can be applied to all samplers include model selection scores such as the DIC and the marginal Likelihood (for the calculation of the Bayes factor, see later section for more details), and the Maximum Aposteriori Value (MAP). For the marginal likelihood calculation it is possible to chose from a set of methods (see “?marginalLikelihood”).

marginalLikelihood(out)

## $ln.ML
## [1] -9.07554
## 
## $ln.lik.star
## [1] -2.756816
## 
## $ln.pi.star
## [1] -8.987197
## 
## $ln.pi.hat
## [1] -2.668472
## 
## $method
## [1] "Chib"

DIC(out)

## $DIC
## [1] 11.15333
## 
## $IC
## [1] 13.96971
## 
## $pD
## [1] 2.816389
## 
## $pV
## [1] 2.8407
## 
## $Dbar
## [1] 8.336937
## 
## $Dhat
## [1] 5.520548

MAP(out)

## $parametersMAP
##         par 1         par 2         par 3 
##  4.650153e-04 -1.707961e-04 -6.534865e-05 
## 
## $valuesMAP
##  Lposterior Llikelihood      Lprior 
##  -11.744013   -2.756816   -8.987197

You can extract (a part of) the sampled parameter values by

getSample(out, start = 100, end = NULL, thin = 5, whichParameters = 1:2)

For all samplers, you can conveniently perform multiple runs via the nrChains argument

iter = 1000
settings = list(iterations = iter, nrChains = 3, message = FALSE)
out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Metropolis", settings = settings)

## Parameter(s) currentChain   not used in Metropolis
## 
## Parameter(s) currentChain   not used in Metropolis
## 
## Parameter(s) currentChain   not used in Metropolis

The result is an object of mcmcSamplerList, which should allow to do everything one can do with an mcmcSampler object (with slightly different output sometimes).

print(out)

## [1] "mcmcSamplerList - you can use the following methods to summarize, plot or reduce this class:"
## [1] getSample plot      print     summary  
## see '?methods' for accessing help and source code

summary(out)

## # # # # # # # # # # # # # # # # # # # # # # # # # 
## ## MCMC chain summary ## 
## # # # # # # # # # # # # # # # # # # # # # # # # # 
##  
## # MCMC sampler:  Metropolis 
## # Nr. Chains:  3 
## # Iterations per chain:  1000 
## # Rejection rate:  0.688 
## # Effective sample size:  308 
## # Runtime:  1.034  sec. 
##  
## # Parameters
##         psf MAP   2.5% median 97.5%
## par 1 1.011   0 -2.172  0.005 1.876
## par 2 1.023   0 -1.853  0.103 1.936
## par 3 1.013   0 -1.795  0.027 2.180
## 
## ## DIC:  11.187 
## ## Convergence 
##  Gelman Rubin multivariate psrf:  1.024 
##  
## ## Correlations 
##       par 1  par 2  par 3
## par 1 1.000  0.006  0.085
## par 2 0.006  1.000 -0.030
## par 3 0.085 -0.030  1.000

For example, in the plot you now see 3 chains.

plot(out)

There are a few additional functions that may only be available for lists, for example convergence checks

#getSample(out, coda = F)
gelmanDiagnostics(out, plot = T)

## Potential scale reduction factors:
## 
##       Point est. Upper C.I.
## par 1       1.01       1.01
## par 2       1.02       1.07
## par 3       1.01       1.04
## 
## Multivariate psrf
## 
## 1.02

Which sampler to choose?

The BT package provides a large class of different MCMC samplers, and it depends on the particular application which is most suitable.

In the absence of further information, we currently recommend the DEzs sampler. This is also the default in the runMCMC function.

BayesianSetup Reference

Reference on creating likelihoods

The likelihood should be provided as a log density function.

ll = logDensity(x)

See options for parallelization below. We will use a simple 3-d multivariate normal density for this demonstration.

ll <- generateTestDensityMultiNormal(sigma = "no correlation")
bayesianSetup = createBayesianSetup(likelihood = ll, lower = rep(-10, 3), upper = rep(10, 3))

Parallelization of the likelihood evaluations

Likelihoods are often costly to compute. If that is the case for you, you should think about parallelization possibilities. The ‘createBayesianSetup’ function has the input variable ‘parallel’, with the following options

F / FALSE means no parallelization should be used
T / TRUE means that automatic parallelization options from R are used (careful: this will not work if your likelihood writes to file, or uses global variables or functions - see general R help on parallelization)
“external”, assumed that the likelihood is already parallelized. In this case, the function needs to accept a matrix with parameters as columns, and rows as the different model runs you want to evaluate. This is the most likely option to use if you have a complicated setup (file I/O, HPC cluster) that cannot be treated with the standard R parallelization.

Algorithms in the BayesianTools package can make use of parallel computing if this option is specified in the BayesianSetup. Note that currently, parallelization is used by the following algorithms: SMC, DEzs and DREAMzs sampler. It can also be used through the BayesianSetup with the functions of the sensitivity package.

Here some more details on the parallelization

1. In-build parallelization:

The in-build parallelization is the easiest way to make use of parallel computing. In the “parallel” argument you can choose the number of cores used for parallelization. Alternatively for TRUE or “auto” all available cores except for one will be used. Now the proposals are evaluated in parallel. Technically, the in-build parallelization uses an R cluster to evaluate the posterior density function. The input for the parallel function is a matrix, where each column represents a parameter and each row a proposal. In this way, the proposals can be evaluated in parallel. For sampler, where only one proposal is evaluated at a time (namely the Metropolis based algorithms as well as DE/DREAM without the zs extension), no parallelization can be used.

2. External parallelization

The second option is to use an external parallelization. Here, a parallelization is attempted in the user defined likelihood function. To make use of external parallelization, the likelihood function needs to take a matrix of proposals and return a vector of likelihood values. In the proposal matrix each row represents one proposal, each column a parameter. Further, you need to specify the “external” parallelization in the “parallel” argument. In simplified terms the use of external parallelization uses the following steps:

## Definition of likelihood function
likelihood <- function(matrix){
    # Calculate likelihood in parallel
    # Return vector of likelihood valus
}

## Create Bayesian Setup
BS <- createBayesianSetup(likelihood, parallel = "external", ...)

## Run MCMC
runMCMC(BS, sampler = "SMC", ...)

3. Multi-core and cluster calculations

If you want to run your calculations on a cluster there are several ways to achieve it.

In the first case you want to parallize n internal (not overall chains) on n cores. The argument “parallel = T” in “createBayesianSetup” allows only at most parallelization on 3 cores for the SMC, DEzs and DreamsSamplers. But by setting “parallel = n” to n cores in the “createBayesianSetup”, the internal chains of DEzs and DREAMzs will be parallelized on n cores. This works only for the DEzs and DREAMzs samplers.

## n = Number of cores
n=2
x <- c(1:10)
likelihood <- function(param) return(sum(dnorm(x, mean = param, log = T)))
bayesianSetup <- createBayesianSetup(likelihood, parallel = n, lower = -5, upper = 5)

## give runMCMC a matrix with n rows of proposals as startValues or sample n times from the previous created sampler
out <- runMCMC(bayesianSetup, settings = list(iterations = 1000))

In the second case you want to parallize n internal chains on n cores with a external parallilzed likelihood function. Unlike the previous case, that way DEzs, DREAMzs, and SMC samplers can be parallelized.

### Create cluster with n cores
cl <- parallel::makeCluster(n)

## Definition of the likelihood
likelihood  <- function(X) sum(dnorm(c(1:10), mean = X, log = T))

## Definition of the likelihood which will be calculated in parallel. Instead of the parApply function, we could also define a costly parallelized likelihood
pLikelihood <- function(param) parallel::parApply(cl = cl, X = param, MARGIN = 1, FUN = likelihood)

## export functions, dlls, libraries
# parallel::clusterEvalQ(cl, library(BayesianTools))
parallel::clusterExport(cl, varlist = list(likelihood))

## create BayesianSetup
bayesianSetup <- createBayesianSetup(pLikelihood, lower = -10, upper = 10, parallel = 'external')

## For this case we want to parallelize the internal chains, therefore we create a n row matrix with startValues, if you parallelize a model in the likelihood, do not set a n*row Matrix for startValue
settings = list(iterations = 100, nrChains = 1, startValue = bayesianSetup$prior$sampler(n))

## runMCMC
out <- runMCMC(bayesianSetup, settings, sampler = "DEzs")

In a another case your likelihood requires a parallized model. Start your cluster and export your model, the required libraries, and dlls. Now you can start your calculations with the argument “parallel = external” in createBayesianSetup.

### Create cluster with n cores
cl <- parallel::makeCluster(n)

## export your model
# parallel::clusterExport(cl, varlist = list(complexModel))

## Definition of the likelihood
likelihood  <- function(param) {
  # ll <- complexModel(param)
  # return(ll)
} 

## create BayesianSetup and settings
bayesianSetup <- createBayesianSetup(likelihood, lower = -10, upper = 10, parallel = 'external')
settings = list(iterations = 100, nrChains = 1)

## runMCMC
out <- runMCMC(bayesianSetup, settings)

In the last case you can parallize over whole chain calculations. However, here the likelihood itself will not be parallelized. Each chain will be run on one core and the likelihood will be calculated on that core.

### Definition of likelihood function
x <- c(1:10)
likelihood <- function(param) return(sum(dnorm(x, mean = param, log = T)))

## Create BayesianSetup and settings
bayesianSetup <- createBayesianSetup(likelihood, lower = -10, upper = 10, parallel = F)
settings = list(iterations = 100000)

## Start cluster with n cores for n chains and export BayesianTools library
cl <- parallel::makeCluster(n)
parallel::clusterEvalQ(cl, library(BayesianTools))

## calculate parallel n chains, for each chain the likelihood will be calculated on one core
out <- parallel::parLapply(cl, 1:n, fun = function(X, bayesianSetup, settings) runMCMC(bayesianSetup, settings, sampler = "DEzs"), bayesianSetup, settings)

## Combine the chains
out <- createMcmcSamplerList(out)

** Remark: even though parallelization can significantly reduce the computation time, it is not always useful because of the so-called communication overhead (computational time for distributing and retrieving infos from the parallel cores). For models with low computational cost, this procedure can take more time than the actual evaluation of the likelihood. If in doubt, make a small comparison of the runtime before starting your large sampling. **

Reference on creating priors

The prior in the BayesianSetup consists of four parts

A log density function
An (optional) sampling function (must be a function without parameters, that returns a draw from the prior)
lower / upper boundaries
Additional info - best values, names of the parameters, …

These information can be passed by first creating an a extra object, via createPrior, or through the the createBayesianSetup function.

Creating priors

You have 5 options to create a prior

Do not set a prior - in this case, an infinite prior will be created
Set min/max values - a bounded flat prior and the corresponding sampling function will be created
Use one of the pre-definded priors, see ?createPrior for a list. One of the options here is to use a previous MCMC output as new prior. Pre-defined priors will usually come with a sampling function
Use a user-define prior, see ?createPrior
Create a prior from a previous MCMC sample

Creating user-defined priors

If creating a user-defined prior, the following information can/should be provided to createPrior:

A log density function, as a function of a parameter vector x, same syntax as the likelihood
Additionally, you should consider providing a function that samples from the prior, because many samplers (SMC, DE, DREAM) can make use of this function for initial conditions. If you use one of the pre-defined priors, the sampling function is already implemented
lower / upper boundaries (can be set on top of any prior, to create truncation)
Additional info - best values, names of the parameters, …

Creating a prior from a previous MCMC sample

The following example from the help shows how this works

# Create a BayesianSetup
ll <- generateTestDensityMultiNormal(sigma = "no correlation")
bayesianSetup = createBayesianSetup(likelihood = ll, lower = rep(-10, 3), upper = rep(10, 3))

settings = list(iterations = 2500,  message = FALSE)
out <- runMCMC(bayesianSetup = bayesianSetup, settings = settings)


newPrior = createPriorDensity(out, method = "multivariate", eps = 1e-10, lower = rep(-10, 3), upper =  rep(10, 3), best = NULL)
bayesianSetup <- createBayesianSetup(likelihood = ll, prior = newPrior)

settings = list(iterations = 1000,  message = FALSE)
out <- runMCMC(bayesianSetup = bayesianSetup, settings = settings)

MCMC sampler reference

The runMCMC() function

The runMCMC function is the central function for starting MCMC algorithms in the BayesianTools package. It requires a bayesianSetup, a choice of sampler (standard is DEzs), and optionally changes to the standard settings of the chosen sampler.

runMCMC(bayesianSetup, sampler = “DEzs”, settings = NULL)

One optional argument that you can always use is nrChains - the default is 1. If you choose more, the runMCMC will perform several runs.

ll <- generateTestDensityMultiNormal(sigma = "no correlation")

bayesianSetup = createBayesianSetup(likelihood = ll, lower = rep(-10, 3), upper = rep(10, 3))

settings = list(iterations = 10000, nrChains= 3, message = FALSE)
out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Metropolis", settings = settings)

plot(out)

marginalPlot(out, prior = T)

correlationPlot(out)

gelmanDiagnostics(out, plot=T)

## Potential scale reduction factors:
## 
##       Point est. Upper C.I.
## par 1          1       1.00
## par 2          1       1.00
## par 3          1       1.01
## 
## Multivariate psrf
## 
## 1

# option to restart the sampler

settings = list(iterations = 1000, nrChains= 1, message = FALSE)
out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Metropolis", settings = settings)

out2 <- runMCMC(bayesianSetup = out)

out3 <- runMCMC(bayesianSetup = out2)

#plot(out)
#plot(out3)

# create new prior from posterior sample 

newPriorFromPosterior <- createPriorDensity(out2)

The different MCMC samplers

For convenience we define a number of iterations

iter = 10000

The Metropolis MCMC class

The BayesianTools package is able to run a large number of Metropolis-Hastings (MH) based algorithms All of these samplers can be accessed by the “Metropolis” sampler in the runMCMC function by specifying the sampler’s settings.

The following code gives an overview about the default settings of the MH sampler.

applySettingsDefault(sampler = "Metropolis")

## $sampler
## [1] "Metropolis"
## 
## $startValue
## NULL
## 
## $iterations
## [1] 10000
## 
## $burnin
## [1] 0
## 
## $thin
## [1] 1
## 
## $consoleUpdates
## [1] 100
## 
## $parallel
## NULL
## 
## $message
## [1] TRUE
## 
## $optimize
## [1] TRUE
## 
## $proposalGenerator
## NULL
## 
## $adapt
## [1] FALSE
## 
## $adaptationInterval
## [1] 500
## 
## $adaptationNotBefore
## [1] 3000
## 
## $DRlevels
## [1] 1
## 
## $proposalScaling
## NULL
## 
## $adaptationDepth
## NULL
## 
## $temperingFunction
## NULL
## 
## $gibbsProbabilities
## NULL
## 
## $nrChains
## [1] 1
## 
## $runtime
## [1] 0
## 
## $sessionInfo
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur ... 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] BayesianTools_0.1.8
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.10          highr_0.9            bslib_0.4.0         
##  [4] compiler_4.2.1       nloptr_2.0.3         jquerylib_0.1.4     
##  [7] tools_4.2.1          boot_1.3-28          digest_0.6.31       
## [10] lme4_1.1-30          jsonlite_1.8.0       evaluate_0.15       
## [13] nlme_3.1-157         lattice_0.20-45      rlang_1.0.6         
## [16] Matrix_1.4-1         cli_3.4.1            rstudioapi_0.13     
## [19] yaml_2.3.5           mvtnorm_1.1-3        xfun_0.31           
## [22] fastmap_1.1.0        coda_0.19-4          DHARMa_0.4.6        
## [25] stringr_1.4.0        knitr_1.39           sass_0.4.2          
## [28] IDPmisc_1.1.20       stats4_4.2.1         grid_4.2.1          
## [31] R6_2.5.1             tmvtnorm_1.5         rmarkdown_2.14      
## [34] minqa_1.2.4          magrittr_2.0.3       htmltools_0.5.2     
## [37] MASS_7.3-57          splines_4.2.1        numDeriv_2016.8-1.1 
## [40] Brobdingnag_1.2-7    bridgesampling_1.1-2 sandwich_3.0-2      
## [43] stringi_1.7.6        gmm_1.7              cachem_1.0.6        
## [46] zoo_1.8-10

The following examples show how the different settings can be used. As you will see different options can be activated singly or in combination.

Standard MH MCMC

The following settings will run the standard Metropolis Hastings MCMC.

Refernences: Hastings, W. K. (1970). Monte carlo sampling methods using markov chains and their applications. Biometrika 57 (1), 97-109.

Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller (1953). Equation of state calculations by fast computing machines. The journal of chemical physics 21 (6), 1087 - 1092.

settings <- list(iterations = iter, adapt = F, DRlevels = 1, gibbsProbabilities = NULL, temperingFunction = NULL, optimize = F, message = FALSE)
out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Metropolis", settings = settings)
plot(out)

Standard MH MCMC, prior optimization

This sampler uses an optimization step prior to the sampling process. The optimization aims at improving the starting values and the covariance of the proposal distribution.

settings <- list(iterations = iter, adapt = F, DRlevels = 1, gibbsProbabilities = NULL, temperingFunction = NULL, optimize = T, message = FALSE)
out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Metropolis", settings = settings)
plot(out)

Adaptive MCMC, prior optimization

In the adaptive Metropolis sampler (AM) the information already acquired in the sampling process is used to improve (or adapt) the proposal function. In the BayesianTools package the history of the chain is used to adapt the covariance of the propoasal distribution.

References: Haario, H., E. Saksman, and J. Tamminen (2001). An adaptive metropolis algorithm. Bernoulli , 223-242.

settings <- list(iterations = iter, adapt = T, DRlevels = 1, gibbsProbabilities = NULL, temperingFunction = NULL, optimize = T,  message = FALSE)
out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Metropolis", settings = settings)
plot(out)

Standard MCMC, prior optimization, delayed rejection

Even though rejection is an essential step of a MCMC algorithm it can also mean that the proposal distribution is (locally) badly tuned to the target distribution. In a delayed rejection (DR) sampler a second (or third, etc.) proposal is made before rejection. This proposal is usually drawn from a different distribution, allowing for a greater flexibility of the sampler. In the BayesianTools package the number of delayed rejection steps as well as the scaling of the proposals can be determined. ** Note that the current version only supports two delayed rejection steps. **

References: Green, Peter J., and Antonietta Mira. “Delayed rejection in reversible jump Metropolis-Hastings.” Biometrika (2001): 1035-1053.

settings <- list(iterations = iter, adapt = F, DRlevels = 2, gibbsProbabilities = NULL, temperingFunction = NULL, optimize = T,  message = FALSE)
out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Metropolis", settings = settings)
plot(out)

Adaptive MCMC, prior optimization, delayed rejection

The delayed rejection adaptive Metropolis (DRAM) sampler is merely a combination of the two previous sampler (DR and AM).

References: Haario, Heikki, et al. “DRAM: efficient adaptive MCMC.” Statistics and Computing 16.4 (2006): 339-354.

settings <- list(iterations = iter, adapt = T, DRlevels = 2, gibbsProbabilities = NULL, temperingFunction = NULL, optimize = T,  message = FALSE)
out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Metropolis", settings = settings)
plot(out)

Standard MCMC, prior optimization, Gibbs updating

To reduce the dimensions of the target function a Metropolis-within-Gibbs sampler can be run with the BayesianTools package. This means in each iteration only a subset of the parameter vector is updated. In the example below at most two (of the three) parameters are updated each step, and it is double as likely to vary one than varying two.

** Note that currently adaptive cannot be mixed with Gibbs updating! **

settings <- list(iterations = iter, adapt = T, DRlevels = 1, gibbsProbabilities = c(1,0.5,0), temperingFunction = NULL, optimize = T,  message = FALSE)
out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Metropolis", settings = settings)
plot(out)

Standard MCMC, prior optimization, gibbs updating, tempering

Simulated tempering is closely related to simulated annealing (e.g. Bélisle, 1992) in optimization algorithms. The idea of tempering is to increase the acceptance rate during burn-in. This should result in a faster initial scanning of the target function. To include this a tempering function needs to be supplied by the user. The function describes how the acceptance rate is influenced during burn-in. In the example below an exponential decline approaching 1 (= no influece on the acceptance rate)is used.

References: Bélisle, C. J. (1992). Convergence theorems for a class of simulated annealing algorithms on rd. Journal of Applied Probability, 885–895.

C. J. Geyer (2011) Importance sampling, simulated tempering, and umbrella sampling, in the Handbook of Markov Chain Monte Carlo, S. P. Brooks, et al (eds), Chapman & Hall/CRC.

temperingFunction <- function(x) 5 * exp(-0.01*x) + 1
settings <- list(iterations = iter, adapt = F, DRlevels = 1, gibbsProbabilities = c(1,1,0), temperingFunction = temperingFunction, optimize = T,  message = FALSE)
out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Metropolis", settings = settings)
plot(out)

Differential Evolution MCMC

The BT package implements two versions of the differential evolution MCMC. In doubt, you should use the DEzs option.

The first is the normal DE MCMC, corresponding to Ter Braak, Cajo JF. “A Markov Chain Monte Carlo version of the genetic algorithm Differential Evolution: easy Bayesian computing for real parameter spaces.” Statistics and Computing 16.3 (2006): 239-249. In this sampler multiple chains are run in parallel (but not in the sense of parallel computing). The main difference to the Metropolis based algorithms is the creation of the proposal. Generally all samplers use the current positin of the chain and add a step in the parameter space to generate a new proposal. Whereas in the Metropolis based sampler this step is usually drawn from a multivariate normal distribution (yet every distribution is possible), the DE sampler uses the current position of two other chains to generate the step for each chain. For sucessful sampling at least 2*d chains, with d being the number of parameters, need to be run in parallel.

settings <- list(iterations = iter,  message = FALSE)
out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "DE", settings = settings)
plot(out)

The second is the Differential Evolution MCMC with snooker update and sampling from past states, corresponding to ter Braak, Cajo JF, and Jasper A. Vrugt. “Differential evolution Markov chain with snooker updater and fewer chains.” Statistics and Computing 18.4 (2008): 435-446. This extension covers two differences to the normal DE MCMC. First a snooker update is used based on a user defined probability. Second also past states of other chains are respected in the creation of the proposal. These extensions allow for fewer chains (i.e. 3 chains are usually enough for up to 200 parameters) and parallel computing as the current position of each chain is only dependent on the past states of the other chains.

settings <- list(iterations = iter,  message = FALSE)
out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "DEzs", settings = settings)
plot(out)

DREAM sampler

Also for the DREAM sampler, there are two versions included. First of all, the standard DREAM sampler, see Vrugt, Jasper A., et al. “Accelerating Markov chain Monte Carlo simulation by differential evolution with self-adaptive randomized subspace sampling.” International Journal of Nonlinear Sciences and Numerical Simulation 10.3 (2009): 273-290.

This sampler is largely build on the DE sampler with some significant differences: 1) More than two chains can be used to generate a proposal. 2) A randomized subspace sampling can be used to enhance the efficiency for high dimensional posteriors. Each dimension is updated with a crossover probalitity CR. To speed up the exploration of the posterior DREAM adapts the distribution of CR values during burn-in to favor large jumps over small ones. 3) Outlier chains can be removed during burn-in.

settings <- list(iterations = iter,  message = FALSE)
out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "DREAM", settings = settings)
plot(out)

The second implementation uses the same extension as the DEzs sampler. Namely sampling from past states and a snooker update. Also here this extension allows for the use of fewer chains and parallel computing.

Again, in doubt you should prefer “DREAMzs”.

settings <- list(iterations = iter,  message = FALSE)
out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "DREAMzs", settings = settings)
plot(out)

T-walk

The T-walk is a MCMC algorithm developed by Christen, J. Andrés, and Colin Fox. “A general purpose sampling algorithm for continuous distributions (the t-walk).” Bayesian Analysis 5.2 (2010): 263-281. In the sampler two independent points are used to explore the posterior space. Based on probabilities four different moves are used to generate proposals for the two points. As for the DE sampler this procedure requires no tuning of the proposal distribution for efficient sampling in complex posterior distributions.

settings = list(iterations = iter,  message = FALSE) 

out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Twalk", settings = settings)

Convergence checks for MCMCs

All MCMCs should be checked for convergence. We recommend the standard procedure of Gelmal-Rubin. This procedure requires running several MCMCs (we recommend 3). This can be achieved either directly in the runMCMC (nrChains = 3), or, for runtime reasons, by combining the results of three independent runMCMC evaluations with nrChains = 1.

settings <- list(iterations = iter, nrChains = 3,  message = FALSE)
out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Metropolis", settings = settings)

## Parameter(s) currentChain   not used in Metropolis
## 
## Parameter(s) currentChain   not used in Metropolis
## 
## Parameter(s) currentChain   not used in Metropolis

plot(out)

#chain = getSample(out, coda = T)
gelmanDiagnostics(out, plot = F)

## Potential scale reduction factors:
## 
##       Point est. Upper C.I.
## par 1          1       1.00
## par 2          1       1.01
## par 3          1       1.00
## 
## Multivariate psrf
## 
## 1

Non-MCMC sampling algorithms

MCMCs sample the posterior space by creating a chain in parameter space. While this allows “learning” from past steps, it does not permit the parallel execution of a large number of posterior values at the same time.

An alternative to MCMCs are particle filters, aka Sequential Monte-Carlo (SMC) algorithms. See Hartig, F.; Calabrese, J. M.; Reineking, B.; Wiegand, T. & Huth, A. Statistical inference for stochastic simulation models - theory and application Ecol. Lett., 2011, 14, 816-827

Rejection samling

The easiest option is to simply sample a large number of parameters and accept them according to their posterior value. This option can be emulated with the implemented SMC, setting iterations to 1.

settings <- list(initialParticles = iter, iterations= 1)
out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "SMC", settings = settings)
plot(out)

Sequential Monte Carlo (SMC)

The more sophisticated option is using the implemented SMC, which is basically a particle filter that applies several filter steps.

settings <- list(initialParticles = iter, iterations= 10)
out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "SMC", settings = settings)
plot(out)

Note that the use of a number for initialParticles requires that the bayesianSetup includes the possibility to sample from the prior.

Bayesian model comparison and averaging

There are a number of Bayesian model selection and model comparison methods. The BT implements three of the most common of them, the DIC, the WAIC, and the Bayes factor.

On the Bayes factor, see Kass, R. E. & Raftery, A. E. Bayes Factors J. Am. Stat. Assoc., Amer Statist Assn, 1995, 90, 773-795
An overview on DIC and WAIC is given in Gelman, A.; Hwang, J. & Vehtari, A. (2014) Understanding predictive information criteria for Bayesian models. Statistics and Computing, 24, 997-1016-. On DIC, see also the original reference by Spiegelhalter, D. J.; Best, N. G.; Carlin, B. P. & van der Linde, A. (2002) Bayesian measures of model complexity and fit. J. Roy. Stat. Soc. B, 64, 583-639.

The Bayes factor relies on the calculation of marginal likelihoods, which is numerically not without problems. The BT package currently implements three methods

The recommended way is the method “Chib” (Chib and Jeliazkov, 2001). which is based on MCMC samples, but performs additional calculations. Despite being the current recommendation, note there are some numeric issues with this algorithm that may limit reliability for larger dimensions.
The harmonic mean approximation, is implemented only for comparison. Note that the method is numerically unrealiable and usually should not be used.
The third method is simply sampling from the prior. While in principle unbiased, it will only converge for a large number of samples, and is therefore numerically inefficient.

Example

Data linear Regression with quadratic and linear effect

sampleSize = 30
x <- (-(sampleSize-1)/2):((sampleSize-1)/2)
y <-  1 * x + 1*x^2 + rnorm(n=sampleSize,mean=0,sd=10)
plot(x,y, main="Test Data")

Likelihoods for both

likelihood1 <- function(param){
    pred = param[1] + param[2]*x + param[3] * x^2
    singlelikelihoods = dnorm(y, mean = pred, sd = 1/(param[4]^2), log = T)
    return(sum(singlelikelihoods))  
}

likelihood2 <- function(param){
    pred = param[1] + param[2]*x 
    singlelikelihoods = dnorm(y, mean = pred, sd = 1/(param[3]^2), log = T)
    return(sum(singlelikelihoods))  
}

Posterior definitions

setUp1 <- createBayesianSetup(likelihood1, lower = c(-5,-5,-5,0.01), upper = c(5,5,5,30))

setUp2 <- createBayesianSetup(likelihood2, lower = c(-5,-5,0.01), upper = c(5,5,30))

MCMC and marginal likelihood calculation

settings = list(iterations = 15000,  message = FALSE)
out1 <- runMCMC(bayesianSetup = setUp1, sampler = "Metropolis", settings = settings)
#tracePlot(out1, start = 5000)
M1 = marginalLikelihood(out1)
M1

settings = list(iterations = 15000,  message = FALSE)
out2 <- runMCMC(bayesianSetup = setUp2, sampler = "Metropolis", settings = settings)
#tracePlot(out2, start = 5000)
M2 = marginalLikelihood(out2)
M2

Model comparison via Bayes factor

Bayes factor (need to reverse the log)

exp(M1$ln.ML - M2$ln.ML)

## [1] 1.726888e+15

BF > 1 means the evidence is in favor of M1. See Kass, R. E. & Raftery, A. E. (1995) Bayes Factors. J. Am. Stat. Assoc., Amer Statist Assn, 90, 773-795.

Assuming equal prior weights on all models, we can calculate the posterior weight of M1 as

exp(M1$ln.ML) / ( exp(M1$ln.ML) + exp(M2$ln.ML))

## [1] 1

If models have different model priors, multiply with the prior probabilities of each model.

Model comparison via DIC

The Deviance information criterion is a commonly applied method to summarize the fit of an MCMC chain. It can be obtained via

DIC(out1)$DIC

## [1] 292.0454

DIC(out2)$DIC

## [1] 365.3305

Model comparison via WAIC

The Watanabe–Akaike information criterion is another criterion for model comparison. To be able to calculate the WAIC, the model must implement a log-likelihood that density that allows to calculate the log-likelihood point-wise (the likelihood functions requires a “sum” argument that determines whether the summed log-likelihood should be returned). It can be obtained via

# This will not work, since likelihood1 has no sum argument
# WAIC(out1)

# likelihood with sum argument
likelihood3 <- function(param, sum = TRUE){
    pred <- param[1] + param[2]*x + param[3] * x^2
    singlelikelihoods <- dnorm(y, mean = pred, sd = 1/(param[4]^2), log = T)
    return(if (sum == TRUE) sum(singlelikelihoods) else singlelikelihoods)  
}
setUp3 <- createBayesianSetup(likelihood3, lower = c(-5,-5,-5,0.01), upper = c(5,5,5,30))
out3 <- runMCMC(bayesianSetup = setUp3, sampler = "Metropolis", settings = settings)

WAIC(out3)

## $WAIC1
## [1] 230.808
## 
## $WAIC2
## [1] 231.8506
## 
## $lppd
## [1] -112.2759
## 
## $pWAIC1
## [1] 3.128133
## 
## $pWAIC2
## [1] 3.649388

Bayesian Tools - General-Purpose MCMC and SMC Samplers and Tools for Bayesian Statistics