gptoolsStan
is a minimal package to publish Stan code
for efficient Gaussian process inference. The package can be used with
the cmdstanr
interface for Stan in R. Unfortunately, Rstan
is not
supported because it does
not provide an option to specify include paths. If you’re already
familiar with cmdstanr
, dive in below. If not, have a look
at the getting
started guide for the cmdstanr
interface.
This vignette demonstrates the package by sampling from a simple Gaussian process using Fourier methods (see the accompanying publication “Scalable Gaussian Process Inference with Stan” for background on the approach). Here is the model definition in Stan.
functions {
// Include utility functions, such as real fast Fourier transforms.
#include gptools/util.stan
// Include functions to evaluate GP likelihoods with Fourier methods.
#include gptools/fft.stan
}
data {
// The number of sample points.
int<lower=1> n;
// Real fast Fourier transform of the covariance kernel.
vector[n %/% 2 + 1] cov_rfft;
}
parameters {
// GP value at the `n` sampling points.
vector[n] f;
}
model {
// Sampling statement to indicate that `f` is a GP.
f ~ gp_rfft(zeros_vector(n), cov_rfft);
}
Here, we assume that cmdstanr
is installed and that the
cmdstan
compiler is available. See here if you need
help getting set up with cmdstanr
. Let’s compile the
model.
library(cmdstanr)
#> Warning: package 'cmdstanr' was built under R version 4.3.3
#> This is cmdstanr version 0.8.1
#> - CmdStanR documentation and vignettes: mc-stan.org/cmdstanr
#> - CmdStan path: /Users/till/.cmdstan/cmdstan-2.36.0
#> - CmdStan version: 2.36.0
library(gptoolsStan)
model <- cmdstan_model(
stan_file="getting_started.stan",
include_paths=gptools_include_path(),
)
As indicated in the data
section of the Stan program, we
need to define the number of elements n
of the Gaussian
process and the real
fast Fourier transform (RFFT) of the covariance kernel
cov_rfft
. We’ll use 100 elements and a squared
exponential covariance kernel which allows us to evaluate the RFFT
directly.
n <- 100
length_scale <- n / 10
freq <- 1:(n %/% 2 + 1) - 1
# See appendix B of https://arxiv.org/abs/2301.08836 for details on the expression.
cov_rfft <- exp(- 2 * (pi * freq * length_scale / n) ^ 2) + 1e-9
Let’s obtain prior realization by sampling from the model.
fit <- model$sample(
data=list(n=n, cov_rfft=cov_rfft),
seed=123,
chains=1,
iter_warmup=1000,
iter_sampling=5,
)
#> Running MCMC with 1 chain...
#>
#> Chain 1 Iteration: 1 / 1005 [ 0%] (Warmup)
#> Chain 1 Iteration: 100 / 1005 [ 9%] (Warmup)
#> Chain 1 Iteration: 200 / 1005 [ 19%] (Warmup)
#> Chain 1 Iteration: 300 / 1005 [ 29%] (Warmup)
#> Chain 1 Iteration: 400 / 1005 [ 39%] (Warmup)
#> Chain 1 Iteration: 500 / 1005 [ 49%] (Warmup)
#> Chain 1 Iteration: 600 / 1005 [ 59%] (Warmup)
#> Chain 1 Iteration: 700 / 1005 [ 69%] (Warmup)
#> Chain 1 Iteration: 800 / 1005 [ 79%] (Warmup)
#> Chain 1 Iteration: 900 / 1005 [ 89%] (Warmup)
#> Chain 1 Iteration: 1000 / 1005 [ 99%] (Warmup)
#> Chain 1 Iteration: 1001 / 1005 [ 99%] (Sampling)
#> Chain 1 Iteration: 1005 / 1005 [100%] (Sampling)
#> Chain 1 finished in 11.1 seconds.
#> Warning: 5 of 5 (100.0%) transitions hit the maximum treedepth limit of 10.
#> See https://mc-stan.org/misc/warnings for details.
We extract the draws from the fit
object and plot a
realization of the process.
f <- fit$draws("f", format="draws_matrix")
plot(1:n, f[1,], type="l", xlab="covariate x", ylab="Gaussian process f(x)")
This vignette illustrates the use of gptools with an elementary
example. Further details can be found in the more comprehensive
documentation although using the cmdstanpy
interface.