How to use the quadratic response model

In this example we demonstrate how to implement the quadratic response model in . Quadratic curves are frequently occurring in community ecology, specifically to describe the response of species to the environment. When one has measured predictor variables, a quadratic function can straightforwardly be included in a regression in R using the \(poly(\cdot,2)\) function. However, in a GLLVM, latent variables are included that can represent unmeasured predictors. As such, we might want to test if species respond to those unknown predictors too. This is similar to the theory behind other ordination methods, such as Correspondence Analysis and its constrained variant (CCA).

We will use the hunting spider dataset as an example, which includes 12 species at 100 sites. It also includes measurements of the environment at 28 sites, though we will not use those here.

library(gllvm)
data("spider", package = "mvabund")
Y <- spider$abund

The unique thing about the quadratic response model, is that specifying a quadratic term for each species separately, coincides with the assumption that species have their own ecological tolerances. A more simple more, would be to assume that species have the same tolerance, in essence that all species are a generalist or specialist to the same degree. This can be done using a linear response model, with random row-effects:

ftEqTol <- gllvm(Y, family = "poisson", row.eff = "random", num.lv = 2)

Next, we can fit a model where we assume species tolerances are the same for all species, but unique per latent variable, which we will refer to as species common tolerances. We do this using the quadratic flag in the \(\text{gllvm}(.)\) function, which has the options FALSE, LV (common tolerances), and TRUE (unique tolerances for all species).

ftComTol <- gllvm(Y, family = "poisson", num.lv = 2, quadratic = "LV")

ftUneqTol <- gllvm(Y, family = "poisson", num.lv = 2, quadratic = TRUE)

GLLVMs are sensitive to the starting values, and with a quadratic response model even more so. As such, the unequal tolerances model by defaults fits a common tolerances model first, to use as starting values. This option is control through the start.struc argument in start.control.

Now, we can use information criteria to determine which of the models fits the hunting spider data best.

AICc(ftEqTol,ftComTol,ftUneqTol)

The unequal tolerances model fits best, as measured by AICc. Species optima and tolerances, and their approximate standard errors, can be extracted:

#Species optima for LVs
optima(ftUneqTol)

#Species tolerances
tolerances(ftUneqTol)

The standard deviation of the latent variables can be printed using the \(\text{summary}(.)\) function. Since latent variable models are scale invariant, this scale parameter is relative to the identifiability constraint (diagonal of the species scores matrix). It can be understood as a measure of gradient length, though for a measure that might be more comparable to DCA (i.e. on average unit variance species curves), see the reference below.

The residual variation explained can be used to calculate residual correlations, or to partition variation, similar as in the vanilla GLLVM:

#Residual variance per latent variable
#for the linear term
getResidualCov(ftUneqTol)$var.q

#for the quadratic term
getResidualCov(ftUneqTol)$var.q2

Finally, we can use the \(\text{ordiplot}(.)\) function to visualize the species optima. However, since species optima can be quite large if they are unobserved, or if too little information is present in the data, creating a nice figure can be challenging. One attempt to improve readability of the species optima in a figure is to point an arrow in their general direction, if species optima are “unobserved”: outside of the range of the predicted site scores.

ordiplot(ftUneqTol, biplot=TRUE, spp.arrows = TRUE)

The standard deviation of the latent variables, presented in the summary information of the model, can serve as a measure of gradient length. This measure is different to that presented in van der Veen et al. (2021), and not directly comparable to e.g. the output of axis length by Detrended Correspondence Analysis (DCA).

Alternatively, we can also calculate it following the definition of van der Veen et al (2021), with a minor correction. This also allows us to calculate the rate of turnover along the gradient. It is most straightforward to do this with the “common tolerances” model, although we can do it with the unequal tolerances model. But, then we need to decide how to standardise the species curves (e.g., so that the mean or median tolerance is one).

# Extract tolerances
tol <- tolerances(ftComTol, sd.errors = FALSE)
gradLength <- 4/tol[1,]
turn <- 2*qnorm(.999, sd = tol[1,])

cat("Gradient length:", gradLength)

This measure of gradient length tells us that the second LVs represents a short gradient. That is also visible from the figure below, as most species responses are estimated to be (near) linear. To visually inspect the rate of turnover we will plot the species curves.

cat("Turnover rate:", turn)

The turnover rate is naturally quicker on the longer gradient, so shorter for the second latent variable than the first. This means that we expect the community to change faster along the first latent variable; about every 3.26 units of the latent variable we encounter a completely different community, while on the second latent variable this takes much more, about 14 units. That corresponds with having a short gradient; if species response are nearly linear they virtually occur everywhere along the gradient, in contrast to when responses are unimodal.

par(mfrow=c(2,1))
LVs = getLV(ftComTol)
newLV = cbind(LV1 = seq(min(LVs[,1]), max(LVs[,1]), length.out=1000), LV2 = 0)
preds <- predict(ftComTol, type = "response", newLV = newLV)
plot(NA, ylim = range(preds), xlim = c(range(getLV(ftComTol))), ylab  = "Predicted response", xlab = "LV1")
segments(x0=optima(ftComTol, sd.errors = FALSE)[,1],x1 = optima(ftComTol, sd.errors = FALSE)[,1], y0 = rep(0, ncol(ftComTol$y)), y1 = apply(preds,2,max), col = "red", lty = "dashed", lwd = 2)
rug(getLV(ftComTol)[,1])
sapply(1:ncol(ftComTol$y), function(j)lines(sort(newLV[,1]), preds[order(newLV[,1]),j], lwd = 2))

LVs = getLV(ftComTol)
newLV = cbind(LV1 = 0, LV2 =  seq(min(LVs[,2]), max(LVs[,2]), length.out=1000))
preds <- predict(ftComTol, type = "response", newLV = newLV)
plot(NA, ylim = range(preds), xlim = c(range(getLV(ftComTol))), ylab  = "Predicted response", xlab = "LV2")
segments(x0=optima(ftComTol, sd.errors = FALSE)[,2],x1 = optima(ftComTol, sd.errors = FALSE)[,2], y0 = rep(0, ncol(ftComTol$y)), y1 = apply(preds,2,max), col = "red", lty = "dashed", lwd = 2)
rug(getLV(ftComTol)[,2])
sapply(1:ncol(ftComTol$y), function(j)lines(sort(newLV[,2]), preds[order(newLV[,2]),j], lwd = 2))

How to use the quadratic response model

Bert van der Veen

2024-11-26

References