object	`mode`	components
`formulaFec`	`formula`	fecundity model
`formulaRep`	`formula`	maturation model
`specNames`	`character vector`	names in column `species` of `treeData` to include in model
`seedNames`	`character vector`	names of seed types to include in model: match column names in `seedData`
`treeData`	`data.frame`	tree-year by variables, includes columns `plot`, `tree`, `year`, `diam`, `repr`
`seedData`	`data.frame`	trap-year by seed types, includes columns `plot`, `trap`, `year`, one for each `seedNames`
`xytree`	`data.frame`	tree by location, includes columns `plot`, `tree`, `x`, `y`
`xytrap`	`data.frame`	trap by location, includes columns `plot`, `trap`, `x`, `y`

simulated data

Data simulation is recommended as the place to start. A simulator provides insight on how well parameters can be identified from my data. The inputs to the simulator can specify the following:

Table 3. Simulation inputs to mastSim

`mastSim` input	explanation
`nplot`	number of plots
`nyr`	mean no. years per plot
`ntree`	mean no. trees per plot
`ntrap`	mean no. traps per plot
`specNames`	tree species codes used in `treeData` column
`seedNames`	seed type codes: a column name in `seedData`

library( mastif )

Most of these inputs are varied stochastically, so the structure of each simulated data set differs. Here are inputs to the simulator for a species I’ll call acerRubr:

seedNames  <- specNames  <- 'acerRubr'
sim <- list( nplot=5, nyr=10, ntree=30,  ntrap=40, specNames = specNames, 
             seedNames = seedNames )

The list sim holds objects needed for simulation. Here is the simulation:

inputs     <- mastSim( sim )        # simulate dispersal data
seedData   <- inputs$seedData     # year, plot, trap, seed counts
treeData   <- inputs$treeData     # year, plot, tree data
xytree     <- inputs$xytree       # tree locations
xytrap     <- inputs$xytrap       # trap locations
formulaFec <- inputs$formulaFec   # fecundity model
formulaRep <- inputs$formulaRep   # maturation model
trueValues <- inputs$trueValues   # true states and parameter values

I first summarize these model inputs generated by mastSim.

`mastSim`

mastSim generates inputs needed for model fitting with mastif. These objects are simulated data, including the four data.frames ( treeData, seedData, xytree, xytrap ) and formulas ( formulaFec, formulaRep ). Other objects, held in the list trueValues, are the “true” parameter values and latent states used to simulate the data ( fec, repr, betaFec, betaRep, upar, R ). I want to see if mastif can recover these parameter values for the type of data I simulated.

Here is a mapping of objects created by mastSim to the model in Clark et al. ( 2019 ):

Table 4. Some of the objects returned by mastSim.

`mastSim` object	variable	explanation
`trueValues$fec`	$\psi_{ij, t}$	conditional fecundity
`trueValues$repr`	$\rho_{ij, t}$	true maturation status
`treeData$repr`	$z_{ij, t}$	observed maturation status ( with `NA` )
`trueValues$betaFec`	$\beta^x$	coefficients for fecundity
`trueValues$betaRep`	$\beta^v$	coefficients for maturation
`trueValues$R`	$\mathbf{M}$	`specNames`-by-`seedNames` matrix, rows = $\mathbf{m}_h$
`seedData$active`	in $A_{sj, t}$	fraction of time trap is active
`seedData$area`	in $A_{sj, t}$	trap area

mastSim assumes that predictors of maturation and fecundity are limited to an intercept and diameter, so formulaFec and formulaRep are identical. Here is fecundity:

formulaFec

Before fitting the model I say a bit more about input data.

`treeData` and `xytree`

The information on individual trees is held in the tree-years by variables data.frame treeData. Here are a few lines of treeData generated by mastSim,

head( treeData )

treeData has a row for each tree-year. Several of these columns are required:

data.frame treeData columns

`treeData` column	explanation
`plot`	plot name
`tree`	identifier for each tree, unique within a plot
`year`	observation year
`species`	species name
`diam`	diameter is a predictor for fecundity in $\mathbf{X}$ and maturation in $\mathbf{V}$
`repr`	immature ( `0` ), mature ( `1` ), or unknown ( `NA` )

There can be additional columns, but they are not required for model fitting. The variable repr is reproduction status, often NA.

There is a corresponding data.frame xytree that holds tree locations.

Table 5. data.frame xytree columns

`xytree` column	explanation
`plot`	plot name
`tree`	identifier for each tree, unique within a plot
`x, y`	locations in the sample plot

xytree has fewer rows than treeData because it is not repeated each year; tree locations are assumed to be fixed. The two are cross-referenced by plot and tree:

head( xytree, 5 )
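Because treeData and xytree are linked only through plot and tree, it can help to confirm that every tree-year has a mapped location before fitting. Here is a minimal base-R sketch with small hypothetical data frames (treeEx and xyEx are made-up stand-ins; in practice, use the objects from mastSim or your own data):

```r
# hypothetical tree-year and tree-location tables
treeEx <- data.frame( plot = c( 'p1', 'p1', 'p1', 'p2' ),
                      tree = c( 1, 1, 2, 1 ),
                      year = c( 2010, 2011, 2010, 2010 ),
                      diam = c( 20.1, 20.6, 35.2, 12.0 ) )
xyEx   <- data.frame( plot = c( 'p1', 'p1', 'p2' ),
                      tree = c( 1, 2, 1 ),
                      x = c( 3.5, 10.2, 7.7 ), y = c( 8.1, 2.4, 5.0 ) )

# keys combine plot and tree, because tree IDs are unique only within a plot
treeKey   <- paste( treeEx$plot, treeEx$tree, sep = '-' )
xyKey     <- paste( xyEx$plot, xyEx$tree, sep = '-' )
missingXY <- setdiff( unique( treeKey ), xyKey )   # trees without a location

# attach locations to every tree-year row
treeXY <- merge( treeEx, xyEx, by = c( 'plot', 'tree' ), all.x = TRUE )
```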

`seedData` and `xytrap`

Seed counts are held in the data.frame seedData as trap-years by seed types. Here are a few lines of seedData,

head( seedData )

Required columns are in Table 6.

Table 6. data.frame seedData columns

`seedData` column	explanation
`plot`	plot name
`trap`	identifier for each trap, unique within a plot
`year`	seed-crop year
`active`	fraction of collecting period that the trap was active
`area`	collection area of trap ( m$^2$ )
`acerRubr`, …	`seedNames` for columns with seed counts

Seed trap location data are held in the data.frame xytrap.

Table 7. data.frame xytrap columns

`xytrap` column	explanation
`plot`	plot name
`trap`	identifier for each trap, unique within a plot
`x, y`	map location

There is no year in xytrap, because locations are assumed to be fixed. If a seed trap is moved during a study, simply assign it a new trap name. Seed counts for the years when a trap is not in place are then NA ( missing data ) and will be imputed.
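The renaming rule can be sketched in base R. A trap moved after 2011 gets a new name, and a complete trap-by-year table then carries NA for the years each name was not in place. The trap names, years, and counts are hypothetical:

```r
# hypothetical seed counts: trap 't1' was moved after 2011 and renamed 't1b'
seedEx <- data.frame( plot = 'p1',
                      trap = c( 't1', 't1', 't1b', 't1b' ),
                      year = c( 2010, 2011, 2012, 2013 ),
                      acerRubr = c( 3, 0, 5, 1 ),
                      stringsAsFactors = FALSE )

# every trap-year combination on the plot
allTrapYears <- expand.grid( plot = 'p1', trap = c( 't1', 't1b' ),
                             year = 2010:2013, stringsAsFactors = FALSE )

# years when a trap was absent get NA ( missing ), not 0 ( an observed zero )
seedFull <- merge( allTrapYears, seedEx, by = c( 'plot', 'trap', 'year' ),
                   all.x = TRUE )
```

Keeping absent years as NA rather than 0 matters: a zero is an observation that no seed arrived, whereas NA tells the model there was no trap to observe.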

data summary and maps

Because they are stochastic, not all simulations from mastSim generate data with sufficient pattern to allow model fitting. I can look at the relationship between tree diameters and seeds on a map for plot mapPlot and years mapYears:

dataTab <- table( treeData$plot, treeData$year )   # tree-years by plot and year

w <- which( dataTab > 0, arr.ind = TRUE )   # plot-years with observations
w <- w[sample( nrow( w ), 1 ), ]            # pick one at random

mapYears <- as.numeric( colnames( dataTab )[w[2]] )
mapPlot  <- rownames( dataTab )[w[1]]
inputs$mapPlot    <- mapPlot
inputs$mapYears   <- mapYears
inputs$treeSymbol <- treeData$diam   # circle size from tree diameter
inputs$SCALEBAR   <- TRUE

mastMap( inputs )

Depending on the numbers and sizes of maps, I adjust treeScale and trapScale to see how seed accumulation compares with tree size or fecundity. In the above map, trees are shown as green circles and traps as gray squares. The size of each circle comes from the input variable treeSymbol, which is set to tree diameter. I could instead set it to the ‘true’ number of seeds produced by each tree, an output from mastSim.

inputs$treeSymbol <- trueValues$fec
inputs$treeScale  <- 1.5
inputs$trapScale  <- 1
mastMap( inputs )

Which trees are reproductive (mature)? Here are the true values, showing only the trees that are mature:

inputs$treeSymbol <- trueValues$repr
inputs$treeScale  <- .5
mastMap( inputs )

To fit the data, there must be sufficient seed traps, and sources must not be so dense that overlapping seed shadows make individual contributions undetectable.
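To see why dense sources are hard to separate, consider the expected count in one trap as the sum of contributions from nearby trees. This sketch assumes a two-dimensional Student t (2Dt) kernel, $f(r) = 1/(\pi u (1 + r^2/u)^2)$, scaled by trap area and active fraction; the kernel form, the value of u, and all numbers are illustrative assumptions, not the exact mastif likelihood:

```r
# assumed 2Dt kernel: density of seed per m^2, per seed produced, at distance r
kern2Dt <- function( r, u ) 1 / ( pi * u * ( 1 + r^2 / u )^2 )

u      <- 200                     # hypothetical dispersal parameter ( m^2 )
trapXY <- c( 0, 0 )               # one trap at the origin
treeX  <- c( 5, 8 )               # two trees, 5 m and 8 m from the trap
treeY  <- c( 0, 0 )
fec    <- c( 1000, 1500 )         # hypothetical seeds produced per tree-year
area   <- 0.5                     # trap area ( m^2 )
active <- 1                       # fraction of the period the trap was active

dTree   <- sqrt( ( treeX - trapXY[1] )^2 + ( treeY - trapXY[2] )^2 )
contrib <- area * active * fec * kern2Dt( dTree, u )   # expected seeds from each tree
lambda  <- sum( contrib )                              # expected trap count
share   <- contrib / lambda                            # each tree's share of the count
```

Here the two trees contribute roughly equal shares, so a single count cannot say which tree produced the seeds; separating them requires many traps at different distances from each source.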

Here is the frequency distribution of seeds ( observed ) and of fecundities ( unknown ) from the simulated data:

par( mfrow=c( 1, 2 ), bty='n', mar=c( 4, 4, 1, 1 ) )
seedData  <- inputs$seedData
seedNames <- inputs$seedNames

hist( as.matrix( seedData[, seedNames] ) , nclass=100, 
      xlab = 'seed count', ylab='per trap', main='' )
hist( trueValues$fec, nclass=100, xlab = 'seeds produced', ylab = 'per tree', main = '' )

In data of this type, most seed counts and most tree fecundities are zero.

model fitting

Model fitting requires specification of the number of MCMC iterations ng and the burnin. Here is an analysis using the simulated inputs from mastSim with a small number of iterations:

output   <- mastif( inputs = inputs, ng = 4000, burnin = 500 )

The fitted object output contains MCMC chains, estimates, and predictions, summarized in the next section.

`output` summary, lists, and plots

Sample sizes, parameter estimates, and goodness of fit are all provided as tables by summary.

summary( output )

By default, this summary is sent to the console. It can also be saved to a list, e.g., outputSummary <- summary( output ).

estimates and predictions in `output`

The main objects returned by mastif include several lists summarized in Table 9. The list parameters holds quantities estimated as part of model fitting; the list prediction holds predictive distributions generated by the fitted model. Note that the latent states for log fecundity $\psi_{ij, t}$ and maturation state $\rho_{ij, t}$ are both estimated and predicted.

Table 9. The list created by function mastif.

`list` in `output`	summary	contents
`inputs`	from `inputs` with additions	includes `distall` ( trap by tree distance )
`chains`	MCMC chains	`agibbs` ( if random effects, the covariance matrix $\mathbf{A}$ ), `bfec` ( $\boldsymbol{\beta}^x$ ), `brep` ( $\boldsymbol{\beta}^v$ ), `bygibbs` ( $\alpha_l$ or $\gamma_t$ if `yearEffect` included ), `rgibbs` ( $\mathbf{M}$ if multiple seed types ), `sgibbs` ( $\sigma^2$, RMSPE, deviance ), `ugibbs` ( $u$ parameter )
`data`	data attributes	intermediate objects used in fitting, prediction
`fit`	diagnostics	DIC, RMSPE, scoreStates, predictScore
`parameters`	posterior summaries	`alphaMu` and `alphaSe` ( mean and se for $\mathbf{A}$, if included ), `aMu` and `aSe` ( $\mathbf{\beta^w}_{ij}$ ), `betaFec` ( $\boldsymbol{\beta}^x$ ), `betaRep` ( $\boldsymbol{\beta}^v$ ), `betaYrMu` and `betaYrSe` ( mean and se for year or lag coefficients ), `upars` and `dpars` ( dispersal parameters $u$ and $d$ ), `rMu` and `rSe` ( mean and se for $\mathbf{M}$ ), `sigma` ( $\sigma^2$ ), `acfMat` ( group by lag empirical correlation or ACF ), `pacfMat` ( group by lag partial correlation or PACF ), `pacfSe` ( se for `pacfMat` ), `pacsMat` ( group by lag PACF for seed data )
`prediction`	predictive distributions	`fecPred` ( maturation $\rho_{ij, t}$ and fecundity $\psi_{ij, t}$ estimates and predictions ), `seedPred` ( seed counts per trap and predictions per m$^2$ ), `seedPredGrid` predictions for seeds on the space-time prediction grid, `treePredGrid` predictions for maturation and fecundity corresponding to `seedPredGrid`.
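The inputs list in output gains distall, a trap-by-tree distance matrix. Here is a base-R sketch of such a matrix for one plot, using hypothetical coordinates (mastif's internal construction across plots may differ):

```r
# hypothetical trap and tree locations on one plot ( m )
xyTrapEx <- data.frame( trap = c( 't1', 't2' ), x = c( 0, 10 ), y = c( 0, 0 ) )
xyTreeEx <- data.frame( tree = c( 1, 2, 3 ), x = c( 3, 10, 6 ), y = c( 4, 5, 8 ) )

# Euclidean distance from every trap ( rows ) to every tree ( columns )
dx <- outer( xyTrapEx$x, xyTreeEx$x, '-' )
dy <- outer( xyTrapEx$y, xyTreeEx$y, '-' )
distEx <- sqrt( dx^2 + dy^2 )
dimnames( distEx ) <- list( xyTrapEx$trap, xyTreeEx$tree )
```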

Prediction scores for seed-trap observations are provided in output$fit based on the estimated fecundities $[\mathbf{y} | \phi, \rho]$ as scoreStates and on the estimated parameters $[\mathbf{y} | \phi, \rho][\phi, \rho| \boldsymbol{\beta}^x, \boldsymbol{\beta}^w, \dots]$ as predictScore. The former will be substantially higher than the latter when estimates of the states $\phi, \rho$ can be found that predict the seed data, but the variables in $\mathbf{X}, \mathbf{V}$ do not predict those states well. These proper scoring rules are based on the log likelihood for the Poisson distribution.
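A log score of this kind can be sketched with dpois: the mean Poisson log likelihood of observed counts under predicted means, where higher ( less negative ) is better. The counts and predicted means here are hypothetical, and mastif's exact computation of scoreStates and predictScore may differ in detail ( for example, in how trap area and effort enter the mean ):

```r
y        <- c( 0, 0, 2, 5, 1 )              # hypothetical observed seed counts
lamState <- c( 0.2, 0.3, 1.8, 4.5, 1.2 )    # means predicted from latent states
lamParam <- c( 1.5, 1.5, 1.5, 1.5, 1.5 )    # means predicted from parameters only

# mean Poisson log likelihood of the counts given predicted means
logScore <- function( y, lambda ) mean( dpois( y, lambda, log = TRUE ) )
sState <- logScore( y, lamState )
sParam <- logScore( y, lamParam )
```

As in the comparison of scoreStates and predictScore, the state-based means track the counts closely and score higher than the flat parameter-based means.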

Predictions for a location-year prediction grid are invoked when predList is specified in the call to mastif.

plots of `output`

Here are plots of output, with the list plotPars passing the trueValues for these simulated data:

plotPars <- list( trueValues = trueValues )
mastPlot( output, plotPars )


Inspecting the chains, I decide that they have not converged and continue, with output from the previous fit as the new inputs:

output   <- mastif( inputs = output, ng = 5000, burnin = 3000 )

Other arguments passed to mastPlot in the list plotPars are given as examples below and listed at help( mastPlot ).

mastPlot generates the following plots:

maturation: chains for maturation parameters in $\beta^v$.
fecundity: chains for fecundity parameters in $\beta^x$.
dispersal parameter: chains for dispersal parameter $u$ showing prior distribution.
variance sigma: chains for error variance $\sigma^2$, RMSPE, and deviance.
maturation, fecundity: posterior 68% ( boxes ) and 95% ( whiskers ) for $\beta^x$ and $\beta^v$.
maturation, fecundity by diameter: estimates ( dots ) with 95% predictive means for latent states.
seed shadow: with 90% predictive interval
prediction: seed data predicted from estimates of latent maturation and fecundity ( a ) and from parameter estimates ( b ). If trueValues are supplied from mastSim, then panel ( c ) includes true versus predicted values. There is an important distinction between ( a ) and ( b ): good predictions in ( a ) indicate that combinations of seed sources can be found to predict the seed data, without necessarily meaning that the process model can predict maturation and fecundity. Good predictions in ( b ) face the steeper challenge that the process model must predict both maturation and fecundity, which, in turn, predict seed rain.
parameter recovery: if trueValues are supplied from mastSim, then this plot is provided comparing true and estimated values for $\beta^v$ and $\beta^x$.
predicted fecundity, seed data: maps show predicted fecundity of trees ( sizes of green circles ) with seed data ( gray squares ). If predList is supplied to mastif, then the predicted seed surface is shown as shaded contours.
partial ACF: partial autocorrelation function for fecundity ( estimated ) ( a ) and for seed counts ( observed ) ( b ).
tree correlation in time: pairwise correlations between trees on the same plot.

The plots displayed by mastPlot include MCMC chains that are not yet converged. Again, I can restart where I left off by using output as the inputs to mastif. In addition, I predict the seed surface from the fitted model for a plot and year, as specified in predList.

output$predList <- list( mapMeters = 3, plots = mapPlot, years = mapYears )
output   <- mastif( inputs = output, ng = 2000, burnin = 1000 )

To look closer at the predicted plot-year I generate a new map:

output$mapPlot    <- mapPlot
output$mapYears   <- mapYears
output$treeScale  <- 1.2
output$trapScale  <- .7
output$PREDICT    <- T
output$scaleValue <- 10
output$plotScale  <- 1
output$COLORSCALE <- T
output$LEGEND     <- T
  
mastMap( output )

The maps show seed counts ( squares ) and fecundity predictions ( circles ). For predicted plot-years, a seed prediction surface is also shown, in seeds per m$^2$.

Depending on the simulation, convergence may require more iterations. The partial autocorrelation for years should be weak, because mastSim does not simulate autocorrelation. However, actual data will contain autocorrelation. The fecundity-time correlations show modal values near zero, because simulated data do not include individual covariance. Positive spatial covariance is imposed by dispersal.

To send output to a single R Markdown file that can be rendered to html or pdf, I use this:

plotPars$RMD <- 'pdf'
mastPlot( output, plotPars )

This option generates a file mastifOutput.Rmd, which can be opened in RStudio, edited, and knitted to pdf format. It contains data and posterior summaries generated by summary and mastPlot. This pdf will not include stand maps. Maps will be included in the html version, obtained by setting plotPars$RMD <- 'html'.

slow convergence?

Seed shadow models confront convoluted likelihood surfaces, in the sense that we expect local optima. These surfaces are hard to traverse with MCMC ( and impossible with HMC ), because maturation status is a binary state that must be proposed and accepted together with latent fecundity. Posterior simulation can get bogged down when fecundity estimates converge for a combination of mature and immature trees that is locally but not globally optimal. Especially when there are more trees of a species than there are seed traps, we expect many iterations before the algorithm can ‘find’ the specific combination of trees that together best describe the seed counts in many seed traps. Compounding the challenge is the fact that, because both maturation and fecundity are latent variables for each individual, the redistribution kernel must be constructed anew at each MCMC step. Yet analysis of simulated data shows that convergence to the correct posterior distribution is common. It may simply depend on finding reasonable prior parameter values ( see below ) and running long chains.

Examples in this vignette assign enough iterations to show progress toward convergence, but few enough to avoid long waiting times. To evaluate convergence, consider the plots for chains of sigma ( the residual variance $\sigma^2$ on log fecundity ), the rmspe ( the root mean square prediction error for seed counts ), and the deviance. Finally, the plot of observed vs predicted seeds gives a sense of progress.
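Beyond visual inspection, a rough numeric check compares the means of the first and second halves of a chain, in the spirit of Geweke's diagnostic. This naive version, halfZ, is a hypothetical helper and not a mastif function; it treats draws as independent, so it overstates evidence of drift in autocorrelated chains. A dedicated package such as coda offers more careful diagnostics:

```r
# z-score for a difference between the first and second halves of a chain;
# ignores autocorrelation, so it is only a rough screen
halfZ <- function( chain ) {
  n <- length( chain )
  a <- chain[ 1:floor( n / 2 ) ]
  b <- chain[ ( floor( n / 2 ) + 1 ):n ]
  ( mean( a ) - mean( b ) ) / sqrt( var( a ) / length( a ) + var( b ) / length( b ) )
}

set.seed( 1 )
stable   <- rnorm( 2000 )                                    # stationary chain
drifting <- seq( 0, 3, length.out = 2000 ) + rnorm( 2000 )   # still trending
zStable   <- halfZ( stable )
zDrifting <- halfZ( drifting )
```

A large |z| ( say, beyond 2 ) flags a chain that is still drifting and suggests restarting mastif for more iterations.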

As demonstrated above, restart mastif with the fitted object.

multiple seed types per species

Often a seed type could have come from trees of more than one species. A seed type that is identified only to genus level is given a name in seedNames that contains the character string UNKN. In this simulation the three species pinuTaeda, pinuEchi, and pinuVirg contribute most seeds to the type pinuUNKN. Important: there can be only one element in seedNames containing the string UNKN.

Here are some inputs for the simulation:

specNames <- c( 'pinuTaeda', 'pinuEchi', 'pinuVirg' )
seedNames <- c( 'pinuTaeda', 'pinuEchi', 'pinuVirg', 'pinuUNKN' )
sim    <- list( nyr=4, ntree=25, nplot=10, ntrap=50, specNames = specNames, 
                  seedNames = seedNames )

There is a specNames-by-seedNames matrix M that is estimated as part of the posterior distribution. In this example, the element of seedNames containing the string UNKN refers to a seed type that cannot be differentiated beyond the genus level ( pinu ).

Here is the simulation, in which 2/3 of all seeds are identified only to genus level:

inputs <- mastSim( sim )        # simulate dispersal data
R      <- inputs$trueValues$R   # species to seedNames probability matrix
round( R, 2 )

The matrix M stacks the length-$M$ vectors $\mathbf{m}'_s$ discussed in Clark et al. ( 2019 ) as a species-by-seed type matrix. There is a matrix for each plot, here stacked as a single matrix. This is the matrix of values used in simulation. mastif will estimate this matrix as part of the posterior distribution.
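One way to picture M: each row is a species, each column a seed type, and each row sums to 1 because every seed from a species is counted as exactly one of the seed types. Multiplying the seeds arriving from each species by M gives the expected count of each seed type. The probabilities and counts below are hypothetical, not values from mastSim:

```r
specEx <- c( 'pinuTaeda', 'pinuEchi', 'pinuVirg' )
seedEx <- c( 'pinuTaeda', 'pinuEchi', 'pinuVirg', 'pinuUNKN' )

# hypothetical species-by-seed-type matrix: row s gives the fraction of seed
# from species s that is recorded as each seed type
M <- matrix( c( 0.3, 0,   0,   0.7,
                0,   0.4, 0,   0.6,
                0,   0,   0.2, 0.8 ),
             nrow = 3, byrow = TRUE, dimnames = list( specEx, seedEx ) )

seedsBySpecies <- c( pinuTaeda = 100, pinuEchi = 50, pinuVirg = 20 )  # seeds arriving
expectedByType <- drop( seedsBySpecies %*% M )   # expected count of each seed type
```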

Here is a model fit:

output <- mastif( inputs = inputs, ng = 2000, burnin = 1000 )

The model summary now includes estimates for the unknown M as the “species to seed type matrix”:

summary( output )

Output plots will include chains for estimates in M. There will also be estimates for each species included in specNames:

plotPars <- list( trueValues = inputs$trueValues, RMAT = TRUE ) 
mastPlot( output, plotPars )

The inverse of $\mathbf{M}$ is the probability that an unknown seed of type $m$ was produced by a tree of species $s$. These estimates are included in the plot Species to undiff seed.
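One reading of this reverse probability follows from Bayes' rule: weight the UNKN column of M by the seed arriving from each species, then normalize. A sketch with hypothetical values ( not mastif output, and not necessarily how mastif computes the plotted estimates ):

```r
# hypothetical fraction of each species' seed recorded as pinuUNKN ( a column of M )
mUnkn <- c( pinuTaeda = 0.7, pinuEchi = 0.6, pinuVirg = 0.8 )
# hypothetical seeds arriving at a trap from each species
seedsBySpecies <- c( pinuTaeda = 100, pinuEchi = 50, pinuVirg = 20 )

# Bayes' rule: P( species s | seed is pinuUNKN ) is proportional to seeds_s * M[s, UNKN]
joint     <- seedsBySpecies * mUnkn
pSpecUnkn <- joint / sum( joint )
```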

Again, the .Rmd file can be knitted to a pdf file.

Here is a restart, now with plots and years specified for prediction in predList:

tab   <- with( inputs$seedData, table( plot, year ) )
mapPlot <- 'p1'
mapYears <- as.numeric( colnames( tab )[tab[1, ] > 0] )   # years for 1st plot
output$predList <- list( plots = mapPlot, years = mapYears ) 
output <- mastif( inputs = output, ng = 3000, burnin = 1500 )
mastPlot( output, plotPars = list( trueValues = inputs$trueValues, MAPS = TRUE )  )

Chains for the matrix M will converge for plots where there are sufficient trees and seeds to obtain an estimate. They will not be identified on plots where one or the other are rare or absent.

To draw just the seed-rain prediction maps, do this:

output$PREDICT  <- T
output$LEGEND   <- T
output$mapPlot  <- mapPlot
output$mapYears <- mapYears
mastMap( output )

`specNames` and `seedNames`

In the many data sets where seeds of multiple species are not differentiated, the undifferentiated seed type occupies a column in seedData whose name includes the string UNKN. For a concrete example, if trees in treeData$species on a plot include the specNames pinuTaed and pinuEchi, then many seeds might be identified as the seedNames type pinuUNKN. As in the foregoing example, estimates of the matrix M reflect the contributions of each species to the UNKN type. This could be specified in several ways:

specNames <- c( 'pinuTaed', 'pinuEchi' )

#seeds not differentiated:
seedNames <- c( 'pinuUNKN' ) 

#one species sometimes differentiated:
seedNames <- c( 'pinuTaed', 'pinuUNKN' )    

#both species sometimes differentiated:
seedNames <- c( 'pinuTaed', 'pinuEchi', 'pinuUNKN' )
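Because only one element of seedNames may contain the string UNKN, a quick check before fitting can catch naming mistakes. The helper checkUnkn is hypothetical, not part of mastif:

```r
# hypothetical helper: stop if more than one seed type contains 'UNKN'
checkUnkn <- function( seedNames ) {
  nUnkn <- sum( grepl( 'UNKN', seedNames, fixed = TRUE ) )
  if( nUnkn > 1 ) stop( 'only one element of seedNames may contain UNKN' )
  invisible( nUnkn )
}

checkUnkn( c( 'pinuTaed', 'pinuEchi', 'pinuUNKN' ) )   # passes silently
```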

`mastPlot` written to files

The function mastPlot generates summary plots of output. The user has access to all objects used to generate these plots, as discussed in the previous section. By default, plots from mastPlot are rendered to the console. If plotPars$SAVEPLOTS = T is included in the list plotPars passed to mastPlot, then each plot is saved to a .pdf file. Plots can be consolidated into a single .Rmd file, which can be knitted to .html or .pdf with plotPars$RMD = 'html' or plotPars$RMD = 'pdf'.

my data

For illustration I use sample data analyzed by Clark et al. ( 2013 ), with data collection that has continued through 2017. It consists of multiple years and sites.

Here I load data for a species with a single recognized seed type, Liriodendron tulipifera. Here is a map, with seed traps scaled by seed counts and trees scaled by diameter:

library( repmis )
d <- "https://github.com/jimclarkatduke/mast/blob/master/liriodendronExample.rData?raw=True"
repmis::source_data( d )
mapList <- list( treeData = treeData, seedData = seedData, 
                 specNames = specNames, seedNames = seedNames, 
                 xytree = xytree, xytrap = xytrap, mapPlot = 'DUKE_BW', 
                 mapYears = 2011:2014, treeSymbol = treeData$diam, 
                 treeScale = .7, trapScale = 1.5, plotScale = 2, 
                 SCALEBAR=T, scaleValue=50 )
mastMap( mapList )

Here are a few lines of treeData, which has been trimmed down to this single species,

head( treeData, 3 )

Here are a few lines of seedData,

head( seedData, 3 )

diameter effect

Here I fit the model.

formulaFec <- as.formula( ~ diam )    # fecundity model
formulaRep <- as.formula( ~ diam )    # maturation model
inputs   <- list( specNames = specNames, seedNames = seedNames, 
                 treeData = treeData, seedData = seedData, xytree = xytree, 
                 xytrap = xytrap )
output <- mastif( inputs, formulaFec, formulaRep, ng = 3000, burnin = 1000 )

Following this short sequence, I fit a longer one and predict one of the plots:

output$predList <- list( mapMeters = 10, plots = 'DUKE_EW', years = 2010:2015 ) 
output <- mastif( inputs = output, ng = 4000, burnin = 1000 )

Here are some plots followed by comments on display panels. Notice that when it progresses to the maps for plot DUKE_EW, they will include the predicted seed shadows:

mastPlot( output )

From fecundity chains, still more iterations are needed for convergence.

From seed shadow, seed rain beneath a 30-cm diameter tree averages near 0.2 seeds per m$^2$.

Although a combination of maturation/fecundity can be found to predict the seed data in prediction ( part a ), the process model does not well predict the maturation/fecundity combination ( part b ). In other words, the maturation/fecundity/dispersal aspect of the model is effective, whereas the design does not yet include variables that help explain maturation/fecundity.

From the many maps in predicted fecundity, seed data, note that the DUKE_EW site includes prediction surfaces, as specified in predList.

Here is the summary:

summary( output )

At this point in the Gibbs sampler, the DIC and root mean square prediction error are:

output$fit

As mentioned above, prediction scores for seed-trap observations are based on the estimated fecundities $[\mathbf{y} | \phi, \rho]$ as scoreStates and on the estimated parameters $[\mathbf{y} | \phi, \rho][\phi, \rho| \boldsymbol{\beta}^x, \boldsymbol{\beta}^w, \dots]$ as predictScore. The TGc and TGd parameters are the affine-transformation parameters ( intercept, slope ) relating the predictive variance to the error variance ( Thorarinsdottir and Gneiting, 2010 ).

year and random effects

Individual ( tree ) differences can be random, and year-to-year differences can be fixed; the latter are not ‘exchangeable’. In cases where no predictors explain variation among individuals, the formula can be limited to an intercept, using ~ 1. Even then, it can be valuable to estimate fecundity. For this intercept-only model, random effects and/or year effects can allow for variation. Prior parameter values are supplied as discussed below.
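The difference between the two kinds of formula shows up in the design matrix. In base R, model.matrix gives a single intercept column for ~ 1, so every tree-year shares the same fixed-effect mean, and any variation must come from random or year effects. The data frame treeEx is hypothetical:

```r
treeEx <- data.frame( diam = c( 12, 25, 40 ) )   # hypothetical diameters ( cm )

X1    <- model.matrix( ~ 1, data = treeEx )      # intercept only
Xdiam <- model.matrix( ~ diam, data = treeEx )   # intercept and diameter
```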

#group plots in regions for year effects
region <- rep( 'sApps', nrow( treeData ) )
region[ as.character( treeData$plot ) == 'DUKE_EW' ] <- 'piedmont'

treeData$region <- region

formulaFec   <- as.formula( ~ diam )
formulaRep   <- as.formula( ~ diam ) 
yearEffect   <- list( groups = 'region' )
randomEffect <- list( randGroups = 'treeID', formulaRan = as.formula( ~ 1 ) )
inputs <- list( specNames = specNames, seedNames = seedNames, 
               treeData = treeData, seedData = seedData, 
               xytree = xytree, xytrap = xytrap, 
               priorDist = 28, priorVDist = 15, maxDist = 50, minDist = 15, 
               minDiam = 25, maxF = 1e+6, 
               randomEffect = randomEffect, yearEffect = yearEffect )
output <- mastif( inputs, formulaFec, formulaRep, ng = 2000, burnin = 1000 )
mastPlot( output )

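Variation not explained by the predictors in formulaFec and formulaRep is absorbed by the random effects and year effects; in an intercept-only model, all of the fitted variation would come from them.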

To substitute year effects for AR( $p$ ) effects, simply remove the argument specification of p in the yearEffect list:

yearEffect   <- list( groups = c( 'species', 'region' ) )   # year effects

prior parameter values

In previous examples, the prior parameter values were passed in the list inputs. I can also specify prior values through inputs$priorList or inputs$priorTable, holding parameters named in Table 8.

Table 8. Components of inputs with default values.

object	explanation
`priorDist = 25`	prior mean dispersal distance parameter ( $m$ )
`priorVDist = 40`	prior variance dispersal parameter
`minDist = 4`	minimum mean dispersal distance ( $m$ )
`maxDist = 70`	maximum mean dispersal distance ( $m$ )
`maxF = 1e+8`	maximum fecundity ( seeds per tree-year )
`minDiam = 10`	diameter below which a tree of unknown status is assumed immature ( $cm$ )
`maxDiam = 40`	diameter above which a tree of unknown status is assumed mature ( $cm$ )
`sigmaMu = 1`	prior mean residual variance $\sigma^2$
`sigmaWt = sqrt( nrow( treeData ) )`	weight on prior mean ( no. of observations )

The mean and variance of the dispersal kernel, priorDist and priorVDist, refer to the dispersal parameter $u$, transformed to the mean distance for the dispersal kernel. The parameters minDist and maxDist place minimum and maximum bounds on $d$.
Fecundity $\phi$ has a prior maximum value of maxF.
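For intuition about how priorDist relates to u, here is the conversion under an assumed 2Dt kernel, $f(r) = 1/(\pi u (1 + r^2/u)^2)$, whose mean dispersal distance is $d = \pi\sqrt{u}/2$. The kernel form is an assumption of this sketch; check the package documentation for the exact transformation mastif uses:

```r
# assumed 2Dt kernel and its mean distance d = pi * sqrt( u ) / 2
kern2Dt <- function( r, u ) 1 / ( pi * u * ( 1 + r^2 / u )^2 )
u2d <- function( u ) pi * sqrt( u ) / 2
d2u <- function( d ) ( 2 * d / pi )^2

uPrior <- d2u( 25 )   # u corresponding to the default priorDist = 25 m

# numerical checks: the kernel integrates to 1 over the plane, and the mean
# distance is the integral of r * 2 * pi * r * f( r ) over r
total <- integrate( function( r ) 2 * pi * r * kern2Dt( r, uPrior ), 0, Inf )$value
dNum  <- integrate( function( r ) r * 2 * pi * r * kern2Dt( r, uPrior ), 0, Inf )$value
```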

minDiam and maxDiam bound diameter ranges where a tree of unknown maturation status can be mature or not. This prior range is overridden by values in the treeData$repr column that establish an observed maturation state for a tree-year ( 0 - immature, 1 - mature ).

In general, large-seeded species, especially those that can be dispersed by vertebrates, generate noisy seed-trap data, despite the fact that the bulk of the counts still occur close to the parent and the mean dispersal distance is relatively low. Large-seeded species produce few fruits/seeds. Long-distance dispersal cannot be estimated from inventory plots, regardless of plot size, because the fit is dominated by locally derived seed. Consider values for maximum fecundity as low as maxF = 10000, minimum dispersal minDist = 2, and maximum dispersal maxDist = 12. Recall that the latter values refer to the dispersal kernel parameter, not the maximum distance a seed can travel, which is unconstrained.

Prior parameter values can be passed as a list, e.g.,

inputs$priorList <- list( minDiam = 15, maxDiam = 60 )
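One way to picture how a partial priorList relates to the defaults in Table 8 is to merge it into a list of those defaults with modifyList, so that unspecified parameters keep their default values. This illustrates the idea only; it is not known to be how mastif combines them internally. sigmaWt is omitted because its default depends on treeData, and the user values follow the large-seeded suggestions above:

```r
# defaults from Table 8 ( sigmaWt omitted: it depends on nrow( treeData ) )
priorDefaults <- list( priorDist = 25, priorVDist = 40, minDist = 4, maxDist = 70,
                       maxF = 1e+8, minDiam = 10, maxDiam = 40, sigmaMu = 1 )

# user values for a large-seeded species, as suggested above
priorUser <- list( maxF = 10000, minDist = 2, maxDist = 12 )

priorUsed <- modifyList( priorDefaults, priorUser )   # user values replace defaults
```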

Alternatively, prior parameter values can be passed as a table, with parameters as column names and specNames as rownames. Here is a file holding parameter values, and the function mastPriors to obtain a table for several species in the genus $Pinus$:

d <- "https://github.com/jimclarkatduke/mast/blob/master/priorParametersPinus.csv?raw=True"
download.file( d, destfile="priorParametersPinus.csv" )

specNames <- c( "pinuEchi", "pinuRigi", "pinuStro", "pinuTaed", "pinuVirg" )
seedNames <- c( "pinuEchi", "pinuRigi", "pinuStro", "pinuTaed", "pinuVirg", "pinuUNKN" )
priorTable <- mastPriors( "priorParametersPinus.csv", specNames, 
                         code = 'code4', genus = 'pinus' )

The argument code = 'code4' specifies the column in priorParametersPinus.csv holding the codes corresponding to specNames. In mastif, the format code4 means that names consist of the first 4 letters from the genus and species, like this: pinuTaed. The genus is provided to allow for genus-level parameters in cases where priors for individual species have not been specified in priorParametersPinus.csv.
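The genus fallback can be pictured as follows: species rows in a table of prior values inherit genus-level values wherever their own entries are missing. The CSV text, its column names, and the fill-in logic are hypothetical; mastPriors does this from the real priorParametersPinus.csv, whose layout may differ:

```r
# hypothetical prior file: a genus-level row plus two species rows with gaps
csvText <- 'code4,priorDist,maxF
pinus,20,1e+6
pinuTaed,25,
pinuEchi,,5e+5'
pri <- read.csv( text = csvText, stringsAsFactors = FALSE )

specNamesEx <- c( 'pinuTaed', 'pinuEchi' )
genusRow    <- pri[ pri$code4 == 'pinus', -1 ]

# species rows, with genus values substituted where a species value is missing
priorTableEx <- pri[ match( specNamesEx, pri$code4 ), -1 ]
for( j in names( priorTableEx ) ) {
  priorTableEx[[ j ]] <- ifelse( is.na( priorTableEx[[ j ]] ),
                                 genusRow[[ j ]], priorTableEx[[ j ]] )
}
rownames( priorTableEx ) <- specNamesEx
```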

For coefficients in fecundity, betaPrior specifies predictor effects by sign. The motivation for prior distributions that are flat but truncated is two-fold ( Clark et al. 2013 ). First, where prior information exists, it often is limited to a range of values. For regression coefficients ( e.g., $\boldsymbol{\beta}$ ) we typically have a prior belief about the sign of the effect ( positive or negative ), but not its magnitude or appropriate weight. Second, within bounds placed by the prior, the posterior has the shape of the likelihood, unaffected by a prior weight. Prior weight is hard to specify sensibly for this hierarchical, non-linear model.

Here is an example using betaPrior to specify a quadratic diameter effect. First, here is a data set for Pinus:

d <- "https://github.com/jimclarkatduke/mast/blob/master/pinusExample.rdata?raw=True"
repmis::source_data( d )

Here I specify formulas, prior bounds for diam ( quadratic ), and fit the model:

formulaRep <- as.formula( ~ diam )
formulaFec <- as.formula( ~ diam + I( diam^2 ) )   
betaPrior  <- list( pos = 'diam', neg = 'I(diam^2)' )   # names as R labels the terms
                  
inputs <- list( treeData = treeData, seedData = seedData, xytree = xytree, 
                xytrap = xytrap, specNames = specNames, seedNames = seedNames, 
                betaPrior = betaPrior, priorTable = priorTable, 
                seedTraits = seedTraits )
output <- mastif( inputs = inputs, formulaFec, formulaRep, 
                 ng = 500, burnin = 200 )

# restart
output <- mastif( output, formulaFec, formulaRep, ng = 5000, burnin = 1000 )
mastPlot( output )
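The strings in betaPrior need to match the names R gives to the fecundity terms. Assuming mastif matches them against design-matrix column names, model.matrix shows what those names are for the quadratic formula ( the diameters in treeEx are hypothetical ). Note that the quadratic term's name contains no spaces:

```r
treeEx <- data.frame( diam = c( 15, 30, 45 ) )   # hypothetical diameters ( cm )
Xfec   <- model.matrix( ~ diam + I( diam^2 ), data = treeEx )
colnames( Xfec )   # "(Intercept)" "diam" "I(diam^2)"
```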

Here again, many of the seed ID matrix elements are well identified ( trace plots labeled “M matrix plot” ), while others are uncertain due to small numbers of the different species that could contribute to a seed type. These are presented as bar and whisker plots for elements of matrix $\mathbf{M}$ ( fraction of seed from a species that is counted as each seed type ) in the plot labeled species -> seed type. Bar plots show the fraction of unknown seeds that derive from each species, labeled species -> seed type. The distribution of diameters in the data will limit the ability to estimate the diameter effect on maturation and fecundity.

Note that the estimates for fecundity coefficients, $\boldsymbol{\beta}^x$, are positive for diam and negative for diam^2. If the data provide no evidence that fecundity declines at large diameters ( $\partial \psi / \partial\, \mathrm{diam} < 0$ ), then the quadratic term is near zero. In this case, the predicted fecundity in the plot labelled maturation, fecundity by diameter will increase exponentially.
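With log fecundity quadratic in diameter, $\beta_0 + \beta_1\,\mathrm{diam} + \beta_2\,\mathrm{diam}^2$, a negative $\beta_2$ puts the peak at $\mathrm{diam}^* = -\beta_1/(2\beta_2)$. As $\beta_2 \to 0$ the peak moves beyond the range of the data, and fecundity grows exponentially with diameter. The coefficients below are hypothetical, not estimates from the Pinus fit:

```r
b0 <- -2; b1 <- 0.2; b2 <- -0.002   # hypothetical coefficients
logFec <- function( diam ) b0 + b1 * diam + b2 * diam^2

diamPeak  <- -b1 / ( 2 * b2 )         # diameter of maximum fecundity ( cm )
fecAtPeak <- exp( logFec( diamPeak ) )
```

With these values the peak falls at 50 cm, where log fecundity is 3, and fecundity is lower at 40 cm and 60 cm.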

maturation/fecundity data, if available

Most individuals produce no seed, due to resource limitation, especially light, in crowded stands. As trees increase in size and resource access they mature, and become capable of seed production. Maturation is hard to observe, so it must be modelled. In our data sets, maturation status is observed, but most values are NA; it is only assigned if certain. A value of 1 means that reproduction is observed. A value of 0 means that the entire crown is visible during the fruiting season, and it is clear that no reproduction is present. Needless to say, most observations from beneath closed canopies are NA. The observations are entered in the column repr in the data.frame treeData. Above I showed an example where minDiam is set as a prior that trees of smaller diameters are immature and maxDiam is the prior that trees above this diameter are mature.
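The roles of minDiam, maxDiam, and observed repr can be sketched as a rule for the prior maturation status of each tree-year: an observed 0 or 1 is kept; otherwise trees below minDiam are treated as immature, trees above maxDiam as mature, and those in between remain uncertain ( NA ). The function priorMature is a simplified, hypothetical illustration, not mastif's internal code, and the diameters and statuses are made up:

```r
priorMature <- function( diam, repr, minDiam = 10, maxDiam = 40 ) {
  stopifnot( all( repr %in% c( 0, 1, NA ) ) )   # allowed codes
  status <- rep( NA_real_, length( diam ) )     # uncertain by default
  status[ diam < minDiam ] <- 0                 # small trees: immature
  status[ diam > maxDiam ] <- 1                 # large trees: mature
  obs <- !is.na( repr )
  status[ obs ] <- repr[ obs ]                  # observations override the prior
  status
}

diam <- c( 5, 20, 30, 55, 8 )
repr <- c( NA, NA, 1, NA, 1 )
priorMature( diam, repr )   # returns 0 NA 1 1 1
```

The last tree shows the override: it is below minDiam, but it was observed with cones, so it is treated as mature.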

mastif also accepts estimates of cone production. The column treeData$cropCount holds the number of cones that will open and contribute to trap counts in the year treeData$year. Another column, treeData$cropFraction, holds the fraction of the tree canopy represented by the cone count. When crop counts are used, a matrix seedTraits must be supplied with one row per species and a column giving seeds per fruit, cone, or pod. The rownames of seedTraits are the specNames, and the column holding seeds per fruit must be named seedsPerFruit.
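Here is a sketch of the required seedTraits layout, with hypothetical species codes and seed counts. The matrix is named seedTraitsEx so that it does not overwrite a real seedTraits. The sketch also shows the back-of-envelope arithmetic implied by the two crop columns: scale the count up by the observed canopy fraction, then multiply by seeds per cone. This is only a reading of the column definitions, not necessarily how mastif uses crop counts internally.

```r
# Hypothetical species codes and seeds per cone, for illustration only
spNames      <- c( 'pinuTaed', 'pinuEchi' )
seedTraitsEx <- matrix( c( 60, 40 ), ncol = 1,
                        dimnames = list( spNames, 'seedsPerFruit' ) )
seedTraitsEx

# rough whole-crop seed estimate for one tree-year
cropCount    <- 120     # cones counted
cropFraction <- 0.4     # fraction of the canopy represented by the count
seeds <- cropCount/cropFraction*seedTraitsEx[ 'pinuTaed', 'seedsPerFruit' ]
seeds
```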

when trees are sampled less frequently than seeds

Seed studies often include seed collections in years when trees are not censused. For example, tree data might be collected every 2 to 5 years, whereas seed data are available as annual counts. In this case, there are years in seedData$year that are missing from treeData$year. mastif needs a tree year for each trap year, so if tree data are missing from some trap years, I need to construct a complete treeData data.frame that includes these missing years. This amended version of treeData must be accessible to the user, to allow for the addition of covariates such as weather variables, soil type, and so forth.

The function mastFillCensus gives the user access to the filled-in version of treeData that will be fitted by mastif. mastFillCensus accepts the same list of inputs that is passed to the function mastif. The missing years are inserted for each tree with interpolated diameters. The list inputs is returned with objects updated to include the missing census years and modified slightly for analysis by mastif. inputs$treeData can then be annotated with the covariates that will be included in the model. Here is the example from the help file. Because the example data have complete years, I first remove years at random:

# randomly remove years for this example:
years <- sort( unique( treeData$year ) )
sy    <- sample( years, 5 )
treeData <- treeData[treeData$year %in% sy, ]
treeData[1:10, ]

Note the missing years, as is typical of mast data sets. Here is the file after filling missing years:

inputs   <- list( specNames = specNames, seedNames = seedNames, 
                  treeData = treeData, seedData = seedData, 
                  xytree = xytree, xytrap = xytrap, priorTable = priorTable, 
                  seedTraits = seedTraits )
inputs <- mastFillCensus( inputs, beforeFirst=10, afterLast=10 )
inputs$treeData[1:10, ]

The missing plotYr combinations in treeData have been filled to match those in seedData. Now columns can be added to inputs$treeData, as needed in formulaFec or formulaRep.

In cases where tree censuses start after seed trapping begins, or end before seed trapping ends, it may be reasonable to assume that the same trees produced the seed collected in those pre-census or post-census years. mastFillCensus can accommodate these early and/or late seed-trap data by extrapolating treeData up to beforeFirst years before the first census and up to afterLast years after the last census.
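The in-filling step can be pictured with a minimal sketch for a single tree. The helper fillTreeYears is hypothetical and only illustrates linear interpolation of diameter. It is not how mastFillCensus is implemented. Holding diameters constant outside the census range ( rule = 2 ) is also an assumption.

```r
# Hypothetical helper ( not mastif's code ): fill census years for one tree by
# linear interpolation of diameter between observed censuses
fillTreeYears <- function( years, diam, allYears ){
  # rule = 2 holds the nearest observed diameter constant outside the
  # census range; mastif may extrapolate differently
  d <- approx( x = years, y = diam, xout = allYears, rule = 2 )$y
  data.frame( year = allYears, diam = d )
}

obs    <- data.frame( year = c( 2001, 2004, 2006 ), diam = c( 20, 23, 24 ) )
filled <- fillTreeYears( obs$year, obs$diam, allYears = 2000:2007 )
filled
```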

adding covariates

Continuing with the $Pinus$ example, here I want to add covariates for missing years, as needed for an AR( $p$ ) model, in this example with $p$ = 3. First, consider the tree-years included in this sample, before and after in-filling:

# original data
table( treeData$year )

# filled census
table( inputs$treeData$year )
table( seedData$year )

Note that the infilled version of the tree data in the list inputs has the missing years, corresponding to those included in seedData. Here I set up the year-effects model and infill to allow for $p$ = 3:

p <- 3
inputs   <- mastFillCensus( inputs, p = p )
treeData <- inputs$treeData

Here are regions for random effects in the AR( 3 ) model:

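region <- c( 'CWT', 'DUKE', 'HARV' )
treeData$region <- 'SCBI'                 # plots not matched below are SCBI
for( j in 1:length( region ) ){
  # plot names begin with the region code; as.character() guards against factors
  wj <- which( startsWith( as.character( treeData$plot ), region[j] ) )
  treeData$region[wj] <- region[j]
}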
inputs$treeData <- treeData
yearEffect <- list( groups = c( 'species', 'region' ), p = p )

I add the climatic deficit ( monthly precipitation minus PET ) as a covariate. First, here is the data file:

d <- "https://github.com/jimclarkatduke/mast/blob/master/def.csv?raw=True"
download.file( d, destfile="def.csv" )

The file def.csv is organized as plot by year_month; I have saved it in my local directory. The function mastClimate returns a list whose element x holds three columns, each of length nrow( treeData ). Here is an example for the cumulative moisture deficit of the previous summer. I provide the file name, the vector of plot names, the vector of previous years, and the months of the year. To get the cumulative deficit, I use FUN = 'sum':

treeData <- inputs$treeData
deficit  <- mastClimate( file = 'def.csv', plots = treeData$plot, 
                         years = treeData$year - 1, months = 6:8, 
                         FUN = 'sum', vname='def' )
treeData <- cbind( treeData, deficit$x )
summary( deficit )

The first column of deficit$x is the variable itself for each tree-year. The second column holds the site mean value for the variable. The third column is the difference between the first two columns. All three can be useful covariates, each capturing different effects ( Clark et al. 2014 ). Any or all of them can be appended as columns to treeData; above, cbind appends all three.
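To make the three columns concrete, here is a base-R sketch of the same kind of summary. It uses a small hypothetical plot-by-year_month table laid out like def.csv; the year_month column names, plots, and values are invented. It is not mastClimate itself. In particular, the site mean here is taken over the tree-years supplied, which may differ from how mastClimate averages.

```r
# Hypothetical plot-by-year_month table laid out like def.csv ( values invented )
clim <- matrix( c( -10, -20, -30,  -5, -15, -25,
                   -40, -50, -60, -35, -45, -55 ), nrow = 2, byrow = TRUE,
                dimnames = list( c( 'plotA', 'plotB' ),
                                 c( '2010_6', '2010_7', '2010_8',
                                    '2011_6', '2011_7', '2011_8' ) ) )

# cumulative value over the given months of one year, for one plot
seasonSum <- function( plot, year, clim, months = 6:8 ){
  sum( clim[ plot, paste( year, months, sep = '_' ) ] )
}

# hypothetical tree-years; the previous summer is year - 1
plots <- c( 'plotA', 'plotA', 'plotB', 'plotB' )
years <- c( 2011, 2012, 2011, 2012 ) - 1

x        <- mapply( seasonSum, plot = plots, year = years, MoreArgs = list( clim = clim ) )
siteMean <- ave( x, plots )      # mean for each plot over these tree-years
anom     <- x - siteMean         # departure from the site mean
cbind( x, siteMean, anom )
```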

Here is an example using the minimum temperature of the preceding winter. I obtain the minimum with two calls to mastClimate, first for December of the previous year ( years = treeData$year - 1, months = 12 ), then for January through March of the current year ( years = treeData$year, months = 1:3 ). I then take the minimum of the two values:

# include min winter temperature
d <- "https://github.com/jimclarkatduke/mast/blob/master/tmin.csv?raw=True"
download.file( d, destfile="tmin.csv" )

# minimum winter temperature December through March of previous winter

t1 <- mastClimate( file = 'tmin.csv', plots = treeData$plot, 
                   years = treeData$year - 1, months = 12, FUN = 'min', 
                   vname = 'tmin' )
t2 <- mastClimate( file = 'tmin.csv', plots = treeData$plot, 
                   years = treeData$year, months = 1:3, FUN = 'min', 
                   vname = 'tmin' )
tmin <- apply( cbind( t1$x[, 1], t2$x[, 1] ), 1, min )
treeData$tminDecJanFebMar <- tmin
inputs$treeData <- treeData

Here is a model using several of these variables:

formulaRep <- as.formula( ~ diam )
formulaFec <- as.formula( ~ diam + defJunJulAugAnom + tminDecJanFebMar )   
inputs$yearEffect <- yearEffect
output <- mastif( inputs = inputs, formulaFec, formulaRep, ng = 2500, burnin = 1000 )

Covariate values for future years can be supplied as scenarios, based on assumptions such as a status quo in the mean and variance, an assumed rate of climate change, and so on.

flexibility

Seed production is volatile, with order-of-magnitude variation from year to year. There is synchronicity among individuals of the same species, the “masting” phenomenon. There are large differences between individuals that are not explained by environmental variables. In this section I discuss extensions to random effects, year effects, lag effects, and fitting multiple species having seed types that are not always identifiable to species.

random individual effects

Random individual effects currently can include random intercepts for the fecundities of each individual tree that is imputed to be in the mature state for at least 3 years. [There are no random effects on maturation, because they would be hard to identify from seed trap data for this binary response.]

I include in the inputs list the list randomEffect, which includes the column name for the random group. Typically this would be a unique identifier for a tree within a plot, e.g., randomEffect$randGroups = 'tree'. However, randGroups could be the plot name. This column is interpreted as a factor, each level being a group. Random effects will not be fitted on individuals that are in the mature state less than 3 years.

The formulaRan is the random effects model. Because individual time series tend to be short, it is currently implemented only for intercepts. Here is a fit with random effects:

formulaFec <- as.formula( ~ diam )    # fecundity model
formulaRep <- as.formula( ~ diam )    # maturation model 
inputs$randomEffect <- list( randGroups = 'tree', formulaRan = as.formula( ~ 1 ) )
output <- mastif( inputs = inputs, formulaFec, formulaRep, ng = 2000, burnin = 1000 )

Here is a restart:

output <- mastif( inputs = output, ng = 4000, burnin = 1000 )
mastPlot( output )

There is a new panel for fixed plus random effects, showing the individual combinations of intercepts.

The prediction panel ( b ) shows only modest improvement, indicating that, even with random effects, diameter struggles to predict maturation/fecundity.

output$fit

random groups for years and lags

As discussed previously, the model admits year effects and lag effects, the latter as an AR( $p$ ) model. Year effects assign a coefficient to each year $t \in \{1, \dots, T_j\}$. Lag effects assign a coefficient to each lag $l \in \{1, \dots, p\}$, where the maximum lag $p$ should be substantially less than the number of years in the study. Both year effects and lag effects can be organized in random groups, and random groups are specified in the same way for both.

I define random groups for year and lag effects by species, by plots, or both. When multiple species contribute to the modeled seed types, I expect the year effects to depend on which species is actually producing the seed. When plots are sufficiently distant from one another, they need not mast in the same years, so I might allow year effects or lag effects to differ by group through yearEffect$groups. In the example below, I’ll use the term region for plots in the same plotGroup.

Here is a breakdown for this data set by region:

with( treeData, colSums( table( plot, region ) ) )

Here are year effects structured by random groups of plots, given by the column region in treeData:

yearEffect <- list( groups = c('species', 'region' ) )

This option fits a year effect for each species-region group in the years having enough individuals estimated to be in the mature state. Here is the model with random year effects:

\[\log \psi_{ij, t} \sim N \left( \mathbf{x}'_{ij, t} \boldsymbol{\beta}^x + \gamma_t + \gamma_{g[i], t}, \sigma^2 \right)\]

where the year effect $\gamma_{g[i], t}$ is shared by all trees $ij \in g$, the random groups $g$ being defined by yearEffect$groups. Year effects are sampled directly from their conditional posteriors.
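A small simulation from this model may help in reading the year-effect panels. Everything below is hypothetical, including the group names, coefficient, and variances. It draws log fecundity for one tree in each of two groups. The two groups share $\gamma_t$ but have their own $\gamma_{g, t}$, so $\gamma_t$ cancels in differences between them.

```r
set.seed( 3 )
nyr    <- 10
groups <- c( 'spec1-regionA', 'spec1-regionB' )     # hypothetical random groups
gammaT <- rnorm( nyr, 0, 0.5 )                      # year effects gamma_t, shared by all trees
gammaG <- matrix( rnorm( length( groups )*nyr, 0, 0.3 ), length( groups ), nyr,
                  dimnames = list( groups, NULL ) ) # group-year effects gamma_{g,t}

x     <- c( 3.0, 3.5 )       # one covariate value per tree ( e.g., diameter )
betaX <- 1.2                 # fecundity coefficient beta^x
sigma <- 0.2

# mean of log psi for each tree ( row ) and year ( column )
mu     <- outer( x*betaX, gammaT, '+' ) + gammaG
logPsi <- mu + matrix( rnorm( length( mu ), 0, sigma ), nrow( mu ) )
round( logPsi, 2 )
```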

inputs$treeData      <- treeData 
inputs$randomEffect  <- randomEffect
inputs$yearEffect    <- yearEffect
output <- mastif( inputs = inputs, formulaFec, formulaRep, ng = 2500, burnin = 500 )

Here is a restart, with predictions for one of the plots:

predList <- list( mapMeters = 10, plots = 'DUKE_BW', years = 1998:2014 ) 
output$predList <- predList
output <- mastif( inputs = output, ng = 3000, burnin = 1000 )
mastPlot( output )

Note that still more iterations are needed for convergence. Here are some comments on mastPlot:

In the dispersal parameter u panel there are now year effects plotted for the two random groups mtn and piedmont.
The subsequent panel, dispersal mean and variance, shows the mean and variance of random effects.
There is a dispersal by group panel showing posterior estimates for the two random groups, with scales for the parameter $u$ on the left ( m$^2$ ) and the mean parameter $d$ on the right ( m ).
There has been some improvement in the prediction, panel ( b ).
The predicted fecundity, seed data maps for the plot DUKE_BW show seed prediction surfaces.
The year effect groups panel shows year effects for random groups in treeData$region.
The partial ACF panel shows partial autocorrelation by species and plot.

random effects when there are multiple species

Most data sets have multiple seed types that complicate estimation of mast production by each species. This example considers Pinus spp., whose seeds cannot be confidently assigned to species. Here I load the data and generate maps for several years, including all species and seed types.

d <- "https://github.com/jimclarkatduke/mast/blob/master/pinusExample.rdata?raw=True"
repmis::source_data( d )

mapList <- list( treeData = treeData, seedData = seedData, 
                 specNames = specNames, seedNames = seedNames, 
                 xytree = xytree, xytrap = xytrap, mapPlot = 'DUKE_EW', 
                 mapYears = c( 2007:2010 ), treeSymbol = treeData$diam, 
                 treeScale = .6, trapScale=1.4, 
                 plotScale = 1.2, LEGEND=T )
mastMap( mapList )

Note the tendency for high seed accumulation ( large green squares ) near dense, large trees ( large brown circles ).

In this example, I again model seed production as a function of log diameter, diam, now for multiple species and seed types. This is an AR( $p$ ) model, because I include the number of lag terms, yearEffect$p = 4:

\[\log \psi_{ij, t} \sim N \left( \mathbf{x}'_{ij, t} \boldsymbol{\beta}^x + \sum^p_{l=1} ( \alpha_l + \alpha_{g[i], l} ) \psi_{ij, t-l}, \sigma^2 \right)\]

Only years $p < t \le T_i$ are used for fitting. Samples are drawn directly from the conditional posterior distribution.

The table printed at the outset shows trees by group and year, i.e., by the groups assigned in inputs$yearEffect. Zeros indicate either absence of trees or that no plots were sampled in those years. ( These are not the same thing, and mastif knows the difference. )

In the code below I specify formulas, AR( $p$ ), and random effects, and some prior values. Due to the large number of trees, convergence is slow. Because I do not assume that trees of different species necessarily mast in the same years, I allow them to differ through random groups on the AR( $p$ ) terms.

formulaFec <- as.formula( ~ diam )   # fecundity model
formulaRep <- as.formula( ~ diam )   # maturation model

yearEffect   <- list( groups = 'species', p = 4 )   # AR( 4 )
randomEffect <- list( randGroups = 'tree', 
                     formulaRan = as.formula( ~ 1 ) )

inputs   <- list( specNames = specNames, seedNames = seedNames, 
                  treeData = treeData, seedData = seedData, 
                  yearEffect = yearEffect, 
                  randomEffect = randomEffect,
                  xytree = xytree, xytrap = xytrap, priorDist = 20, 
                  priorVDist = 5, minDist = 15, maxDist = 30, 
                  minDiam = 12, maxDiam = 40, 
                  maxF = 1e+6, seedTraits = seedTraits )
output <- mastif( inputs = inputs, formulaFec, formulaRep, ng = 500, burnin = 100 )

Here is a restart:

output <- mastif( inputs = output, ng = 3000, burnin = 1000 )
mastPlot( output, plotPars = plotPars )

Again, convergence will require more iterations. The lag effect group panel shows the AR( $p$ ) coefficients by random group, and they are also shown in a separate panel with each group plotted separately. The ACF eigenvalues panel shows the eigenvalues of the AR lag coefficients on the real ( horizontal ) and imaginary ( vertical ) axes, together with the unit circle. Eigenvalues inside the unit circle imply damped dynamics, and nonzero imaginary parts imply oscillations.
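The eigenvalues in that panel come from the AR( $p$ ) lag coefficients. A standard way to obtain them is sketched here, assuming the panel uses the usual companion-matrix form. The coefficients for a group ( $\alpha_l + \alpha_{g, l}$ ) go in the first row of a $p \times p$ matrix with ones on the subdiagonal. The coefficients below are hypothetical.

```r
# Companion matrix for hypothetical AR( p ) lag coefficients alpha_1, ..., alpha_p
arEigen <- function( alpha ){
  p <- length( alpha )
  A <- matrix( 0, p, p )
  A[ 1, ] <- alpha                                 # first row: lag coefficients
  if( p > 1 ) A[ cbind( 2:p, 1:( p - 1 ) ) ] <- 1  # subdiagonal shifts the lags
  eigen( A, only.values = TRUE )$values
}

alpha <- c( 0.2, -0.5 )          # hypothetical AR( 2 ) coefficients for one group
ev    <- arEigen( alpha )
ev                               # a complex-conjugate pair: oscillation
Mod( ev )                        # modulus < 1: inside the unit circle, so damped
2*pi/abs( Arg( ev ) )            # implied period ( yr ) of the oscillation
```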

Here’s a restart with predictions:

plots <- c( 'DUKE_EW', 'CWT_118' )
years <- 1980:2025
output$predList <- list( mapMeters = 10, plots = plots, years = years ) 
output <- mastif( inputs = output, ng = 3000, burnin = 1000 )

and updated plots:

mastPlot( output )

Note that convergence requires additional iterations ( larger ng ). The predictions of seed production will progressively improve with convergence.

mapList <- output
mapList$mapPlot <- 'DUKE_EW'
mapList$mapYears <- c( 2011:2012 )
mapList$PREDICT <- T
mapList$treeScale <- .5
mapList$trapScale <- .8
mapList$LEGEND <- T
mapList$scaleValue <- 50
mapList$plotScale  <- 2
mapList$COLORSCALE <- T
mapList$mfrow <- c( 2, 1 )

mastMap( mapList )

Or a larger view of a single map:

mapList$mapPlot <- 'CWT_118'
mapList$mapYears <- 2015
mapList$PREDICT <- T
mapList$treeScale <- 1.5
mapList$trapScale <- .8
mapList$LEGEND <- T
mapList$scaleValue <- 50
mapList$plotScale <- 2
mapList$COLORSCALE <- T
mapList$mfrow <- c( 1, 1 )
mastMap( mapList )

Here is a summary of parameter estimates:

summary( output )

convergence

The R code is highly vectorized. Unavoidable loops are written in C++ using the C++ library Armadillo, available through RcppArmadillo. Metropolis steps alternate with Hamiltonian Monte Carlo ( HMC ) steps, which encourage large moves.
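For readers unfamiliar with HMC, here is a minimal, generic sketch of one HMC update for a toy bivariate normal target. Each update runs a leapfrog integration, then a Metropolis accept/reject step. It illustrates why HMC can take large steps. It is not mastif's sampler, and the step size eps and the number of leapfrog steps L are arbitrary choices.

```r
# One generic HMC update ( illustration only, not mastif's sampler )
hmcStep <- function( theta, logPost, gradLogPost, eps = 0.1, L = 20 ){
  mom0 <- rnorm( length( theta ) )                # auxiliary momentum ~ N( 0, I )
  th   <- theta
  mom  <- mom0 + 0.5*eps*gradLogPost( th )        # half step for momentum
  for( l in 1:L ){                                # leapfrog trajectory
    th <- th + eps*mom                            # full step for position
    if( l < L ) mom <- mom + eps*gradLogPost( th )
  }
  mom <- mom + 0.5*eps*gradLogPost( th )          # final half step for momentum
  # Metropolis accept/reject on H = -logPost( theta ) + sum( mom^2 )/2
  logAccept <- ( logPost( th ) - sum( mom^2 )/2 ) - ( logPost( theta ) - sum( mom0^2 )/2 )
  if( log( runif( 1 ) ) < logAccept ) th else theta
}

# toy target: standard bivariate normal
logPost     <- function( theta ) -sum( theta^2 )/2
gradLogPost <- function( theta ) -theta

set.seed( 1 )
theta <- c( 3, -3 )                               # start far from the mode
draws <- matrix( NA, 2000, 2 )
for( g in 1:2000 ){
  theta <- hmcStep( theta, logPost, gradLogPost )
  draws[ g, ] <- theta
}
post <- draws[ -( 1:200 ), ]                      # discard burn-in
colMeans( post )
apply( post, 2, sd )
```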

Despite extensive vectorization, and C++ where loops are unavoidable, convergence can be slow. A collection of plots inventoried over dozens of years can generate in excess of $10^6$ tree-year observations and $10^4$ trap-year observations. There is no escaping the requirement of large numbers of indirectly sampled latent variables: all tree fecundities must be imputed, and if random effects are included, there are as many random groups as there are trees.

time series: volatility and periodicity

Qiu et al. (2023) introduced volatility and period to characterize and compare masting behavior between trees and species. The traditional coefficient of variation (CV) ignores the time dependence in quasi-periodic seed production. To avoid confusion with indices based on variance, they introduced the term volatility for the period-weighted spectral density. This reflects the fact that long intervals are especially important for the causes of masting and for its effects on consumers. In addition to this period-weighting of spectral variance within a series (a tree), fecundity-weighting is important at the population scale, because highly productive individuals dominate masting effects. Within this framework, periodicity extracts the period (in years), likewise weighted to emphasize variance concentrated at long intervals within trees and fecundity differences between trees.
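The weighting idea can be sketched in a few lines of base R. This illustration rests on assumptions and is not mastVolatility. First, it weights the raw periodogram of one series by period to get a period-weighted mean period. Second, it averages tree-level periods using fecundity weights. The exact definitions of volatility and period in that work, and in mastVolatility, may differ.

```r
# Period-weighted mean period for one series ( assumed weighting, for illustration )
weightedPeriod <- function( x ){
  s      <- spec.pgram( x, taper = 0, detrend = FALSE, demean = TRUE,
                        fast = FALSE, plot = FALSE )   # raw periodogram
  period <- 1/s$freq                                  # years per cycle
  w      <- s$spec*period                             # up-weight long periods
  sum( period*w )/sum( w )
}

yr <- 1:20
x  <- 2 + cos( 2*pi*yr/4 )        # hypothetical series with a 4-year cycle
weightedPeriod( x )               # close to 4

# fecundity weighting between trees: hypothetical tree periods and mean fecundity
periodTree <- c( 3, 4, 6 )
meanFec    <- c( 100, 1000, 10 )
weighted.mean( periodTree, meanFec )
```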

The block of code that follows loads output for some trees in the genus Abies from western North America. The estimates of fecundity by tree-year come from the object output$prediction$fecPred. The code uses the function mastVolatility to compile volatility and periodicity for each tree. It displays summaries by plot-species group as density plots and as distributions of mean period estimates.

First, several more functions:

getColor <- function( col, trans ){                  # transparent colors
  tmp <- col2rgb( col )
  rgb( tmp[ 1, ], tmp[ 2, ], tmp[ 3, ], maxColorValue = 255, 
       alpha = 255*trans, names = paste( col, trans, sep = '_' ) )
}

getPlotLayout <- function( np, WIDE = TRUE ){
  
  # np - no. plots; returns par( mfrow ), plus the indices of panels on the
  # left margin ( left ) and in the bottom row ( bottom )
  
  if( np == 1 )return( list( mfrow = c( 1, 1 ), left = 1, bottom = 1 ) )
  if( np == 2 ){
    if( WIDE )return( list( mfrow = c( 1, 2 ), left = 1, bottom = c( 1, 2 ) ) )
    return( list( mfrow = c( 2, 1 ), left = c( 1, 2 ), bottom = 2 ) )
  }
  
  if( np == 3 ){
    if( WIDE )return( list( mfrow = c( 1, 3 ), left = 1, bottom = c( 1:3 ) ) )
    return( list( mfrow = c( 3, 1 ), left = 1:3, bottom = 3 ) )
  }
  if( np <= 4 )return( list( mfrow = c( 2, 2 ), left = c( 1, 3 ), bottom = c( 3:4 ) ) )
  if( np <= 6 ){
    if( WIDE )return( list( mfrow = c( 2, 3 ), left = c( 1, 4 ), bottom = c( 4:6 ) ) )
    return( list( mfrow = c( 3, 2 ), left = c( 1, 3, 5 ), bottom = 5:6 ) )
  }
  if( np <= 9 )return( list( mfrow = c( 3, 3 ), left = c( 1, 4, 7 ), bottom = c( 7:9 ) ) )
  if( np <= 12 ){
    if( WIDE )return( list( mfrow = c( 3, 4 ), left = c( 1, 5, 9 ), bottom = c( 9:12 ) ) )
    return( list( mfrow = c( 4, 3 ), left = c( 1, 4, 7, 10 ), bottom = 10:12 ) )
  }
  if( np <= 16 )return( list( mfrow = c( 4, 4 ), left = c( 1, 5, 9, 13 ), 
                            bottom = c( 13:16 ) ) )
  if( np <= 20 ){
    if( WIDE )return( list( mfrow = c( 4, 5 ), left = c( 1, 6, 11, 16 ), 
                            bottom = c( 16:20 ) ) )
    return( list( mfrow = c( 5, 4 ), left = c( 1, 5, 9, 13, 17 ), bottom = 17:20 ) )
  }
  if( np <= 25 )return( list( mfrow = c( 5, 5 ), left = c( 1, 6, 11, 16, 21 ), 
                            bottom = c( 21:25 ) ) )
  if( np <= 30 ){
    if( WIDE )return( list( mfrow = c( 5, 6 ), left = c( 1, 7, 13, 19, 25 ), 
                            bottom = c( 25:30 ) ) )
    return( list( mfrow = c( 6, 5 ), left = c( 1, 6, 11, 16, 21, 26 ), bottom = 26:30 ) )
  }
  if( np <= 36 ){
    return( list( mfrow = c( 6, 6 ), left = c( 1, 7, 13, 19, 25, 31 ), bottom = c( 31:36 ) ) )
  }
  return( list( mfrow = c( 7, 6 ), left = c( 1, 7, 13, 19, 25, 31, 37 ), bottom = c( 37:42 ) ) )
}

  plotFec <- function( fec, groups = NULL, LOG = F ){
    
    if( is.null(groups) ){
      groupID <- rep(1, nrow(fec) )
      ngroup  <- 1
    }else{
      group   <- fec[, groups]
      groups  <- sort( unique( group ) )
      groupID <- match( group, groups )
      ngroup  <- length( groups )
    }
    
    if( LOG )fec$fecEstMu <- log10(fec$fecEstMu)
    
    xlim  <- range( fec$year )
    ylim  <- range( fec$fecEstMu, finite = TRUE )   # drops NA, and -Inf from log10( 0 )
    mfrow <- getPlotLayout(ngroup)
    
    par( mfrow = mfrow$mfrow, bty = 'n', mar = c(4,4,1,1), omi = c( .5, .5, .2, .2) )
    
    for( k in 1:ngroup ){
      
      fk    <- fec[ groupID == k, ]
      tree  <- fk$treeID
      trees <- sort( unique( tree ) )
      ntree <- length( trees )
      
      plot( NA, xlim = xlim, ylim = ylim, xlab = '', ylab = '' )
      
      for(j in 1:ntree){
        fj <- fk[ tree == trees[j], ]
        lines( fj$year, fj$fecEstMu, lwd = 1 )
      }
      title( groups[k] )
    }
    mtext( 'Year', 1, outer = T, cex = 1.2 )
    ytext <- 'Seeds per tree'
    if( LOG ) ytext <- 'Seeds per tree (log_10)'
    mtext( ytext, 2, outer = T, cex = 1.2 )
  }

Here is the volatility:

d <- "https://github.com/jimclarkatduke/mast/blob/master/outputAbies.rdata?raw=True"
repmis::source_data( d )

specs   <- sort( unique( fecPred$species ) )     # accumulate period estimates
yseq    <- seq( 0, 10, length = 100 )       
intVal  <- matrix( 0, length(specs), 100 )
weight  <- intVal
rownames( intVal ) <- specs
plotSpecs <- sort( unique( fecPred$plotSpec ) )  # label trees in a plot-species group

# time series for one species:
plotFec( fec = fecPred[ fecPred$species == 'abiesGrandis', ], groups = 'species', LOG = T )
  
par( mfrow = c(1, 2), bty = 'n', mar = c(3,4,1,1), omi = c(.5,.1,.1,.1) )

plot( NA, xlim = c(1, 20), ylim = c(.01, 1), xlab = '', ylab = 'Density/nyr', log = 'xy')

for( i in 1:length(plotSpecs) ){
  
  wi  <- which( fecPred$plotSpec == plotSpecs[i] ) # tree-years in group
  ci  <- fecPred$species[wi[1]]
  col <- match( ci, specs )                        # color by species
  tmp <- mastVolatility( treeID = fecPred$treeID[wi], year = fecPred$year[wi], 
                         fec = fecPred$fecEstMu[wi] )
  if( is.null(tmp) )next
  
  # period density for this group, weighted by its number of tree-years
  nw <- length( wi )
  intVal[ col, ] <- intVal[ col, ] + nw*dnorm( yseq, tmp$stats['Period', 1], tmp$stats['Period', 2] )
  weight[ col, ] <- weight[ col, ] + nw
  
  # density +/- 1 SD
  dens <- tmp$statsDensity
  lines( dens[ 'Period', ], dens[ 'Mean', ], lwd = 2, col = getColor( col, .4) )
  lines( dens[ 'Period', ], dens[ 'Mean', ] - dens[ 'SD', ], lty = 2, col = getColor( col, .4) )
  lines( dens[ 'Period', ], dens[ 'Mean', ] + dens[ 'SD', ], lty = 2, col = getColor( col, .4) )
}
title( 'a) Plot-species groups' )
legend( 'topright', specs, text.col = c(1:length(specs)), bty = 'n', cex = .8 )

# tree-year-weighted mean period density by species; species without groups stay 0
wpos <- weight > 0
intVal[ wpos ] <- intVal[ wpos ]/weight[ wpos ]

plot( NA, xlim = c(2, 8), ylim = c(0, .12), xlab = '', ylab = 'Density' )  # density of mean intervals
for( i in 1:length(specs) ){
  si <- sum( intVal[i, ] )
  if( !is.finite( si ) || si == 0 )next            # no period estimates for this species
  polygon( yseq, intVal[i,]/si, border = i, col = getColor(i, .4) )
}
title( 'b) Period estimates' )
mtext( 'Year', 1, outer = T, cex = 1.3 )

The following block of code compiles the same densities and summaries for year effects, which remain after accounting for the other predictors in the model. The model was fitted with year effects specified as yearEffect <- list( groups = c( 'species', 'ecoCode' ) ), where the column treeData$ecoCode holds the ecoregion in which each tree occurs. Each ecoregion_species combination is a random group of year effects shared by all trees of the same species within the same ecoregion. These combinations are the rownames of output$parameters$betaYrRand. Here I use the function mastSpectralDensity to evaluate individual rows of betaYrRand.

spec  <- strsplit( rownames( betaYrRand ), '_' ) # extract species from ecoregion_species rownames
spec  <- sapply( spec, function(x) x[2] )
specs <- sort( unique(spec) )

mastMatrix <- matrix( 0, nrow(betaYrRand), 5 )   # store stats by group
rownames(mastMatrix) <- rownames(betaYrRand)
colnames(mastMatrix) <- c( 'nyr', 'Variance', 'Volatility', 'Period Est', 'Period SD' )

plot( NA, xlim = c(2, 10), ylim = c(.002, .4), xlab = '', ylab = 'Density/yr', log = 'xy')

for(i in 1:nrow(betaYrRand)){
  
  wc <- which( betaYrRand[i,] != 0 )
  if( length(wc) < 6 )next
  
  s <- mastSpectralDensity( betaYrRand[i,wc] )
  if( !is.matrix( s$spect ) )next
  
  mastMatrix[i, ] <- c( length(wc), s$totVar, s$volatility, s$periodMu, s$periodSd )
  
  period <- 1/s$spect[, 'frequency' ]                # same element checked above
  dens   <- s$spect[, 'spectralDensity' ]/length(wc)  # series vary in length
  col    <- match( spec[i], specs )
  
  lines( period, dens, lwd = 2, col = getColor( col, .4) )
}
title( 'c) Year effects' )
mtext( 'Period (yr)', 1, line = 1, outer = T )

keepRows   <- which( is.finite( mastMatrix[, 'Variance'] ) & mastMatrix[, 'Variance'] != 0 )
mastMatrix <- mastMatrix[ keepRows, ]        # keep groups with a spectral estimate

trouble shooting

Because seed-trap studies involve multiple data sets ( seed traps, trees, covariates ) that are collected over a number of years and multiple sites, combining them can expose inconsistencies that are not immediately evident. Of course, a proper analysis depends on alignment of trees, seed traps, and covariates with unique tree names ( treeData$tree ) and trap names ( seedData$trap ) in each plot ( treeData$plot, seedData$plot ).

Notes are displayed by mastif at execution summarizing aspects of the data that might trigger warnings. All of these issues have arisen in data sets I have encountered from colleagues:

Alignment of data frames. The unique trees in each plot supplied in treeData must also appear with x and y in xytree. The unique traps in each plot supplied in seedData must also appear with x and y in xytrap. Problems generate a note:

Note: treeData includes trees not present in xytree
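The same check can be run before calling mastif. Here is a minimal base-R sketch with a hypothetical helper, missingIDs, and small hypothetical data frames ( td, xyt ), named so as not to overwrite treeData or xytree. With id = 'trap', the same helper would compare seedData with xytrap.

```r
# Hypothetical helper: plot-ID combinations in data that have no entry in xy
missingIDs <- function( data, xy, id ){
  key   <- paste( data$plot, data[[ id ]], sep = '-' )
  keyXY <- paste( xy$plot,   xy[[ id ]],   sep = '-' )
  sort( unique( key[ !key %in% keyXY ] ) )
}

td  <- data.frame( plot = c( 'A', 'A', 'B' ), tree = c( 1, 2, 1 ), year = 2010 )
xyt <- data.frame( plot = c( 'A', 'B' ), tree = c( 1, 1 ), x = c( 5, 10 ), y = c( 3, 8 ) )
missingIDs( td, xyt, 'tree' )      # "A-2": tree 2 in plot A has no coordinates
```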

Spatial coordinates. Because tree censuses and seed traps are often done at different times, by different people, the grids often disagree. Spatial range tables are displayed for ( x, y ) coordinates in xytree and xytrap.

Unidentified seeds. Seed types that cannot be identified to species have names containing the character string UNKN. If there are species in specNames that do not appear in seedNames, then an UNKN seed type must be included in seedNames and as a column of seedData. For example, suppose caryGlab and caryTome appear in specNames, and caryGlab and caryUNKN appear in seedNames ( and as columns in seedData ). Then caryUNKN will be the imputed fate for all seeds emanating from caryTome and for some seeds from caryGlab.

This note will be displayed:

Note: unknown seed type is caryUNKN

If there are seedNames that do not appear in specNames, this note is given:

Note: seedNames not in specNames and not "UNKN": caryCord

Moved caryCord to "UNKN" class
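The UNKN rule can also be checked before fitting. This sketch uses a hypothetical helper, missingUnknown, that encodes only the rule stated above: if some species has no seed type of its own, an UNKN seed type must be present.

```r
# Hypothetical check: TRUE when some species in spec lacks its own seed type
# in seed and no UNKN seed type has been supplied
missingUnknown <- function( spec, seed ){
  noOwnType <- setdiff( spec, seed )
  length( noOwnType ) > 0 && !any( grepl( 'UNKN', seed ) )
}

missingUnknown( c( 'caryGlab', 'caryTome' ), 'caryGlab' )                  # TRUE: add caryUNKN
missingUnknown( c( 'caryGlab', 'caryTome' ), c( 'caryGlab', 'caryUNKN' ) ) # FALSE
```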

Design issues. The design can seem confusing, because there are multiple species on multiple plots in multiple years. There is a design matrix that can be found here for fecundity:

output$inputs$setupData$xfec

and here for maturation:

output$inputs$setupData$xrep

( Variables are standardized, because fitting is done that way, but coefficients are reported on their original scales. Unstandardized versions of design matrices are xfecU and xrepU. )

There should not be missing values in the columns of treeData that will be used as predictors ( covariates or factors ). If there are missing values, a note will be generated:

Fix missing values in these variables: [1] "yearlyPETO.tmin1, yearlyPETO.tmin2, flowering.covs.pr.data, flowering.covs.tmin.data, s.PETO"

There can be missing seed counts in seedData; missing values will be imputed.
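Missing predictor values in treeData can be found before fitting. This base-R sketch uses a hypothetical data frame td and formulas fFec and fRep. It lists the variables named in either formula that contain NA.

```r
# Which predictors named in the formulas have missing values? ( hypothetical data )
td   <- data.frame( diam = c( 12, 30, NA ), tmin = c( -5, NA, -7 ), region = 'A' )
fFec <- ~ diam + tmin
fRep <- ~ diam

vars     <- unique( c( all.vars( fFec ), all.vars( fRep ) ) )
nMissing <- colSums( is.na( td[ , vars, drop = FALSE ] ) )
nMissing[ nMissing > 0 ]
```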

A table containing the Variance Inflation Factor ( VIF ), the range of each variable, and the correlation matrix is generated at execution. VIF values > 10 and high correlations between covariates are taken as evidence of redundancy. A table is generated for each species separately, although in xfec and xrep the species are combined in a single matrix.
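The VIF for column $k$ of a design matrix is $\mathrm{VIF}_k = 1/(1 - R^2_k)$, where $R^2_k$ comes from regressing column $k$ on the other columns. Equivalently, it is the $k$-th diagonal element of the inverse correlation matrix. The sketch below uses hypothetical covariates. It shows how a raw diam^2 term can push VIF above 10. Centering diam before squaring would reduce that collinearity, though mastif's own table may be computed differently.

```r
# VIF_k = 1/( 1 - R^2_k ) = k-th diagonal element of the inverse correlation matrix
vif <- function( X ) diag( solve( cor( X ) ) )

set.seed( 2 )
n <- 200
d <- runif( n, 10, 60 )                       # hypothetical diameters ( cm )
X <- cbind( diam = d, diam2 = d^2, tmin = rnorm( n ) )
round( vif( X ), 1 )                          # diam and diam2 well above 10; tmin near 1
```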

Year effects by random group require replication within groups. Here is a note for the AR model showing sizes of groups defined by species and region ( NE, piedmont, sApps ):

no. trees with > plag years, by group:

      caryGlab-NE caryGlab-piedmont    caryGlab-sApps 
                1               497               361 
caryOvat-piedmont caryTome-piedmont    caryTome-sApps 
              122               569                77 

small group: caryGlab-NE

There is only one tree in the caryGlab-NE group, which indicates insufficient replication; a different aggregation scheme is needed.
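Group sizes like these can be tallied before fitting. This sketch uses hypothetical tree-year records td and a lag pLag. It counts the trees observed for more than pLag years in each species-region group. Whether mastif counts years in exactly this way is an assumption.

```r
# Hypothetical tree-year records ( illustration only )
td <- data.frame( tree    = c( 1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3 ),
                  species = c( rep( 'caryGlab', 6 ), rep( 'caryTome', 5 ) ),
                  region  = c( rep( 'NE', 6 ), rep( 'sApps', 5 ) ) )
pLag <- 3

# one row per tree, with its number of observed years
trees     <- unique( td[ , c( 'tree', 'species', 'region' ) ] )
trees$nyr <- as.vector( table( td$tree )[ as.character( trees$tree ) ] )

# trees observed for more than pLag years, by species-region group
keep     <- trees[ trees$nyr > pLag, ]
nByGroup <- table( paste( keep$species, keep$region, sep = '-' ) )
nByGroup
```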

For the AR( $p$ ) model, values are imputed for $p$ years before and after a tree is observed, and only trees observed for more than $p$ years contribute to parameter estimates. If the study lasts 3 years, then the model should not specify yearEffect$p = 5. A note reports the number of observations included in parameter estimates:

Number of full observations with AR model is: [1] 21235

Prediction. If predList is supplied, then fecundity and seed density will be predicted for the specified plot-years. The size of the prediction grid is displayed as a table of prediction nodes by plot and year. Large prediction grids slow execution. To reduce the size of the grid, increase inputs$predList$mapMeters ( the default grid is 5 m by 5 m ).
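As a rough guide to how mapMeters affects grid size, this sketch counts nodes on a regular grid over a hypothetical 200 m by 100 m plot. mastif's actual grid may be constructed differently. Doubling the spacing cuts the number of nodes roughly four-fold, per plot and per year.

```r
# Approximate no. of nodes on a regular grid at spacing mapMeters ( hypothetical plot size )
nNodes <- function( width, height, mapMeters ){
  length( seq( 0, width, by = mapMeters ) )*length( seq( 0, height, by = mapMeters ) )
}
nNodes( 200, 100, 5 )    # 41 * 21 = 861
nNodes( 200, 100, 10 )   # 21 * 11 = 231
```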

When covariates are added as columns to treeData, they must align with treeData$plot, treeData$year, and, if they are tree-level covariates, with treeData$tree. The required column treeData$diam is an example of the latter.
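For plot-year covariates, a base-R merge() keyed on plot and year is one way to do this alignment. The data frames td and climYr and the column tmin are hypothetical. Because merge() can reorder rows, re-sort ( or otherwise check row order ) afterward.

```r
td     <- data.frame( plot = c( 'A', 'A', 'B' ), tree = c( 1, 2, 1 ),
                      year = c( 2010, 2010, 2011 ), diam = c( 20, 35, 18 ) )
climYr <- data.frame( plot = c( 'A', 'B' ), year = c( 2010, 2011 ), tmin = c( -6, -9 ) )

# left join: every tree-year is kept; tmin is NA where no plot-year match exists
td <- merge( td, climYr, by = c( 'plot', 'year' ), all.x = TRUE )
td <- td[ order( td$plot, td$tree, td$year ), ]
td
```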

Mast Inference and Forecasting ( mastif )

James S. Clark

2024-03-28

citation:

summary

`mastif` inputs

simulated data

`mastSim`

`treeData` and `xytree`

data summary and maps

model fitting

`output` summary, lists, and plots

estimates, and predictions in `output`

plots of `output`

slow convergence?

multiple seed types per species

`specNames` and `seedNames`

`mastPlot` written to files

my data

diameter effect

year and random effects

prior parameter values

maturation/fecundity data, if available

when trees are sampled less frequently than seeds

adding covariates

flexibility

random individual effects

random groups for years and lags

random effects when there are multiple species

convergence

time series: volatility and periodicity

trouble shooting

references

`mastSim` object	variable	explanation
`trueValues$fec`	\(\psi_{ij, t}\)	conditional fecundity
`trueValues$repr`	\(\rho_{ij, t}\)	true maturation status
`treeData$repr`	\(z_{ij, t}\)	observed maturation status ( with `NA` )
`trueValues$betaFec`	\(\beta^x\)	coefficients for fecundity
`trueValues$betaRep`	\(\beta^v\)	coefficients for maturation
`trueValues$R`	\(\mathbf{M}\)	`specNames` to `seedNames` matrix, rows = \(\mathbf{m}_h\)
`seedData$active`	in \(A_{sj, t}\)	fraction of time trap is active
`seedData$area`	in \(A_{sj, t}\)	trap area
