---
title: "03 Working with a single population and transition"
output:
  rmarkdown::html_vignette:
    fig_caption: yes
vignette: >
  %\VignetteIndexEntry{03 Working with a single population and transition}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r setup, message = FALSE}
knitr::opts_chunk$set(echo = TRUE, tidy = TRUE)
library(dplyr)
library(purrr)
library(tibble)
library(ggplot2)
library(popbio) # for projection.matrix()
library(raretrans)
# Raymond's theme modifications
rlt_theme <- theme(axis.title.y = element_text(colour="grey20",size=15,face="bold"),
        axis.text.x = element_text(colour="grey20",size=15, face="bold"),
        axis.text.y = element_text(colour="grey20",size=15,face="bold"),  
        axis.title.x = element_text(colour="grey20",size=15,face="bold"))
```

The goal of this vignette is to demonstrate the use of package `raretrans` on a single population and transition period.

# Part I: Obtaining the projection matrix 

`raretrans` assumes the projection matrix is a list of two matrices, a transition matrix and a fertility matrix. This is the output format of `popbio::projection.matrix`. If we have individual transitions in a dataframe
we can use `popbio::projection.matrix` to obtain the required data. We demonstrate with the transition and fertility data for the epiphytic orchid *Lepanthes elto* **POPNUM 250** in **period 5**. 

## Step 1: Load and munge the single population data for *L. elto*

```{r viewData}
data("L_elto") # load the dataset `L_elto` into memory (from package `raretrans`)
head(L_elto) 
```

Each row of this dataframe has columns for current stage, the next stage, and average per individual fertility. Note that “p" stands from "plantula" which means "seedlings" in Spanish. The first set of lines below change the name of the life history stage  from “p” to “s” after selecting the population and time period. 

```{r mungeData}
onepop <- L_elto %>%   
# Filter out population # 250, period (year) 5
  filter(POPNUM == 250, year == 5) %>% 
  # redefine p for el plantón to s for seedling
  mutate(stage = case_when(stage == "p" ~ "s",
                           TRUE ~ stage),
         next_stage = case_when(next_stage == "p"~ "s",
                                TRUE ~ next_stage))


# popbio::projection.matrix doesn't like tibbles
# set TF = TRUE to get the right format.
TF <- popbio::projection.matrix(as.data.frame(onepop), 
                        stage = stage, fate = next_stage, 
                        fertility="fertility", sort=c("s","j","a"), TF = TRUE)
TF # This is the stage-structured population model

```

Our stages are now coded as **s** (seedling), **j** (juvenile), and **a** (adult), and we now have two matrices: **T** (stage transition) and **F** (fecundity).
The observed asymptotic population growth rate is $\lambda =$ `r format(lambda(TF$T + TF$F),digits = 2)`.
The rare transitions missing from our first transition matrix, `TF$T`, are the transition from seedling (*s*) to adult (*a*), and the transition from *j* to *s*. But we know that these happen. 

## Step 2: Obtain starting number of individuals per stage

Since our priors are based on counts (number of individuals, *N*) and the prior equivalent sampling size is expressed as a multiple of the number of individuals observed, we need to obtain the  number of individuals in each stage ($N$) at the first time period.

We use function `raretrans::get_state_vector()` to obtain the starting individual count, *N*.

```{r}
N <- get_state_vector(onepop, stage = stage, sort=c("s","j","a")) 
N # A vector of # of starting individuals for each stage
```

The list of matrices and vector of individual counts do not have to come from a dataframe as we've done here. 
As long as they have the expected format they can be created by hand. We use population 231 in period 2 as an example, splitting the matrix into transition T and fecundity F matrices. Below, "m" stands for "muerte" that is plants that are dead.

```{r, echo=FALSE}
Tmat <- L_elto %>% 
  mutate(stage = factor(stage, levels=c("p","j","a","m")),
         fate = factor(next_stage, levels=c("p","j","a","m"))) %>%
  filter(POPNUM == 231, year == 2) %>%
  as.data.frame() %>% 
  xtabs(~fate + stage, data = .) %>% 
  as.matrix()
Tmat <- Tmat[,-4] #throw away the column for transitions to death
N2 <- colSums(Tmat) #get the total number ... CHECK should be before 86
Tmat <- sweep(Tmat[-4,], 1, N2, "/") #normalize to 1, discarding transitions FROM death
# figure out how much reproduction happened in year 2 by looking at year 3
# L_elto %>% 
#   mutate(stage = factor(stage, levels=c("p","j","a","m")),
#          fate = factor(next_stage, levels=c("p","j","a","m"))) %>%
#   filter(POPNUM == 231, year == 3, stage == "p") %>%
#   count(stage) 
# 2 offspring from N[3] == 16 adults
Fmat <- matrix(0, nrow=3, ncol=3) # create a matrix full of zeros
Fmat[1,3] <- 2/16 # counted 16 adults in this stage, and 2 seedlings in next year
TF2 <- list(T = Tmat, F = Fmat)
```

```{r}
TF2
N2
```

This matrix is missing the transition from seedling to juvenile, and none of the 6 juveniles died leading to an overestimate of survival. The observed asymptotic population growth rate is $\lambda =$ 
`r format(lambda(TF2$T + TF2$F),digits = 2)`. The matrix is not Ergodic (you can't get to every other state from one or more states), and
reducible, meaning one or more columns and rows can be discarded and have the same eigen properties. 

# Part 2: Using priors to incorporate rare transitions 

## Use uninformative priors

Tremblay et al (in prep) show that a dirichlet prior works on the columns of the transition matrix (**T**), and a gamma prior works for entries in the fecundity matrix **F**. 


### Transition matrix

So, let's add a uniform dirichlet prior with weight = $1$ to the transition matrix, $T$. Here, we have 4 fates (3 + death), so each fate adds 0.25 to the matrix of *observed* fates (not the transition matrix!). When we specify a prior matrix for the transitions, there is one more row than columns. This extra row represents death.

```{r posterior1}
Tprior <- matrix(0.25, byrow = TRUE, ncol = 3, nrow=4)
fill_transitions(TF, N, P = Tprior) # returns transition matrix by default
```

We can get the same result 'by hand' - we need the vector of observations because the posterior is calculated
from the observed transitions, not the transition matrix. 

```{r manualTransitionPost}
Tobs <- sweep(TF$T, 2, N, "*") # get the observed transitions
Tobs <- rbind(Tobs, N - colSums(Tobs)) # add the mortality row
Tobs <- Tobs + 0.25 # add the prior
sweep(Tobs, 2, colSums(Tobs), "/")[-4,] # divide by the column sum, and discard mortalityrow
```

The uniform prior fills in the missing transitions, but also creates problems because it provides transition values that are biologically impossible. For example, it provides a transition for adult->seedling, when this transition is only possible in the fecundity matrix $F$. For this reason
we do not recommend using uniform priors.


### Fertility matrix

We must specify the parameters for the fertility prior as a matrix. Entries representing stages that are non-reproducing or outcome stages that don't occur by reproduction should be `NA`. 

```{r}
alpha <- matrix(c(NA_real_, NA_real_, 1e-5,
                  NA_real_, NA_real_, NA_real_,
                  NA_real_, NA_real_, NA_real_), nrow=3, ncol = 3, byrow = TRUE)
beta <- matrix(c(NA_real_, NA_real_, 1e-5,
                  NA_real_, NA_real_, NA_real_,
                  NA_real_, NA_real_, NA_real_), nrow=3, ncol = 3, byrow = TRUE)
fill_fertility(TF, N, alpha = alpha, beta = beta)
```

The change in the fertility is < 0.0001 compared to the observed value. Calculating by hand, prior alpha is 
the number of observed offspring, and prior beta is the number of observed adults. 

```{r}
obs_offspring <- N[3]*TF$F[1,3] 
prior_alpha <- 1e-05
prior_beta <- 1e-05
posterior_alpha <- obs_offspring + prior_alpha
posterior_beta <- N[3] + prior_beta
posterior_alpha / posterior_beta # expected value
```

This demonstrates why the posterior point estimate of fecundity does not change much; the non-informative values for $\alpha$ and $\beta$ barely change the observed values. 

Now we can put them together.

```{r uninformativePrior}
unif <- list(T = fill_transitions(TF, N), 
             F = fill_fertility(TF, N, 
                                alpha = alpha,
                                beta = beta))
unif
```

The asymptotic population growth rate is now $\lambda =$ `r format(lambda(unif$T + unif$F),digits = 2)`. The growth rate shrinks slightly because applying the uniform prior to the transition probabilities causes the observed growth and survival transitions to shrink slightly relative to the unobserved transitions. 

### Other options for argument `returnType`

By default `fill_transitions()` returns the transition matrix $T$, and `fill_fertility()` returns the fertility matrix $F$. There are three other values that the argument `returnType` can take:

1. `fill_transitions(... returnType = "TN")` can return an augmented matrix of fates, which is useful for simulation. The fourth row of this result (see below) is the mortality state.   
```{r}
fill_transitions(TF, N, returnType = "TN")
```
2. `fill_fertility(... returnType = "ab")` returns the alpha and beta vectors of the posterior.  
```{r}
fill_fertility(TF, N, 
               alpha = alpha,
               beta = beta,
               returnType = "ab")
```
3. Both functions can also return the full matrix, the sum of $T$ and $F$.
```{r}
fill_transitions(TF, N, returnType = "A")
```

## Incorporate informative priors

To fix the problem of creating impossible transitions, we specify a more informative prior elicited from an expert on epiphytic orchids (RLT). It still has to have the same shape, one more row than columns.

```{r}
RLT_Tprior <- matrix(c(0.25, 0.025, 0.0,
                       0.05, 0.9,   0.025,
                       0.01, 0.025, 0.95,
                       0.69, 0.05,  0.025), 
                     byrow = TRUE, nrow = 4, ncol = 3)
```

Note that the 1st row, 3rd column is 0.0, because this transition is 
impossible. This prior is constructed so the columns sum to 1, which creates the greatest flexibility 
for weighting the prior.  By default the weight is 1, interpreted as a prior sample size of 1.

```{r}
fill_transitions(TF, N, P = RLT_Tprior)
```

We can specify the weight as a multiple of the sample size for each stage.

```{r}
fill_transitions(TF, N, P = RLT_Tprior, priorweight = 0.5)
```

Here the prior is weighted half as much as the observed number of transitions. In this case, 
with only 2 transitions the prior effective sample size is still 1. If the number of observed
transitions was larger, a prior weight of 0.5N would be larger than 1, but still allow the data to dominate.

# Part 3: Obtain Credible Intervals

## Obtain CI for indvidual matrix entries

The marginal posterior distribution of one element in a multinomial is a beta distribution, and
we use this to get credible intervals on our transition rates. We can use the 
TN return type to get the parameters of the desired multinomial.

```{r}
TN <- fill_transitions(TF, N, P = RLT_Tprior, priorweight = 0.5, returnType = "TN")
a <- TN[,1] # change 1 to 2, 3 etc to get marginal distribution of other columns
b <- sum(TN[,1]) - TN[,1]# change 1 to 2, 3 etc to get marginal distribution of other columns
p <- a / (a + b)
lcl <- qbeta(0.025, a, b)
ucl <- qbeta(0.975, a, b)
knitr::kable(sprintf("%.3f (%.3f, %.3f)", p, lcl, ucl))
```

Those are the point estimates (compare with first column above), lower and upper $95\%$ symmetric credible 
intervals for transitions out of the seedling stage. There is a high degree of uncertainty because of the small sample size ($2$), and low weight on the prior ($1$), leading to an effective sample size of 3. If we increase the effective sample size to $20$ by specifying: `priorweight`$= 9 (9*2 = 18 + 2 = 20)$ the symmetric credible intervals shrink quite a bit:

```{r, echo=FALSE}
TN <- fill_transitions(TF, N, P = RLT_Tprior, priorweight = 9, returnType = "TN")
a <- TN[,1]
b <- sum(TN[,1]) - TN[,1] # beta parameter is sum of all other dirichlet parameters.
p2 <- a / (a + b)
lcl2 <- qbeta(0.025, a, b)
ucl2 <- qbeta(0.975, a, b)
alltogethernow <- tibble(priorweight = factor(rep(c(1,9),each = 4)),
                         fate = factor(rep(c("s","j","a","dead"), times = 2), levels = c("s","j","a","dead")),
                         p = c(p, p2),
                         lcl = c(lcl, lcl2),
                         ucl = c(ucl, ucl2))

ggplot(data = alltogethernow,
       mapping = aes(x = fate, y = p, group= priorweight)) + 
  geom_point(mapping = aes(shape = priorweight), 
             position = position_dodge(width =0.5)) + 
  geom_errorbar(mapping = aes(ymin = lcl, ymax = ucl),
                position = position_dodge(width = 0.5)) + 
  rlt_theme
```

The transition rate from seedling to juvenile shrinks when the prior sample size is too large. 
In general the prior sample size should be less than the observed sample size. 

> **Interactive Exploration:** To visually test the effects of differing prior weights and explore posterior probability densities without writing code, you can launch the interactive Shiny app using `raretrans::run_app()`.

## Credible intervals on $\lambda$

Obtaining credible intervals on the asymptotic growth rate, $\lambda$, requires simulating matrices 
from the posterior distributions. This is somewhat complicated to do correctly, and we have written 
a function `raretrans::sim_transitions()` to generate a list of simulated matrices given the observed matrix and prior specifications.

```{r}
sim_transitions(TF, N, P = RLT_Tprior, alpha = alpha, beta = beta,
                priorweight = 0.5)
```

Now we simulate 1000 times, calculate the leading eigenvalue of each matrix, and draw a histogram.

```{r}
set.seed(8390278) # make this part reproducible
alpha2 <- matrix(c(NA_real_, NA_real_, 0.025,
                  NA_real_, NA_real_, NA_real_,
                  NA_real_, NA_real_, NA_real_), nrow=3, ncol = 3, byrow = TRUE)
beta2 <- matrix(c(NA_real_, NA_real_, 1,
                  NA_real_, NA_real_, NA_real_,
                  NA_real_, NA_real_, NA_real_), nrow=3, ncol = 3, byrow = TRUE)
# generate 1000 matrices based on the prior transitions and fertilities plus the data
RLT_0.5 <- sim_transitions(TF, N, P = RLT_Tprior, alpha = alpha2, beta = beta2,
                priorweight = 0.5, samples = 1000)
# extract the lambdas for each matrix
RLT_0.5 <- tibble(lposterior = map_dbl(RLT_0.5, lambda))

ggplot(data = RLT_0.5,
       mapping = aes(x = lposterior)) + 
  geom_histogram(binwidth = 0.02) + 
  rlt_theme
```

We can calculate some summary statistics too. `pincrease` is the probability that $\lambda > 1$.

```{r}
RLT_0.5_summary <- summarize(RLT_0.5,
                             medianL = median(lposterior),
                             meanL = mean(lposterior),
                             lcl = quantile(lposterior, probs = 0.025),
                             ucl = quantile(lposterior, probs = 0.975),
                             pincrease = sum(lposterior > 1.)/n())
knitr::kable(RLT_0.5_summary, digits = 2)
```


```{r}
cri <- transition_CrI(TF, N, stage_names = c("plantula", "juvenile", "adult"))
plot_transition_CrI(cri)
```