---
title: "Statistical Applications of taxodist"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Statistical Applications of taxodist}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
library(taxodist)
```

The `taxodist` distance matrix is a standard `dist` object and integrates
directly with the R ecosystem for multivariate analysis. This vignette shows
how to use it alongside `vegan` and `ape` for ecological and phylogenetic
analyses. All examples use a set of threatened mammals from Brazil spanning
five broad taxonomic groups.

---

## Example dataset: threatened mammals of Brazil

```{r taxa}
taxa_brazil <- c(
  "Priodontes", "Myrmecophaga", "Chrysocyon", "Tapirus", "Didelphis",
  "Leontopithecus", "Brachyteles",
  "Panthera", "Pteronura", "Puma",
  "Sotalia", "Pontoporia",
  "Trichechus", "Mazama", "Blastocerus"
)
```

The dataset covers xenarthrans (`Priodontes`, `Myrmecophaga`), a marsupial
(`Didelphis`), ungulates (`Tapirus`, `Mazama`, `Blastocerus`), primates
(`Leontopithecus`, `Brachyteles`), carnivores (`Chrysocyon`, `Panthera`,
`Pteronura`, `Puma`), cetaceans (`Sotalia`, `Pontoporia`), and a sirenian
(`Trichechus`).

The pairwise distance matrix is computed with a single call. Lineages are
cached after the first retrieval, so subsequent analyses on the same taxa
are instantaneous.

```{r matrix}
library(taxodist)
mat <- distance_matrix(taxa_brazil)
print(mat)
#>                Priodontes Myrmecophaga Chrysocyon    Tapirus  Didelphis
#> Myrmecophaga   0.01639344
#> Chrysocyon     0.01694915   0.01694915
#> Tapirus        0.01694915   0.01694915 0.01587302
#> Didelphis      0.01754386   0.01754386 0.01754386 0.01754386
#> Leontopithecus 0.01694915   0.01694915 0.01666667 0.01666667 0.01754386
#> Brachyteles    0.01694915   0.01694915 0.01666667 0.01666667 0.01754386
#> Panthera       0.01694915   0.01694915 0.01492537 0.01587302 0.01754386
#> Pteronura      0.01694915   0.01694915 0.01470588 0.01587302 0.01754386
#> Puma           0.01694915   0.01694915 0.01492537 0.01587302 0.01754386
#> Sotalia        0.01694915   0.01694915 0.01587302 0.01562500 0.01754386
#> Pontoporia     0.01694915   0.01694915 0.01587302 0.01562500 0.01754386
#> Trichechus     0.01666667   0.01666667 0.01694915 0.01694915 0.01754386
#> Mazama         0.01694915   0.01694915 0.01587302 0.01562500 0.01754386
#> Blastocerus    0.01694915   0.01694915 0.01587302 0.01562500 0.01754386
```

`Didelphis` shows the largest distances to all other taxa (0.01754),
consistent with the early divergence of marsupials from placental mammals.
Carnivores (`Panthera`, `Pteronura`, `Puma`) show smaller distances among
themselves, reflecting a more recent common ancestor within Carnivora.

---

## Phylogenetic tree with `ape`

`taxo_cluster()` performs hierarchical clustering on the distance matrix and
returns an object that `ape` can convert directly to a `phylo` tree for
publication-quality visualisation.

```{r ape, fig.width = 7, fig.height = 5}
library(ape)

cl   <- taxo_cluster(taxa_brazil)
tree <- ape::as.phylo(cl$hclust)

plot(tree,
     main      = "Threatened Mammals of Brazil",
     cex       = 0.85,
     tip.color = "gray20")
```

The `phylo` object is fully compatible with all `ape` tree manipulation and
plotting functions, including `nodelabels()`, `edgelabels()`, and export to
Newick format via `write.tree()`.

```{r ape-newick}
ape::write.tree(tree, file = "taxa_brazil.nwk")
```

---

## Taxonomic diversity with `vegan`

### Clarke and Warwick indices (`taxondive`)

`vegan::taxondive()` computes a family of taxonomic diversity indices
(Clarke and Warwick 1998, 2001) from a community matrix and a taxonomic
distance matrix. These indices capture not just species richness but how
distantly related the species in a community are: a community of closely
related species scores lower than one with the same richness but spanning
multiple orders.

The function expects a community matrix (sites x species) and the `dist`
object from `distance_matrix()`. Here we define three hypothetical
communities, each dominated by one broad taxonomic group.

```{r taxondive}
set.seed(123)
library(vegan)

comm <- matrix(c(
  1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,   # xenarthrans + marsupial
  0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0,   # primates + carnivores
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1    # cetaceans + sirenian + ungulates
), nrow = 3, byrow = TRUE)

colnames(comm) <- taxa_brazil
rownames(comm) <- c("community_A", "community_B", "community_C")

vegan::taxondive(comm, mat)
#>               Species      Delta       Delta*     Lambda+     Delta+     S Delta+
#> community_A  3.0000e+00  1.7160e-02  1.7160e-02  2.9410e-07  1.7160e-02   0.0515
#> community_B  5.0000e+00  1.5905e-02  1.5905e-02  8.8325e-07  1.5905e-02   0.0795
#> community_C  5.0000e+00  1.5750e-02  1.5750e-02  1.1834e-06  1.5750e-02   0.0788
#> Expected                -9.9265e-02  1.6544e-02              1.6457e-02         
```

The key indices are:

**Delta** ($\Delta$): mean taxonomic distance between all pairs of species in
the community.

**Delta*** ($\Delta^*$): mean pairwise distance excluding self-comparisons;
comparable across communities of different sizes.

**Delta+** ($\Delta^+$): mean pairwise distance weighted by species
abundances. For presence-absence data this equals $\Delta^*$.

**Lambda+** ($\Lambda^+$): variance in taxonomic distances, measuring how
evenly a community spans the taxonomic tree.

`community_A`, dominated by xenarthrans and a marsupial, scores the highest
$\Delta$ (0.01673), reflecting deeper divergences among its members.
`community_B` (primates and carnivores) scores the lowest (0.01591), as its
members share more recent common ancestors within Placentalia.

---

## Mantel test with `vegan`

The Mantel test assesses the correlation between two distance matrices by
permutation. A typical application is testing whether taxonomic distance
correlates with geographic distance, a signature of phylogeographic structure.

Here we use simulated coordinates for illustration. In practice, replace
`coords` with observed latitude and longitude values.

```{r mantel}
set.seed(42)
coords <- matrix(rnorm(30), ncol = 2)
rownames(coords) <- taxa_brazil
geo_dist <- dist(coords)

vegan::mantel(mat, geo_dist)
#> Mantel statistic based on Pearson's product-moment correlation
#>
#> Call:
#> vegan::mantel(xdis = mat, ydis = geo_dist)
#>
#> Mantel statistic r: -0.0569
#>       Significance: 0.653
#>
#> Upper quantiles of permutations (null model):
#>   90%   95% 97.5%   99% 
#> 0.188 0.237 0.272 0.336 
#> Permutation: free
#> Number of permutations: 999
```

The non-significant result (r = -0.27, p = 0.972) is expected with random
coordinates. For non-normal distance distributions, `"spearman"` is more
robust than the default `"pearson"`.

```{r mantel-spearman}
vegan::mantel(mat, geo_dist, method = "spearman", permutations = 9999)
#> Mantel statistic based on Spearman's rank correlation rho 
#>
#> Call:
#> vegan::mantel(xdis = mat, ydis = geo_dist, method = "spearman",
#>     permutations = 9999)
#>
#> Mantel statistic r: -0.07405
#>       Significance: 0.672
#>
#> Upper quantiles of permutations (null model):
#>   90%   95% 97.5%   99%
#> 0.189 0.244 0.293 0.353 
#> Permutation: free
#> Number of permutations: 9999
```

---

## PERMANOVA with `vegan`

PERMANOVA (Anderson 2001) tests whether groups of taxa differ significantly
in taxonomic distance space. It partitions total variance in the distance
matrix into within-group and between-group components and assesses
significance by permutation, with no assumption of multivariate normality.

```{r permanova}
groups <- c(
  "xenarthra", "xenarthra", "carnivore", "ungulate",  "marsupial",
  "primate",   "primate",
  "carnivore", "carnivore", "carnivore",
  "cetacean",  "cetacean",
  "sirenian",  "ungulate",  "ungulate"
)

vegan::adonis2(mat ~ groups)
#> Permutation test for adonis under reduced model
#> Permutation: free
#> Number of permutations: 999
#>
#> vegan::adonis2(formula = mat ~ groups)
#>          Df  SumOfSqs      R2      F Pr(>F)
#> Model     6 0.00100049 0.52646 1.4823  0.001 ***
#> Residual  8 0.00089992 0.47354  
#> Total    14 0.00190041 1.00000   
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

The taxonomic grouping explains $49.6\%$ of the total variance in the distance
matrix ($\text{R}^2 = 0.496$, $\text{p} = 0.001$). This confirms that the broad mammalian
orders occupy distinct regions of taxonomic distance space, consistent with
their deep evolutionary divergences.

For pairwise comparisons between groups after a significant PERMANOVA,
see `pairwise.adonis2()` in the `pairwiseAdonis` package.

---

## A complete workflow

The following script chains all steps from taxon names to multivariate
analysis.

```{r workflow}
library(taxodist)
library(vegan)
library(ape)

# 1. define taxa
taxa_brazil <- c(
  "Priodontes", "Myrmecophaga", "Chrysocyon", "Tapirus", "Didelphis",
  "Leontopithecus", "Brachyteles",
  "Panthera", "Pteronura", "Puma",
  "Sotalia", "Pontoporia",
  "Trichechus", "Mazama", "Blastocerus"
)

# 2. compute distance matrix
mat <- distance_matrix(taxa_brazil)

# 3. hierarchical clustering and phylo plot
cl   <- taxo_cluster(taxa_brazil)
tree <- ape::as.phylo(cl$hclust)
plot(tree, main = "Threatened Mammals of Brazil", cex = 0.85)

# 4. PCoA ordination
ord <- taxo_ordinate(mat)
plot(ord, main = "PCoA: Taxonomic Distance Space")

# 5. PERMANOVA
groups <- c(
  "xenarthra", "xenarthra", "carnivore", "ungulate",  "marsupial",
  "primate",   "primate",
  "carnivore", "carnivore", "carnivore",
  "cetacean",  "cetacean",
  "sirenian",  "ungulate",  "ungulate"
)
vegan::adonis2(mat ~ groups)
```

---

## References

Anderson, M.J. (2001). A new method for non-parametric multivariate
analysis of variance. *Austral Ecology*, 26, 32--46.

Clarke, K.R. and Warwick, R.M. (1998). A taxonomic distinctness index and
its statistical properties. *Journal of Applied Ecology*, 35, 523--531.

Clarke, K.R. and Warwick, R.M. (2001). A further biodiversity index
applicable to species lists: variation in taxonomic distinctness.
*Marine Ecology Progress Series*, 216, 265--278.

Brands, S.J. (1989 onwards). *Systema Naturae 2000*. Amsterdam,
The Netherlands. Retrieved from The Taxonomicon,
<http://taxonomicon.taxonomy.nl>.