Models of nucleotide substitution

Introduction

This document outlines the nucleotide substitution models available in jackalope. Each matrix below is the substitution-rate matrix for one model. Rows and columns follow the order T, C, A, G, so the rates are arranged as follows:

\[ \begin{bmatrix} \cdot & T\rightarrow C & T\rightarrow A & T\rightarrow G \\ C\rightarrow T & \cdot & C\rightarrow A & C\rightarrow G \\ A\rightarrow T & A\rightarrow C & \cdot & A\rightarrow G \\ G\rightarrow T & G\rightarrow C & G\rightarrow A & \cdot \end{bmatrix} \]

(For example, the cell labeled \(C \rightarrow T\) holds the rate of substitution from \(C\) to \(T\).) Each diagonal entry is set so that its row sums to zero (Yang 2006).
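The row-sum rule for the diagonal can be sketched in a few lines of Python. This is only an illustration: the function name `fill_diagonal` and the example rates are hypothetical and are not part of jackalope.

```python
def fill_diagonal(rates):
    """Return a copy of a 4x4 rate matrix whose diagonal makes each row sum to zero."""
    q = [list(row) for row in rates]
    for i, row in enumerate(q):
        # The diagonal is minus the sum of the off-diagonal rates in the same row.
        row[i] = -sum(x for j, x in enumerate(row) if j != i)
    return q


# Illustrative off-diagonal rates in T, C, A, G order; the zeros on the
# diagonal are placeholders that fill_diagonal overwrites.
off_diag = [
    [0.0, 0.3, 0.1, 0.1],
    [0.3, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.3],
    [0.1, 0.1, 0.3, 0.0],
]
Q = fill_diagonal(off_diag)
```

Each row of `Q` then sums to zero (up to floating-point error), which is the property the later sketches in this document also rely on.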

The function parameters required for each model are listed under its rate matrix.

Below is a key to the parameters that these functions require, in order of appearance:

Functions in jackalope that employ each model take the form sub_X for model X (e.g., sub_JC69 for the JC69 model).

Note: In all models, the matrices are scaled such that the overall mutation rate is 1, but this behavior can be changed using the mu parameter for each function.
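One way to picture this scaling is sketched below in Python. It assumes that the "overall mutation rate" is the expected substitution rate at equilibrium, \(-\sum_X \pi_X q_{XX}\); that definition, the helper names `overall_rate` and `rescale`, and the example values are assumptions made for illustration and may not match what jackalope does internally.

```python
def overall_rate(q, pi):
    """Expected substitution rate at equilibrium: -sum_i pi_i * q_ii (assumed definition)."""
    return -sum(p * q[i][i] for i, p in enumerate(pi))


def rescale(q, pi, mu=1.0):
    """Multiply every entry of q by one constant so that the overall rate equals mu."""
    factor = mu / overall_rate(q, pi)
    return [[factor * x for x in row] for row in q]


# Illustrative K80-style matrix (T, C, A, G order) with equal base frequencies.
pi = [0.25, 0.25, 0.25, 0.25]
Q = [
    [-0.5, 0.3, 0.1, 0.1],
    [0.3, -0.5, 0.1, 0.1],
    [0.1, 0.1, -0.5, 0.3],
    [0.1, 0.1, 0.3, -0.5],
]
Q_unit = rescale(Q, pi)            # overall rate of 1
Q_double = rescale(Q, pi, mu=2.0)  # overall rate of 2
```

Because every entry is multiplied by the same factor, the relative rates are unchanged and the rows still sum to zero.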

JC69

The JC69 model (Jukes and Cantor 1969) uses a single rate, \(\lambda\).

\[ \mathbf{Q} = \begin{bmatrix} \cdot & \lambda & \lambda & \lambda \\ \lambda & \cdot & \lambda & \lambda \\ \lambda & \lambda & \cdot & \lambda \\ \lambda & \lambda & \lambda & \cdot \end{bmatrix} \]

Parameters:

K80

The K80 model (Kimura 1980) uses separate rates for transitions (\(\alpha\)) and transversions (\(\beta\)).

\[ \mathbf{Q} = \begin{bmatrix} \cdot & \alpha & \beta & \beta \\ \alpha & \cdot & \beta & \beta \\ \beta & \beta & \cdot & \alpha \\ \beta & \beta & \alpha & \cdot \end{bmatrix} \]

Parameters:

F81

The F81 model (Felsenstein 1981) allows a different equilibrium frequency for each nucleotide (\(\pi_X\) for nucleotide \(X\)).

\[ \mathbf{Q} = \begin{bmatrix} \cdot & \pi_C & \pi_A & \pi_G \\ \pi_T & \cdot & \pi_A & \pi_G \\ \pi_T & \pi_C & \cdot & \pi_G \\ \pi_T & \pi_C & \pi_A & \cdot \end{bmatrix} \]

Parameters:

HKY85

The HKY85 model (Hasegawa et al. 1984, 1985) combines unequal equilibrium nucleotide frequencies with separate transition and transversion rates.

\[ \mathbf{Q} = \begin{bmatrix} \cdot & \alpha \pi_C & \beta \pi_A & \beta \pi_G \\ \alpha \pi_T & \cdot & \beta \pi_A & \beta \pi_G \\ \beta \pi_T & \beta \pi_C & \cdot & \alpha \pi_G \\ \beta \pi_T & \beta \pi_C & \alpha \pi_A & \cdot \end{bmatrix} \]
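As a rough illustration of how the HKY85 matrix is assembled, the Python sketch below multiplies \(\alpha\) (transitions) or \(\beta\) (transversions) by the equilibrium frequency of the target nucleotide. The function `hky85` and the parameter values are hypothetical and written only for this example; they are not jackalope's implementation.

```python
NUCS = "TCAG"  # row and column order used throughout this document
TRANSITIONS = {("T", "C"), ("C", "T"), ("A", "G"), ("G", "A")}


def hky85(pi, alpha, beta):
    """Build an unscaled HKY85 rate matrix from frequencies pi (T, C, A, G order)."""
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(NUCS):
        for j, y in enumerate(NUCS):
            if i != j:
                rate = alpha if (x, y) in TRANSITIONS else beta
                q[i][j] = rate * pi[j]
        # The diagonal is still 0.0 here, so this is minus the off-diagonal sum.
        q[i][i] = -sum(q[i])
    return q


pi = [0.1, 0.2, 0.3, 0.4]
Q_hky = hky85(pi, alpha=2.0, beta=1.0)
# With alpha = beta = 1, every off-diagonal entry is just pi[j], as in F81.
Q_f81 = hky85(pi, alpha=1.0, beta=1.0)
```

Setting \(\alpha = \beta = 1\) recovers the F81 matrix shown earlier, which is one way to check the construction against the matrices in this document.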

Parameters:

TN93

The TN93 model (Tamura and Nei 1993) extends the HKY85 model by distinguishing between the two types of transitions: between pyrimidines (\(\alpha_1\)) and between purines (\(\alpha_2\)).

\[ \mathbf{Q} = \begin{bmatrix} \cdot & \alpha_1 \pi_C & \beta \pi_A & \beta \pi_G \\ \alpha_1 \pi_T & \cdot & \beta \pi_A & \beta \pi_G \\ \beta \pi_T & \beta \pi_C & \cdot & \alpha_2 \pi_G \\ \beta \pi_T & \beta \pi_C & \alpha_2 \pi_A & \cdot \end{bmatrix} \]

Parameters:

F84

The F84 model (Kishino and Hasegawa 1989) is a special case of TN93, where \(\alpha_1 = (1 + \kappa/\pi_Y) \beta\) and \(\alpha_2 = (1 + \kappa/\pi_R) \beta\) (\(\pi_Y = \pi_T + \pi_C\) and \(\pi_R = \pi_A + \pi_G\)).

\[ \mathbf{Q} = \begin{bmatrix} \cdot & (1 + \kappa/\pi_Y) \beta \pi_C & \beta \pi_A & \beta \pi_G \\ (1 + \kappa/\pi_Y) \beta \pi_T & \cdot & \beta \pi_A & \beta \pi_G \\ \beta \pi_T & \beta \pi_C & \cdot & (1 + \kappa/\pi_R) \beta \pi_G \\ \beta \pi_T & \beta \pi_C & (1 + \kappa/\pi_R) \beta \pi_A & \cdot \end{bmatrix} \]
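The relationship between F84 and TN93 can be sketched in Python by computing \(\alpha_1\) and \(\alpha_2\) from \(\kappa\), \(\beta\), and the equilibrium frequencies, then building the TN93 matrix. The functions `tn93` and `f84` and the example values below are hypothetical illustrations, not jackalope code.

```python
def tn93(pi, alpha_1, alpha_2, beta):
    """Build an unscaled TN93 rate matrix; pi is in T, C, A, G order."""
    rates = [
        [0.0, alpha_1, beta, beta],
        [alpha_1, 0.0, beta, beta],
        [beta, beta, 0.0, alpha_2],
        [beta, beta, alpha_2, 0.0],
    ]
    q = [[rates[i][j] * pi[j] for j in range(4)] for i in range(4)]
    for i in range(4):
        q[i][i] = -sum(q[i])  # diagonal is 0.0 before this assignment
    return q


def f84(pi, beta, kappa):
    """F84 expressed as TN93 with alpha_1 and alpha_2 derived from kappa."""
    pi_y = pi[0] + pi[1]  # pyrimidines: T + C
    pi_r = pi[2] + pi[3]  # purines: A + G
    alpha_1 = (1 + kappa / pi_y) * beta
    alpha_2 = (1 + kappa / pi_r) * beta
    return tn93(pi, alpha_1, alpha_2, beta)


pi = [0.1, 0.2, 0.3, 0.4]
Q_f84 = f84(pi, beta=0.5, kappa=1.5)
```

With these values, \(\pi_Y = 0.3\) gives \(\alpha_1 = 3\), so the \(T \rightarrow C\) entry is \(3 \times 0.2 = 0.6\), while the transversion entries remain \(\beta \pi_X\).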

Parameters:

GTR

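The GTR model (Tavaré 1986) is the least restrictive model that is still time-reversible (i.e., the exchangeability rates are symmetric, \(r_{x \rightarrow y} = r_{y \rightarrow x}\), so that \(\pi_x q_{xy} = \pi_y q_{yx}\), where \(q_{xy}\) is the entry of \(\mathbf{Q}\) for \(x \rightarrow y\)).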

\[ \mathbf{Q} = \begin{bmatrix} \cdot & a \pi_C & b \pi_A & c \pi_G \\ a \pi_T & \cdot & d \pi_A & e \pi_G \\ b \pi_T & d \pi_C & \cdot & f \pi_G \\ c \pi_T & e \pi_C & f \pi_A & \cdot \end{bmatrix} \]
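Time-reversibility is commonly stated as detailed balance, \(\pi_x q_{xy} = \pi_y q_{yx}\), where \(q_{xy}\) is the entry of \(\mathbf{Q}\) for \(x \rightarrow y\). The Python sketch below builds a GTR matrix from the six parameters \(a\) through \(f\) and measures how far it departs from that condition. The function names and parameter values are hypothetical and chosen only for illustration.

```python
def gtr(pi, abcdef):
    """Build an unscaled GTR rate matrix; pi is in T, C, A, G order."""
    a, b, c, d, e, f = abcdef
    # Symmetric exchangeabilities, laid out as in the matrix above.
    s = [
        [0.0, a, b, c],
        [a, 0.0, d, e],
        [b, d, 0.0, f],
        [c, e, f, 0.0],
    ]
    q = [[s[i][j] * pi[j] for j in range(4)] for i in range(4)]
    for i in range(4):
        q[i][i] = -sum(q[i])  # diagonal is 0.0 before this assignment
    return q


def max_balance_error(q, pi):
    """Largest |pi_x * q_xy - pi_y * q_yx| over all pairs of nucleotides."""
    return max(
        abs(pi[i] * q[i][j] - pi[j] * q[j][i])
        for i in range(4)
        for j in range(4)
        if i != j
    )


pi = [0.1, 0.2, 0.3, 0.4]
Q_gtr = gtr(pi, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
error = max_balance_error(Q_gtr, pi)
```

Here `error` is zero up to floating-point rounding, because each pair of entries shares the same exchangeability and differs only in the frequency of the target nucleotide.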

Parameters:

UNREST

The UNREST model (Yang 1994) is entirely unrestricted: each of the 12 off-diagonal rates is a free parameter.

\[ \mathbf{Q} = \begin{bmatrix} \cdot & q_{TC} & q_{TA} & q_{TG} \\ q_{CT} & \cdot & q_{CA} & q_{CG} \\ q_{AT} & q_{AC} & \cdot & q_{AG} \\ q_{GT} & q_{GC} & q_{GA} & \cdot \end{bmatrix} \]

Parameters:

References

Felsenstein, J. 1981. Evolutionary trees from DNA sequences: A maximum likelihood approach. Journal of Molecular Evolution 17:368–376.

Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. Journal of Molecular Evolution 22:160–174.

Hasegawa, M., T. Yano, and H. Kishino. 1984. A new molecular clock of mitochondrial DNA and the evolution of hominoids. Proceedings of the Japan Academy, Series B 60:95–98.

Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein molecules. Pages 21–131 in H. N. Munro, editor. Mammalian protein metabolism. Academic Press, New York.

Kimura, M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution 16:111–120.

Kishino, H., and M. Hasegawa. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. Journal of Molecular Evolution 29:170–179.

Tamura, K., and M. Nei. 1993. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and Evolution 10:512–526.

Tavaré, S. 1986. Some probabilistic and statistical problems in the analysis of DNA sequences. Lectures on Mathematics in the Life Sciences 17:57–86.

Yang, Z. B. 1994. Estimating the pattern of nucleotide substitution. Journal of Molecular Evolution 39:105–111.

Yang, Z. 2006. Computational molecular evolution. (P. H. Harvey and R. M. May, Eds.). Oxford University Press, New York, NY, USA.