The longitudinal microbiome compositional data are highly skewed, bounded in [0,1), and often sparse with many zeros. In addition, the observations from repeated measures are correlated. We propose a two-part zero-inflated Beta regression model with random effects (ZIBR) for testing the association between microbial abundance and clinical covariates for longitudinal microbiome data. The model includes a logistic component to model presence/absence of the microbe in samples and a Beta component to model non-zero microbial abundance and each component includes a random effect to take into account the correlation among repeated measurements on the same subject.
The details of the statistical model are as follows:
The ZIBR model combines the logistic regression and Beta regression in one model. Each regression part includes random effects to account for correlations acorss time points. We call these two regressions in ZIBR model as logistic component and Beta component. These two components model two different aspects of the data. The logistic component models presence/absence of the microbe and Beta component models non-zero microbial abundance.
Accordingly, we can test three biologically relevant null
hypotheses:
- H0: α_j = 0. This is to test the coefficients in the logistic
component, if the covariates are associated with the bacterial taxon by
affecting its presence or absence;
- H0: β_j = 0. This is to test the coefficients in the Beta component,
if the taxon is associated with the covariates by showing different
abundances;
- H0: α_j = 0 and β_j = 0 for each covariate X_j and Z_j. This is to
joinly test the coefficients in both logistic and Beta components, if
the covariates affect the taxon both in terms of presence/absence and
its non-zero abundance.
You can install our ZIBR package from CRAN
install.packages("ZIBR")
Or get the dev version from GitHub
#install.packages("devtools")
::install_github("PennChopMicrobiomeProgram/ZIBR")
devtoolslibrary(ZIBR)
<- zibr(logistic_cov=logistic_cov,beta_cov=beta_cov,Y=Y,subject_ind=subject_ind,time_ind=time_ind) zibr_fit
The ordering of the samples in the above matrix or vectors must be consistent.
The zibr function will return the following results
zibr_fit
If there are missing values in certain time points, they can be imputed as following: 1. Calculate the mean or median of values from previous time point(s) and later time points(s). Use such values to replace the missing values. 2. Group the time point with missing values with other time points. For example, if you have T1, T2, T3 and T4 and T1 has missing values, you can group T1 and T2 as one time point.
After the missing values are imputed, the data can be fed into ZIBR.
If you have other problems with the package or features/fixes to suggest, please open an issue on the GitHub issues page.
Eric Z. Chen and Hongzhe Li (2016). A two-part mixed effect model for analyzing longitudinal microbiome data. Bioinformatics. Link
Maintained by Charlie Bushman (ctbushman@gmail.com) and the Penn CHOP Microbiome Program.