There are two prevaling methods for using HTS-SIP data to estimate the amount of isotope that each OTU incorporated:
In this vignette, we are going to show how to run both analyses and also compare the results a bit.
First, let's load some packages including HTSSIP
.
library(dplyr)
library(ggplot2)
library(HTSSIP)
OK. We're going to be using 2 data files:
We'll be using the dataset that we simulated in the HTSSIP_sim vignette.
The phyloseq object is similar to the dataset in the other vignettes.
# HTS-SIP data
physeq_rep3
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 6 taxa and 144 samples ]
## sample_data() Sample Data: [ 144 samples by 5 sample variables ]
The associated qPCR data is a list of length = 2.
# qPCR data (list object)
physeq_rep3_qPCR %>% names
## [1] "raw" "summary"
For the analyses in this vignette, we only need the 'summary' table.
# qPCR data (list object)
physeq_rep3_qPCR_sum = physeq_rep3_qPCR$summary
physeq_rep3_qPCR_sum %>% head(n=4)
## IS_CONTROL Sample Buoyant_density qPCR_tech_rep_mean
## 1 FALSE 13C-Glu_rep1_1.671712_2 1.671712 46598172
## 2 FALSE 13C-Glu_rep1_1.671722_1 1.671722 42299449
## 3 FALSE 13C-Glu_rep1_1.680311_3 1.680311 53536519
## 4 FALSE 13C-Glu_rep1_1.683540_4 1.683540 51449609
## qPCR_tech_rep_sd Gradient Fraction Treatment Replicate
## 1 9055833 13C-Glu_rep1 2 13C-Glu 1
## 2 16369298 13C-Glu_rep1 1 13C-Glu 1
## 3 18265667 13C-Glu_rep1 3 13C-Glu 1
## 4 3796576 13C-Glu_rep1 4 13C-Glu 1
OK. Let's quantify isotope incorporation witht the q-SIP method.
# transforming OTU counts
physeq_rep3_t = OTU_qPCR_trans(physeq_rep3, physeq_rep3_qPCR_sum)
# calculating atom fraction excess
atomX = qSIP_atom_excess(physeq_rep3_t,
control_expr='Treatment=="12C-Con"',
treatment_rep='Replicate')
atomX %>% names
## [1] "W" "A"
The resulting list object contains 2 data.frames. We are interested in the 'A' table, which contains estimated BD shifts (Z) and atom fraction excess (A).
atomX$A %>% head(n=4)
## # A tibble: 4 x 9
## OTU Wlab Wlight Z Gi Mlight Mheavymax Mlab A
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 OTU.1 1.72 1.70 0.0222 0.636 308. 318. 312. 0.412
## 2 OTU.2 1.71 1.69 0.0117 0.586 308. 318. 310. 0.218
## 3 OTU.3 1.73 1.70 0.0347 0.608 308. 318. 314. 0.644
## 4 OTU.4 1.73 1.70 0.0261 0.705 308. 318. 313. 0.484
Next, let's calculate bootstrap confidence intervales for the atom fraction excess estimations.
# calculating bootstrapped CI values
df_atomX_boot = qSIP_bootstrap(atomX, n_boot=100)
df_atomX_boot %>% head(n=4)
## # A tibble: 4 x 11
## OTU Wlab Wlight Z Gi Mlight Mheavymax Mlab A A_CI_low
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 OTU.1 1.72 1.70 0.0222 0.636 308. 318. 312. 0.412 0.0252
## 2 OTU.2 1.71 1.69 0.0117 0.586 308. 318. 310. 0.218 0.126
## 3 OTU.3 1.73 1.70 0.0347 0.608 308. 318. 314. 0.644 0.431
## 4 OTU.4 1.73 1.70 0.0261 0.705 308. 318. 313. 0.484 0.366
## # … with 1 more variable: A_CI_high <dbl>
Now for delta_BD. The setup is easier because we are not using qPCR data, just relative abundances from 16S rRNA sequence data.
df_dBD = delta_BD(physeq_rep3, control_expr='Treatment=="12C-Con"')
df_dBD %>% head(n=4)
## # A tibble: 4 x 4
## OTU CM_control CM_treatment delta_BD
## <chr> <dbl> <dbl> <dbl>
## 1 OTU.1 1.69 1.72 0.0260
## 2 OTU.2 1.69 1.70 0.0130
## 3 OTU.3 1.69 1.74 0.0562
## 4 OTU.4 1.71 1.73 0.0208
Let's plot the data and compare all of the results. First, let's join all of the data into one table for plotting. We'll also format it for plotting.
# checking & joining data
stopifnot(nrow(df_atomX_boot) == nrow(df_dBD))
df_j = dplyr::inner_join(df_atomX_boot, df_dBD, c('OTU'='OTU'))
stopifnot(nrow(df_atomX_boot) == nrow(df_j))
# formatting data for plotting
df_j = df_j %>%
dplyr::mutate(OTU = reorder(OTU, -delta_BD))
OK. Time to plot!
# plotting BD shift (Z)
ggplot(df_j, aes(OTU)) +
geom_point(aes(y=Z), color='blue') +
geom_point(aes(y=delta_BD), color='red') +
geom_hline(yintercept=0, linetype='dashed', alpha=0.5) +
labs(x='OTU', y='BD shift (Z)') +
theme_bw() +
theme(
axis.text.x = element_blank()
)
In the figure, red points are delta_BD and blue points are q-SIP. It's easy to see that delta_BD is a lot more variable than q-SIP. This is likely due to a high influence of compositional data artifacts on delta_BD versus q-SIP.
Let's make a boxplot to show the difference in estimation variance between the two methods.
# plotting BD shift (Z): boxplots
## formatting the table
df_j_g = df_j %>%
dplyr::select(OTU, Z, delta_BD) %>%
tidyr::gather(Method, BD_shift, Z, delta_BD) %>%
mutate(Method = ifelse(Method == 'Z', 'q-SIP', 'delta-BD'))
## plotting
ggplot(df_j_g, aes(Method, BD_shift)) +
geom_boxplot() +
geom_hline(yintercept=0, linetype='dashed', alpha=0.5) +
labs(x='Method', y='BD shift (Z)') +
theme_bw()
The boxplot helps to summarize how much more variance delta_BD produces versus q-SIP.
sessionInfo()
## R version 3.4.3 (2017-11-30)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
##
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] phyloseq_1.22.3 HTSSIP_1.4.1 ggplot2_3.2.0 tidyr_0.8.3
## [5] dplyr_0.8.0.1
##
## loaded via a namespace (and not attached):
## [1] nlme_3.1-131 bitops_1.0-6
## [3] matrixStats_0.54.0 bit64_0.9-7
## [5] doParallel_1.0.14 RColorBrewer_1.1-2
## [7] GenomeInfoDb_1.14.0 tools_3.4.3
## [9] backports_1.1.4 utf8_1.1.4
## [11] R6_2.4.0 coenocliner_0.2-2
## [13] vegan_2.5-1 rpart_4.1-15
## [15] Hmisc_4.2-0 DBI_1.0.0
## [17] lazyeval_0.2.2 BiocGenerics_0.24.0
## [19] mgcv_1.8-28 colorspace_1.4-1
## [21] permute_0.9-5 ade4_1.7-13
## [23] nnet_7.3-12 withr_2.1.2
## [25] tidyselect_0.2.5 gridExtra_2.3
## [27] DESeq2_1.18.1 bit_1.1-14
## [29] compiler_3.4.3 cli_1.1.0
## [31] Biobase_2.38.0 htmlTable_1.13.1
## [33] DelayedArray_0.4.1 labeling_0.3
## [35] scales_1.0.0 checkmate_1.9.3
## [37] genefilter_1.60.0 stringr_1.4.0
## [39] digest_0.6.19 foreign_0.8-71
## [41] XVector_0.18.0 base64enc_0.1-3
## [43] pkgconfig_2.0.2 htmltools_0.3.6
## [45] highr_0.8 htmlwidgets_1.3
## [47] rlang_0.4.0 RSQLite_2.1.1
## [49] rstudioapi_0.10 jsonlite_1.6
## [51] BiocParallel_1.12.0 acepack_1.4.1
## [53] RCurl_1.95-4.12 magrittr_1.5
## [55] GenomeInfoDbData_1.0.0 Formula_1.2-3
## [57] biomformat_1.6.0 Matrix_1.2-17
## [59] fansi_0.4.0 Rcpp_1.0.1
## [61] munsell_0.5.0 S4Vectors_0.16.0
## [63] ape_5.3 stringi_1.4.3
## [65] MASS_7.3-51.4 SummarizedExperiment_1.8.1
## [67] zlibbioc_1.24.0 rhdf5_2.22.0
## [69] plyr_1.8.4 blob_1.1.1
## [71] grid_3.4.3 parallel_3.4.3
## [73] crayon_1.3.4 lattice_0.20-38
## [75] Biostrings_2.46.0 splines_3.4.3
## [77] multtest_2.34.0 annotate_1.56.2
## [79] locfit_1.5-9.1 zeallot_0.1.0
## [81] knitr_1.18 pillar_1.4.1
## [83] igraph_1.2.4 GenomicRanges_1.30.3
## [85] markdown_0.9 geneplotter_1.56.0
## [87] reshape2_1.4.3 codetools_0.2-16
## [89] stats4_3.4.3 XML_3.98-1.19
## [91] glue_1.3.1 evaluate_0.14
## [93] latticeExtra_0.6-28 data.table_1.10.4-3
## [95] vctrs_0.1.0 foreach_1.4.4
## [97] gtable_0.3.0 purrr_0.3.2
## [99] assertthat_0.2.1 mime_0.6
## [101] xtable_1.8-4 survival_2.44-1.1
## [103] tibble_2.1.1 iterators_1.0.10
## [105] memoise_1.1.0 AnnotationDbi_1.40.0
## [107] IRanges_2.12.0 cluster_2.0.6