Biostatistics vignette

Rob Knell

January 2021

The Biostatistics Package

This package consists of a series of learnr tutorials for use in teaching statistics to biologists. They were written for use in undergraduate and postgraduate teaching in the UK but they could also be used for individual, self-directed learning. The subjects covered range from basic data visualistion and description through to reasonably advanced linear modelling. There are obviously many subjects which are not currently covered such as generalised linear models, mixed effects models and multivariate statistics and it is hoped that these will be incorporated in the future.

There is a strong emphasis throughout the tutorial on analysing real data sets. This is much better for learning statistics than using synthetic example data because with real data comes all of the issues and uncertainty associated with real science. The data used here have mostly been made publicly available by the authors of papers published in the biological literature, mostly via the Dryad data repository, and I would like to thank all of them for this.

The tutorials are written for the learnr package which uses an rmarkdown framework to render tutorials into shiny webapps. The rmarkdown files for all of the tutorials are available on the author’s github page.

Running tutorials

There are two ways of running these tutorials. The easy way assumes you are using a recent version of RStudio. If this is the case then once you have installed the package the tutorials will show up in the ‘Tutorial’ tab in the RStudio pane that also includes the Environment and History tabs. Click the “Start Tutorial” button and the tutorial will render, which can take a few seconds, and then appear in the Tutorial tab. You’ll probably want to maximise the pane within your RStudio window. If you want to finish the tutorial click on the ‘Stop’ sign button at the top left of the tab.

If you would rather run your tutorial in a separate browser window then you can use the run_tutorial() function from the learnr package. You need to specify the name of the tutorial and the package, so

learnr::run_tutorial("02 Descriptive statistics", package = "Biostatistics")

will run the Descriptive Statistics tutorial and

learnr::run_tutorial("17 Linear models 5 Multiple Regression", package = "Biostatistics")

will run the Multiple Regression tutorial. In my experience the first method, with the Tutorial pane, seems more stable and sometimes tutorials won’t render using run_tutorial() for reasons that are not clear.

List of tutorials

The tutorials currently in the package are:

00 Introduction
01 Frequency histograms
02 Descriptive statistics
03 Boxplots
04 Scatterplots
05 Sampling_distributions
06 Standard_errors
07 Confidence intervals
08 Confidence intervals for the difference between two means 09 Paired-sample t-tests
10 Two sample t-tests
11 Chi-square tests
12 Correlation
13 Linear models 1 Single factor ANOVA
14 Linear models 2 Linear Regression
15 Linear models 3 Model assumptions and diagnostics
16 Linear models 4 Multi factor ANOVA
17 Linear models 5 Multiple regression
18 Linear models 6 Factors and continuous variables
19 Linear models 7 Model selection

Datasets used in this package