How the workflowr package works

workflowr version 1.7.1

John Blischak

2023-08-22

The workflowr package combines many powerful tools in order to produce a research website. It is absolutely not necessary to understand all the underlying tools to take advantage of workflowr, and in fact that is one of the primary goals of workflowr: to allow researchers to focus on their analyses without having to worry too much about the technical details. However, if you are interested in implementing advanced customization options, contributing to workflowr, or simply want to learn more about these tools, the sections below provide some explanations of how workflowr works.

Overview

R is the computer programming language used to perform the analysis. knitr is an R package that executes code chunks in an R Markdown file to create a Markdown file. Markdown is a lightweight markup language that is easier to read and write than HTML. rmarkdown is an R package that combines the functionality of knitr and the document converter pandoc. Pandoc powers the conversion of knitr-produced Markdown files into HTML, Word, or PDF documents. Additionally, newer versions of rmarkdown contain functions for building websites. The styling of the websites is performed by the web framework Bootstrap. Bootstrap implements the navigation bar at the top of the website, has many available themes to customize the look of the site, and dynamically adjusts the website so it can be viewed on a desktop, tablet, or mobile device. The rmarkdown website configuration file _site.yml allows convenient customization of the Bootstrap navigation bar and theme.
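
As a rough illustration of the first step in this chain, the sketch below asks knitr to execute a single code chunk and return the resulting Markdown. It assumes the knitr package is installed. The chunk content is made up for the example, and the later pandoc conversion to HTML is not shown.

```r
# Build a tiny R Markdown document in memory. The fence is assembled with
# strrep() only so that this example can itself sit inside a code block.
fence <- strrep("`", 3)
rmd_lines <- c(
  "A minimal example.",
  "",
  paste0(fence, "{r}"),
  "1 + 1",
  fence
)

# knitr runs the chunk and returns Markdown text (the code plus its output)
md <- knitr::knit(text = rmd_lines, quiet = TRUE)
cat(md)
```

The returned text is ordinary Markdown, which is what pandoc would then convert into HTML.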

Git is a distributed version control system (VCS) that tracks code development. It has many powerful features, but only a handful of the main functions are required to use workflowr. git2r is an R package that provides an interface to libgit2, a portable, pure C implementation of the Git core methods (this is why you don’t need to install Git before using workflowr). GitHub is a website that hosts Git repositories and additionally provides collaboration tools for developing software. GitHub Pages is a GitHub service that offers free hosting of static websites. When the HTML files for the website are placed in the subdirectory docs/, GitHub Pages can serve them online.
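
A minimal sketch of a few basic Git operations performed from R through git2r rather than the Git command line. It assumes git2r is installed. The repository location, file name, and user details are placeholders for illustration, and this is not the code workflowr itself runs.

```r
library(git2r)

# Create an empty repository in a temporary directory (placeholder location)
path <- tempfile("demo-repo-")
dir.create(path)
repo <- init(path)

# Identify the committer for this repository only (placeholder identity)
config(repo, user.name = "Example User", user.email = "user@example.com")

# Add a file, stage it, and commit it
writeLines("analysis notes", file.path(path, "notes.txt"))
add(repo, "notes.txt")
commit(repo, message = "Add notes")

# Inspect the history: one commit so far
n_commits <- length(commits(repo))
print(n_commits)
```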

To aid reproducibility, workflowr provides an R Markdown output format, wflow_html(), that automatically sets a seed for random number generation, records the session information, and reports the status of the Git repository (so you always know which version of the code produced the results contained in that particular file). These options are controlled by the settings in _workflowr.yml. workflowr also provides a custom site generator, wflow_site(), that enables wflow_html() to work with R Markdown websites. The website options are controlled in analysis/_site.yml.
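
To see why a fixed seed and a recorded session help reproducibility, consider the base R sketch below. It is not workflowr's implementation, only the underlying idea: the seed value is an arbitrary placeholder, whereas workflowr would take its settings from _workflowr.yml.

```r
# Placeholder seed chosen for this example
seed <- 12345

# Draw random numbers twice, resetting the seed before each draw
set.seed(seed)
first_run <- rnorm(3)

set.seed(seed)
second_run <- rnorm(3)

# The same seed reproduces exactly the same draws
same_draws <- identical(first_run, second_run)
print(same_draws)

# Capture the R version and attached packages used for the analysis
session <- utils::sessionInfo()
print(session$R.version$version.string)
```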

Where are the figures?

workflowr saves the figures into an organized, hierarchical directory structure within analysis/. For example, the first figure generated by the chunk named plot-data in the file filename.Rmd will be saved as analysis/figure/filename.Rmd/plot-data-1.png. Furthermore, the figure files are moved to docs/ when render_site is run (this is the rmarkdown package function called by wflow_build, wflow_publish, and the RStudio Knit button).
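
The naming scheme described above can be sketched with a small helper. The function figure_path() is hypothetical and not part of the workflowr API. It only reproduces the path pattern from the example, and the docs/ variant assumes the same relative layout is kept when the files are moved.

```r
# Hypothetical helper: compose the path of the i-th figure produced by a chunk
figure_path <- function(rmd, chunk, index = 1, ext = "png", dir = "analysis") {
  file.path(dir, "figure", basename(rmd), paste0(chunk, "-", index, ".", ext))
}

# First figure from chunk "plot-data" in filename.Rmd
in_analysis <- figure_path("analysis/filename.Rmd", "plot-data")
print(in_analysis)
# "analysis/figure/filename.Rmd/plot-data-1.png"

# Assumed location after the site is built (same layout under docs/)
in_docs <- figure_path("analysis/filename.Rmd", "plot-data", dir = "docs")
print(in_docs)
```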

The figure files in docs/ have to be committed to the Git repository in order to be displayed properly on the website. wflow_publish automatically commits the figures in docs/ corresponding to new or updated R Markdown files, and analysis/figure/ is listed in the .gitignore file to prevent accidentally committing duplicate files.

Because workflowr requires the figures to be saved to a specific location in order to function properly, it will override any custom setting of the knitr option fig.path (which controls where figure files are saved) and insert a warning into the HTML file to alert the user that their value for fig.path was ignored.

Additional tools

Posit Software, PBC is a company that develops open source software for R users. They are the principal developers of RStudio, an integrated development environment (IDE) for R, and of the rmarkdown package. Because of this tight integration, new developments in the rmarkdown package are quickly incorporated into the RStudio IDE. While not strictly required for using workflowr, using RStudio provides many benefits.

Another key R package used by workflowr is rprojroot. This package finds the root of the repository, so workflowr functions like wflow_build will work the same regardless of the current working directory. Specifically, rprojroot searches for the RStudio project .Rproj file at the base of the workflowr project (so don’t delete it!).
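
The lookup can be sketched as follows, assuming the rprojroot package is installed. The temporary project, its demo.Rproj file, and the directory names are made up for the example, and the calls shown are rprojroot's public functions rather than workflowr's internal code.

```r
# Set up a throwaway project: a root with an .Rproj file and an analysis/ folder
root <- tempfile("demo-project-")
dir.create(file.path(root, "analysis"), recursive = TRUE)
writeLines("Version: 1.0", file.path(root, "demo.Rproj"))

# Start the search from the subdirectory; rprojroot walks up the directory
# tree until it finds a directory containing an RStudio project file
found <- rprojroot::find_root(
  rprojroot::is_rstudio_project,
  path = file.path(root, "analysis")
)
print(found)
```

Starting from analysis/ or from the root gives the same answer, which is why functions that rely on this lookup behave the same from any working directory inside the project.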

Further reading