autoharp is an R package for semi-automatic grading of R and R Markdown (Rmd/qmd) scripts. It was designed at the National University of Singapore to handle the practical challenges of grading programming assignments at scale: "Is the output correct?", "Does the document render cleanly?", "Did the student avoid using a for loop?" autoharp achieves this through four complementary layers:
| Layer | What it Checks |
|---|---|
| Output correctness | Objects match the reference solution (typed, tolerance-aware) |
| Static code analysis | AST structure: e.g., no `for` loops, correct function signature |
| Runtime profiling | Execution time and peak memory usage per submission |
| Code style (lint) | lintr-based style violation count |
*The four-phase autoharp grading workflow: Prepare → Distribute → Grade → Review*
The typical autoharp workflow has four phases:

1. **Prepare**: write a solution template Rmd, marking reference objects and tests with the `autoharp.objs`/`autoharp.scalars` chunk options.
2. **Distribute**: hand out the assignment and collect student Rmd submissions.
3. **Grade**: call `populate_soln_env()` on the template, then `render_one()` per student; each runs in a sandboxed R process.
4. **Review**: summarize the results with `log_summary()`.

You can install autoharp from CRAN with:
You can also install the development version from GitHub:
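A typical sketch using the remotes package; the GitHub owner is not named in this document, so the path below is a placeholder:

``` r
# install.packages("remotes")  # if not already installed
# NOTE: "<owner>" is a placeholder; substitute the package's actual GitHub path
remotes::install_github("<owner>/autoharp")
```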
Then load the package:
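Loading works as for any installed package:

``` r
library(autoharp)
```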
The heart of autoharp is the solution template, an R Markdown file that does two things simultaneously: it provides the reference solution (the objects student answers are compared against) and defines the correctness tests. Two special chunk options mark what autoharp should extract and test:
| Chunk Option | Purpose |
|---|---|
| `autoharp.objs` | Lists object names to extract from this chunk and save with a dot prefix (e.g., `X` → `.X`) for later comparison against student objects |
| `autoharp.scalars` | Marks test code that produces `TRUE`/`FALSE` scalar results; these are the correctness tests students must pass |
Why Rmd? Grading complete documents (rather than isolated snippets) ensures students practice good scientific computing habits: their entire analysis must render cleanly.
```
Solution Template (.Rmd)
         │
         ▼
populate_soln_env() ──► Solution Environment + Test Script
         │
         ▼
render_one(student.Rmd) ──► Grading Results Data Frame
         │
         ▼
log_summary() ──► Summary Report
```
Each `render_one()` call:

1. Launches a fresh, sandboxed R process (via `parallel::makePSOCKcluster`)
2. Checks for forbidden calls (`system()`, `setwd()`, etc.)
3. Knits the student's Rmd with autoharp hooks active
4. Runs the test script in the student's environment
5. Returns a data frame with status, runtime, memory, and test results
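The forbidden-call check can be approximated with base R's parse data; the helper below is a hypothetical sketch of the idea, not autoharp's actual implementation:

``` r
# Hypothetical helper: list banned functions that a script calls,
# using the parse tree rather than pattern matching on source text.
find_forbidden <- function(code, banned = c("system", "setwd")) {
  pd <- utils::getParseData(parse(text = code, keep.source = TRUE))
  called <- pd$text[pd$token == "SYMBOL_FUNCTION_CALL"]
  intersect(banned, unique(called))
}

find_forbidden("x <- 1:10\nsetwd('/tmp')\nmean(x)")  # returns "setwd"
```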
Suppose you assign students the following problem:
> Write a function `rf(n)` that generates `n` random variates from the density \(f(x) = 4x^3\), \(0 < x < 1\). Use the inverse transform method. Then create a vector `X` of 10,000 variates using `rf()`.
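For reference, the CDF of \(f(x) = 4x^3\) is \(F(x) = x^4\), so the inverse transform is \(F^{-1}(u) = u^{1/4}\). A quick self-contained check in base R (illustrative only, not part of the template):

``` r
# Inverse transform for f(x) = 4x^3 on (0, 1):
# F(x) = x^4, so F^{-1}(u) = u^(1/4)
set.seed(1)
u <- runif(1e5)
x <- u^(1 / 4)

# Theoretical values: E[X] = 4/5 = 0.8, SD = sqrt(2/75) ≈ 0.1633
c(mean = mean(x), sd = sd(x))
```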
Create solution_template.Rmd:
``` yaml
---
title: "Solution Template"
output: html_document
---
```
```{r, autoharp.objs = c("rf")}
# Reference solution: saved as .rf in the solution environment
rf <- function(n) {
u <- runif(n)
u^(1/4) # inverse CDF of f(x) = 4x^3
}
```
```{r, autoharp.objs = c("X")}
set.seed(2022)
X <- rf(10000) # saved as .X
```
```{r, autoharp.scalars = c("arg_count_ok", "length_ok", "mean_ok", "sd_ok")}
# Each scalar below is TRUE/FALSE: these become the student's test results
arg_count_ok <- length(formals(rf)) == 1    # rf has exactly 1 argument
length_ok    <- length(X) == 10000          # X has 10,000 elements
mean_ok      <- abs(mean(X) - 0.8) < 0.02   # mean close to theoretical 0.8
sd_ok        <- abs(sd(X) - 0.1633) < 0.02  # SD close to theoretical 0.1633
```

Suppose a student submits `student01.Rmd`:
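The `soln` object used below must first be built from the template. A minimal sketch, assuming `populate_soln_env()` is called on the template file and that its return value carries the `soln_env` and `test_file` components referenced in the following call:

``` r
library(autoharp)

# Build the solution environment and companion test script from the template.
# The component names soln_env / test_file mirror their use in render_one().
soln <- populate_soln_env("solution_template.Rmd")
```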
``` r
result <- render_one(
  rmd_name = "student01.Rmd",
  soln_env = soln$soln_env,
  test_file = soln$test_file,
  out_dir = "output/"
)

# The result is a one-row data frame
print(result)
```

The output data frame contains:
| Column | Description |
|---|---|
| `file` | Student's filename |
| `status` | `"success"`, `"timeout"`, `"error"`, or `"precheck_fail"` |
| `runtime` | Total execution time (seconds) |
| `mem_usage` | Peak memory usage (MB) |
| `test_1` … `test_n` | Result of each correctness test (`TRUE`/`FALSE`/`NA`) |
| `n_lints` | Number of lintr style violations |
| `render_success` | Did the Rmd render without errors? |
``` r
# Grade all students in a directory
student_files <- list.files("submissions/", pattern = "\\.Rmd$", full.names = TRUE)
results_list <- lapply(student_files, function(f) {
  render_one(f, soln_env = soln$soln_env, test_file = soln$test_file,
             out_dir = "output/")
})
all_results <- do.call(rbind, results_list)

# Print a summary table (pass rates, runtime distribution, etc.)
log_summary(all_results)
```

autoharp integrates with the lintr package to count style violations:
``` r
# Count lint violations in a single script
lint_count <- count_lints_one("student01.R")

# Count across all submissions
all_lints <- count_lints_all(
  files = list.files("submissions/", pattern = "\\.R$", full.names = TRUE)
)
print(all_lints)
```

Lint violations are included in the `render_one()` output
automatically, so you don’t need to call these separately if you’re
already running the full pipeline.
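If you want the individual violations rather than just a count, lintr can be run directly on a submission (plain lintr usage, separate from autoharp):

``` r
library(lintr)

# Full lint report: one element per violation, with line number,
# linter name, and message
lints <- lint("student01.R")
print(lints)
length(lints)  # the per-file quantity autoharp reports as a count
```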
For R Markdown submissions, verify that required sections and chunks are present:
``` r
# Check that the submitted Rmd has the required sections
rmd_check <- check_rmd(
  rmd_name = "student01.Rmd",
  expected_sections = c("Introduction", "Analysis", "Conclusion")
)
print(rmd_check)
```

For large classes, the Grading App provides a browser-based interface that wraps the entire workflow:
The Grading App has five tabs; among other things, it batch-runs `render_one()` for all submissions with progress tracking. See the Shiny Apps Guide for full details.
autoharp's static code analysis helpers can also forbid `for` loops, check function signatures, and more.
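As an illustration of the underlying idea (not autoharp's API), a `for` loop can be detected from base R's parse data:

``` r
# Detect whether a piece of code contains a `for` loop by inspecting
# parse tokens instead of matching source text.
has_for_loop <- function(code) {
  pd <- utils::getParseData(parse(text = code, keep.source = TRUE))
  any(pd$token == "FOR")
}

has_for_loop("for (i in 1:3) print(i)")  # TRUE
has_for_loop("sapply(1:3, print)")       # FALSE
```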