The achieveGap package provides a statistically rigorous framework for estimating achievement gap trajectories in longitudinal educational data. Rather than computing the gap as a post hoc difference between two separately estimated curves, the gap function is parameterized directly as a smooth estimand within a joint hierarchical model. This ensures correct uncertainty quantification via simultaneous confidence bands.
The package is motivated by the observation that achievement gaps often evolve nonlinearly across grades — widening rapidly in early elementary school, plateauing in middle grades, or narrowing following policy interventions — patterns that standard linear growth models cannot capture.
# From CRAN (once published):
install.packages("achieveGap")
# From GitHub (development version):
# install.packages("devtools")
devtools::install_github("yourusername/achieveGap")The simplest way to get started is with simulate_gap(),
which generates synthetic longitudinal data with a known true gap
function.
library(achieveGap)
# Generate data: 400 students in 30 schools, grades K-7
# Gap shape: monotone widening (true gap increases steadily across grades)
sim <- simulate_gap(
n_students = 400,
n_schools = 30,
gap_shape = "monotone",
seed = 2024
)
# Preview the data
head(sim$data)
#> student grade school SES_group score
#> 1 1 0 2 1 0.2109525
#> 401 1 1 2 1 0.7041129
#> 801 1 2 2 1 0.7189772
#> 1201 1 3 2 1 0.6504944
#> 1601 1 4 2 1 0.7958809
#> 2001 1 5 2 1 1.4512534# The true gap at each grade
sim$true_gap
#> grade gap
#> 1 0 0.000
#> 2 1 0.075
#> 3 2 0.160
#> 4 3 0.255
#> 5 4 0.360
#> 6 5 0.475
#> 7 6 0.600
#> 8 7 0.735The data frame has one row per student-grade observation, with
columns student, grade, school,
SES_group (0 = high SES, 1 = low SES), and
score.
The main function is gap_trajectory(). You specify the
column names for the outcome, grade, group indicator, school ID, and
student ID.
# Formula interface (recommended) — same as lme4/nlme style
fit <- achieve_gap(
score ~ grade,
group = "SES_group",
random = ~ 1 | school/student,
data = sim$data,
k = 6, # spline basis dimension (must be < unique grade values)
n_sim = 5000 # posterior draws for simultaneous bands
)Printing the fitted object gives a concise overview:
print(fit)
#>
#> Achievement Gap Trajectory Model
#> ----------------------------------------
#> Group variable : SES_group
#> Group levels : ref=0 | focal=1
#> Grade variable : grade
#> Score variable : score
#> Observations : 3200
#> Students : 400
#> Schools : 30
#> Conf. level : 95%
#> Simult. crit. : 2.506
#>
#> Use summary() for estimates and plot() for visualization.summary() displays estimated gap values with standard
errors and simultaneous confidence band bounds at equally spaced grade
points. Grades marked with * have bands that exclude zero —
statistically significant gap with multiplicity control.
summary(fit)
#>
#> Achievement Gap Trajectory -- Summary
#> ========================================================================
#> Group: SES_group | Grade: grade | Score: score
#> N = 400 students in 30 schools (3200 observations)
#> ------------------------------------------------------------------------
#> Grade Gap Est. Std. Err. Sim. Lower Sim. Upper Sig.*
#> 0.00 0.029 0.048 -0.091 0.149
#> 0.99 0.110 0.040 0.010 0.210 *
#> 1.98 0.197 0.038 0.102 0.292 *
#> 2.97 0.291 0.038 0.196 0.386 *
#> 4.03 0.399 0.038 0.304 0.494 *
#> 5.02 0.508 0.038 0.413 0.603 *
#> 6.01 0.624 0.040 0.524 0.724 *
#> 7.00 0.742 0.048 0.622 0.862 *
#> ------------------------------------------------------------------------
#> Gap range: [0.029, 0.742]
#> Significant span (simultaneous 95% CB): 0.92 to 7.00
#> Simultaneous critical value: 2.506
#> * Sig. indicates simultaneous band excludes zero.plot() produces a publication-ready figure. By default,
both the simultaneous band (light shading) and pointwise band (dark
shading) are shown. In simulation settings where the true gap is known,
you can overlay it with the true_gap argument.
# Grade labels for x-axis
grade_labs <- c("K", "G1", "G2", "G3", "G4", "G5", "G6", "G7")
# True gap evaluated at the model's grade grid
true_gap_vec <- sim$f1_fun(fit$grade_grid)
plot(
fit,
true_gap = true_gap_vec,
grade_labels = grade_labs,
title = "SES Achievement Gap Trajectory (Simulated Data)"
)To show only the simultaneous band:
test_gap() runs two types of tests:
mgcv.)tryCatch(
test_gap(fit, type = "both"),
error = function(e) message("test_gap: ", conditionMessage(e))
)
#>
#> Hypothesis Tests for Gap Trajectory
#> ------------------------------------------
#> Global test (approx Wald): ChiSq(6)=289.203, p=1.681e-59 -> REJECT H0
#> Simultaneous band significance: YES
#> Significant intervals (simultaneous band excludes 0):
#> start end direction
#> 0.9191919 7 positive
#> $type
#> [1] "both"
#>
#> $alpha
#> [1] 0.05
#>
#> $global
#> $global$stat
#> [1] 289.2031
#>
#> $global$df
#> [1] 6
#>
#> $global$p_value
#> [1] 1.681396e-59
#>
#> $global$reject
#> [1] TRUE
#>
#>
#> $simultaneous
#> $simultaneous$any_significant
#> [1] TRUE
#>
#> $simultaneous$intervals
#> start end direction
#> 1 0.9191919 7 positive
#>
#>
#> attr(,"class")
#> [1] "achieveGap_test"fit_separate() provides the comparison approach: two
independent spline models, one per group, with the gap computed by post
hoc subtraction. As discussed in the paper, this approach underestimates
uncertainty.
sep <- fit_separate(
data = sim$data,
score = "score",
grade = "grade",
group = "SES_group",
school = "school",
student = "student"
)
# Compare gap estimates side by side
cat("Joint model gap at grade 4: ", round(fit$gap_hat[fit$grade_grid >= 3.9][1], 3), "\n")
#> Joint model gap at grade 4: 0.392
cat("Separate model gap at grade 4:", round(sep$gap_hat[sep$grade_grid >= 3.9][1], 3), "\n")
#> Separate model gap at grade 4: 0.388
# Separate model CIs are narrower (anti-conservative)
cat("\nMean CI width - Joint (pointwise):", round(mean(fit$pw_upper - fit$pw_lower), 3))
#>
#> Mean CI width - Joint (pointwise): 0.156
cat("\nMean CI width - Separate: ", round(mean(sep$ci_upper - sep$ci_lower), 3), "\n")
#>
#> Mean CI width - Separate: 0.205The nonmonotone gap shape widens early, plateaus, and
narrows slightly — a pattern linear models cannot capture.
sim2 <- simulate_gap(n_students = 400, n_schools = 30,
gap_shape = "nonmonotone", seed = 99)
fit2 <- gap_trajectory(
data = sim2$data,
score = "score",
grade = "grade",
group = "SES_group",
school = "school",
student = "student",
n_sim = 1000,
verbose = FALSE
)
plot(fit2,
true_gap = sim2$f1_fun(fit2$grade_grid),
grade_labels = grade_labs,
title = "Non-Monotone SES Gap: Widens Early, Plateaus, Narrows")run_simulation() replicates the simulation study from
the paper. Use n_reps = 5 for a quick test; the paper used
n_reps = 500.
results <- run_simulation(
n_reps = 5, # increase to 500 for paper results
n_sim = 1000 # increase to 5000 for final run
)To apply gap_trajectory() to real data, prepare a
long-format data frame with columns for:
# Example with ECLS-K:2011 data (after loading and preparing)
fit_real <- achieve_gap(
math_score_z ~ grade_numeric,
group = "low_ses",
random = ~ 1 | school_id/child_id,
data = eclsk_clean,
k = 6,
n_sim = 10000
)
summary(fit_real)
plot(fit_real, grade_labels = c("K", "G1", "G2", "G3", "G4", "G5"))
test_gap(fit_real)sessionInfo()
#> R version 4.5.1 (2025-06-13 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 26200)
#>
#> Matrix products: default
#> LAPACK version 3.12.1
#>
#> locale:
#> [1] LC_COLLATE=C
#> [2] LC_CTYPE=English_United States.utf8
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.utf8
#>
#> time zone: America/Chicago
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] achieveGap_0.1.0
#>
#> loaded via a namespace (and not attached):
#> [1] Matrix_1.7-4 gtable_0.3.6 jsonlite_2.0.0 dplyr_1.2.0
#> [5] compiler_4.5.1 Rcpp_1.1.0 tidyselect_1.2.1 jquerylib_0.1.4
#> [9] splines_4.5.1 scales_1.4.0 boot_1.3-31 yaml_2.3.10
#> [13] fastmap_1.2.0 lattice_0.22-7 ggplot2_4.0.2 R6_2.6.1
#> [17] labeling_0.4.3 generics_0.1.4 knitr_1.50 rbibutils_2.3
#> [21] MASS_7.3-65 tibble_3.3.0 nloptr_2.2.1 minqa_1.2.8
#> [25] bslib_0.9.0 pillar_1.11.1 RColorBrewer_1.1-3 rlang_1.1.7
#> [29] cachem_1.1.0 xfun_0.53 sass_0.4.10 S7_0.2.0
#> [33] cli_3.6.5 withr_3.0.2 mgcv_1.9-4 magrittr_2.0.4
#> [37] Rdpack_2.6.4 digest_0.6.37 grid_4.5.1 rstudioapi_0.17.1
#> [41] lme4_2.0-1 lifecycle_1.0.5 nlme_3.1-168 reformulas_0.4.4
#> [45] vctrs_0.7.1 evaluate_1.0.5 glue_1.8.0 farver_2.1.2
#> [49] codetools_0.2-20 rmarkdown_2.30 tools_4.5.1 pkgconfig_2.0.3
#> [53] htmltools_0.5.8.1