Getting Started with achieveGap

Overview

The achieveGap package provides a statistically rigorous framework for estimating achievement gap trajectories in longitudinal educational data. Rather than computing the gap as a post hoc difference between two separately estimated curves, the gap function is parameterized directly as a smooth estimand within a joint hierarchical model. This ensures correct uncertainty quantification via simultaneous confidence bands.

The package is motivated by the observation that achievement gaps often evolve nonlinearly across grades — widening rapidly in early elementary school, plateauing in middle grades, or narrowing following policy interventions — patterns that standard linear growth models cannot capture.

Installation

# From CRAN (once published):
install.packages("achieveGap")

# From GitHub (development version):
# install.packages("devtools")
devtools::install_github("yourusername/achieveGap")

Quickstart: Simulated Data

The simplest way to get started is with simulate_gap(), which generates synthetic longitudinal data with a known true gap function.

library(achieveGap)

# Generate data: 400 students in 30 schools, grades K-7
# Gap shape: monotone widening (true gap increases steadily across grades)
sim <- simulate_gap(
  n_students = 400,
  n_schools  = 30,
  gap_shape  = "monotone",
  seed       = 2024
)

# Preview the data
head(sim$data)
#>      student grade school SES_group     score
#> 1          1     0      2         1 0.2109525
#> 401        1     1      2         1 0.7041129
#> 801        1     2      2         1 0.7189772
#> 1201       1     3      2         1 0.6504944
#> 1601       1     4      2         1 0.7958809
#> 2001       1     5      2         1 1.4512534

# The true gap at each grade
sim$true_gap
#>   grade   gap
#> 1     0 0.000
#> 2     1 0.075
#> 3     2 0.160
#> 4     3 0.255
#> 5     4 0.360
#> 6     5 0.475
#> 7     6 0.600
#> 8     7 0.735

The data frame has one row per student-grade observation, with columns student, grade, school, SES_group (0 = high SES, 1 = low SES), and score.

Fitting the Model

The main function is gap_trajectory(). You specify the column names for the outcome, grade, group indicator, school ID, and student ID.

# Formula interface (recommended) — same as lme4/nlme style
fit <- achieve_gap(
  score ~ grade,
  group  = "SES_group",
  random = ~ 1 | school/student,
  data   = sim$data,
  k      = 6,     # spline basis dimension (must be < unique grade values)
  n_sim  = 5000   # posterior draws for simultaneous bands
)

Printing the fitted object gives a concise overview:

print(fit)
#> 
#> Achievement Gap Trajectory Model
#> ---------------------------------------- 
#>   Group variable : SES_group
#>   Group levels   : ref=0 | focal=1
#>   Grade variable : grade
#>   Score variable : score
#>   Observations   : 3200
#>   Students       : 400
#>   Schools        : 30
#>   Conf. level    : 95%
#>   Simult. crit.  : 2.506
#> 
#> Use summary() for estimates and plot() for visualization.

Summarizing Results

summary() displays estimated gap values with standard errors and simultaneous confidence band bounds at equally spaced grade points. Grades marked with * have bands that exclude zero — statistically significant gap with multiplicity control.

summary(fit)
#> 
#> Achievement Gap Trajectory -- Summary
#> ======================================================================== 
#>   Group: SES_group  |  Grade: grade  |  Score: score
#>   N = 400 students in 30 schools (3200 observations)
#> ------------------------------------------------------------------------ 
#>  Grade Gap Est. Std. Err. Sim. Lower Sim. Upper Sig.*
#>   0.00    0.029     0.048     -0.091      0.149      
#>   0.99    0.110     0.040      0.010      0.210     *
#>   1.98    0.197     0.038      0.102      0.292     *
#>   2.97    0.291     0.038      0.196      0.386     *
#>   4.03    0.399     0.038      0.304      0.494     *
#>   5.02    0.508     0.038      0.413      0.603     *
#>   6.01    0.624     0.040      0.524      0.724     *
#>   7.00    0.742     0.048      0.622      0.862     *
#> ------------------------------------------------------------------------ 
#>   Gap range: [0.029, 0.742]
#>   Significant span (simultaneous 95% CB): 0.92 to 7.00
#>   Simultaneous critical value: 2.506
#>   * Sig. indicates simultaneous band excludes zero.

Visualizing the Gap Trajectory

plot() produces a publication-ready figure. By default, both the simultaneous band (light shading) and pointwise band (dark shading) are shown. In simulation settings where the true gap is known, you can overlay it with the true_gap argument.

# Grade labels for x-axis
grade_labs <- c("K", "G1", "G2", "G3", "G4", "G5", "G6", "G7")

# True gap evaluated at the model's grade grid
true_gap_vec <- sim$f1_fun(fit$grade_grid)

plot(
  fit,
  true_gap     = true_gap_vec,
  grade_labels = grade_labs,
  title        = "SES Achievement Gap Trajectory (Simulated Data)"
)

Estimated gap with both confidence bands and true gap overlaid.

To show only the simultaneous band:

plot(fit, band = "simultaneous", grade_labels = grade_labs)

Gap trajectory with simultaneous band only.

Hypothesis Testing

test_gap() runs two types of tests:

Global test: Is the gap smooth significantly different from zero anywhere? (Approximate chi-squared test from mgcv.)
Simultaneous test: Which specific grade intervals have a statistically significant gap, with joint error rate control?

tryCatch(
  test_gap(fit, type = "both"),
  error = function(e) message("test_gap: ", conditionMessage(e))
)
#> 
#> Hypothesis Tests for Gap Trajectory
#> ------------------------------------------ 
#> Global test (approx Wald): ChiSq(6)=289.203, p=1.681e-59  -> REJECT H0
#> Simultaneous band significance: YES
#> Significant intervals (simultaneous band excludes 0):
#>      start end direction
#>  0.9191919   7  positive
#> $type
#> [1] "both"
#> 
#> $alpha
#> [1] 0.05
#> 
#> $global
#> $global$stat
#> [1] 289.2031
#> 
#> $global$df
#> [1] 6
#> 
#> $global$p_value
#> [1] 1.681396e-59
#> 
#> $global$reject
#> [1] TRUE
#> 
#> 
#> $simultaneous
#> $simultaneous$any_significant
#> [1] TRUE
#> 
#> $simultaneous$intervals
#>       start end direction
#> 1 0.9191919   7  positive
#> 
#> 
#> attr(,"class")
#> [1] "achieveGap_test"

Comparing to Separate Splines

fit_separate() provides the comparison approach: two independent spline models, one per group, with the gap computed by post hoc subtraction. As discussed in the paper, this approach underestimates uncertainty.

sep <- fit_separate(
  data    = sim$data,
  score   = "score",
  grade   = "grade",
  group   = "SES_group",
  school  = "school",
  student = "student"
)

# Compare gap estimates side by side
cat("Joint model gap at grade 4: ", round(fit$gap_hat[fit$grade_grid >= 3.9][1], 3), "\n")
#> Joint model gap at grade 4:  0.392
cat("Separate model gap at grade 4:", round(sep$gap_hat[sep$grade_grid >= 3.9][1], 3), "\n")
#> Separate model gap at grade 4: 0.388

# Separate model CIs are narrower (anti-conservative)
cat("\nMean CI width - Joint (pointwise):", round(mean(fit$pw_upper - fit$pw_lower), 3))
#> 
#> Mean CI width - Joint (pointwise): 0.156
cat("\nMean CI width - Separate:         ", round(mean(sep$ci_upper - sep$ci_lower), 3), "\n")
#> 
#> Mean CI width - Separate:          0.205

Non-Monotone Gap Example

The nonmonotone gap shape widens early, plateaus, and narrows slightly — a pattern linear models cannot capture.

sim2 <- simulate_gap(n_students = 400, n_schools = 30,
                     gap_shape = "nonmonotone", seed = 99)

fit2 <- gap_trajectory(
  data    = sim2$data,
  score   = "score",
  grade   = "grade",
  group   = "SES_group",
  school  = "school",
  student = "student",
  n_sim   = 1000,
  verbose = FALSE
)

plot(fit2,
     true_gap     = sim2$f1_fun(fit2$grade_grid),
     grade_labels = grade_labs,
     title        = "Non-Monotone SES Gap: Widens Early, Plateaus, Narrows")

Non-monotone gap trajectory.

Running a Benchmark Simulation

run_simulation() replicates the simulation study from the paper. Use n_reps = 5 for a quick test; the paper used n_reps = 500.

results <- run_simulation(
  n_reps = 5,    # increase to 500 for paper results
  n_sim  = 1000  # increase to 5000 for final run
)

summarize_simulation(results)

Using Your Own Data

To apply gap_trajectory() to real data, prepare a long-format data frame with columns for:

outcome (standardized test score, numeric)
grade/time (numeric, e.g., 0 = kindergarten, 1 = grade 1, …)
group (binary 0/1; 0 = reference group, 1 = focal group)
school ID (factor or integer)
student ID (factor or integer)

# Example with ECLS-K:2011 data (after loading and preparing)
fit_real <- achieve_gap(
  math_score_z ~ grade_numeric,
  group  = "low_ses",
  random = ~ 1 | school_id/child_id,
  data   = eclsk_clean,
  k      = 6,
  n_sim  = 10000
)

summary(fit_real)
plot(fit_real, grade_labels = c("K", "G1", "G2", "G3", "G4", "G5"))
test_gap(fit_real)

Session Info

sessionInfo()
#> R version 4.5.1 (2025-06-13 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 26200)
#> 
#> Matrix products: default
#>   LAPACK version 3.12.1
#> 
#> locale:
#> [1] LC_COLLATE=C                          
#> [2] LC_CTYPE=English_United States.utf8   
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.utf8    
#> 
#> time zone: America/Chicago
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] achieveGap_0.1.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] Matrix_1.7-4       gtable_0.3.6       jsonlite_2.0.0     dplyr_1.2.0       
#>  [5] compiler_4.5.1     Rcpp_1.1.0         tidyselect_1.2.1   jquerylib_0.1.4   
#>  [9] splines_4.5.1      scales_1.4.0       boot_1.3-31        yaml_2.3.10       
#> [13] fastmap_1.2.0      lattice_0.22-7     ggplot2_4.0.2      R6_2.6.1          
#> [17] labeling_0.4.3     generics_0.1.4     knitr_1.50         rbibutils_2.3     
#> [21] MASS_7.3-65        tibble_3.3.0       nloptr_2.2.1       minqa_1.2.8       
#> [25] bslib_0.9.0        pillar_1.11.1      RColorBrewer_1.1-3 rlang_1.1.7       
#> [29] cachem_1.1.0       xfun_0.53          sass_0.4.10        S7_0.2.0          
#> [33] cli_3.6.5          withr_3.0.2        mgcv_1.9-4         magrittr_2.0.4    
#> [37] Rdpack_2.6.4       digest_0.6.37      grid_4.5.1         rstudioapi_0.17.1 
#> [41] lme4_2.0-1         lifecycle_1.0.5    nlme_3.1-168       reformulas_0.4.4  
#> [45] vctrs_0.7.1        evaluate_1.0.5     glue_1.8.0         farver_2.1.2      
#> [49] codetools_0.2-20   rmarkdown_2.30     tools_4.5.1        pkgconfig_2.0.3   
#> [53] htmltools_0.5.8.1