---
title: "Judge-fixed-effects designs"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Judge-fixed-effects designs}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "figures/judge-designs-"
)
set.seed(1)
```

## The judge-IV design

A widely-used research design in law, labour, and public economics instruments a binary treatment with the randomly-assigned identity of a judge, caseworker, examiner, or police officer. Examples include the effect of pretrial detention on defendant outcomes (Dobbie-Goldin-Yang 2018, Leslie-Pope 2017), the effect of foster care on adult outcomes (Doyle 2007), and the effect of disability insurance receipt on labour supply (Maestas-Mullen-Strand 2013).

The identifying assumption is that judges differ in their leniency but the assignment is as-good-as-random, so `Pr(D = 1 | judge)` varies across judges while `E[Y(0), Y(1), D(0), D(1) | judge]` does not. Under monotonicity, each judge's leniency identifies a LATE for the compliers at that judge.

Frandsen, Lefgren, and Leslie (2023, *American Economic Review*) derive a testable implication of this joint null: `E[Y | judge]` must be a linear function of `E[D | judge]`. `iv_testjfe()` runs this test.

## Setup

```{r, message = FALSE}
library(ivcheck)
```

## A valid judge-IV design

Simulate 20 judges with heterogeneous leniency. Under the valid-IV null, `Y` depends on `D` only; judge identity affects `Y` only through its effect on `D`.

```{r}
set.seed(1)
K <- 20
n <- 3000
judge <- sample.int(K, n, replace = TRUE)
p_by_j <- seq(0.2, 0.7, length.out = K)
d <- rbinom(n, 1, p_by_j[judge])
y <- rnorm(n, mean = d)

r_valid <- iv_testjfe(y, d, judge, n_boot = 100, parallel = FALSE)
print(r_valid)
```

The null is not rejected. The fitted slope and intercept of `mu_j` on `p_j` are close to their population values (1 and 0 under this DGP).

## A clear exclusion violation

Now suppose judge identity affects `Y` directly, beyond its effect on `D`. This is an exclusion violation.

```{r}
set.seed(2)
# Same first stage, but Y now depends on judge identity non-linearly
y_viol <- rnorm(n, mean = d + 1.5 * sin(judge * 0.5))

r_viol <- iv_testjfe(y_viol, d, judge, n_boot = 100, parallel = FALSE)
print(r_viol)
```

The test rejects emphatically. The direct judge effect on `Y` breaks the linearity implication and the chi-squared-with-`K-2`-df asymptotic distribution says the observed statistic is extremely unlikely under the null.

## What exactly is being tested

Under the FLL null, the structural model is `Y = alpha + beta * D + epsilon` with `E[epsilon | judge] = 0`. Averaging both sides by judge:

```
E[Y | J = j] = alpha + beta * E[D | J = j]
```

So `mu_j = alpha + beta * p_j` must hold for every judge `j`. `iv_testjfe()`:

1. Computes the per-judge means `mu_j` and propensities `p_j`.
2. Fits a weighted-LS regression of `mu_j` on `p_j` with weights equal to each judge's sample size.
3. Estimates the pooled within-judge variance of the structural residuals `u_i = y_i - alpha_hat - beta_hat * d_i`.
4. Computes `T_n = sum(n_j * residual_j^2) / sigma^2_hat` and compares to `chi^2_{K - 2}`.

The Monte Carlo simulations in our test suite confirm that the empirical null distribution matches `chi^2_{K - 2}` closely (mean, variance, and 95th percentile all within Monte Carlo error).

## Inspecting the per-judge residuals

The `binding` slot identifies the judge with the largest absolute residual, which is often a useful lead for investigating where the violation originates:

```{r}
r_viol$binding
```

For a fuller picture, plot `mu_j` against `p_j` with the fitted line:

```{r, fig.width = 6, fig.height = 4}
judges <- sort(unique(judge))
n_j <- sapply(judges, function(j) sum(judge == j))
p_j <- sapply(judges, function(j) mean(d[judge == j]))
mu_j_viol <- sapply(judges, function(j) mean(y_viol[judge == j]))
mu_j_valid <- sapply(judges, function(j) mean(y[judge == j]))

oldpar <- par(no.readonly = TRUE)
par(mfrow = c(1, 2))
plot(p_j, mu_j_valid, pch = 19, cex = sqrt(n_j) / 5,
     main = "Valid design", xlab = "p_j", ylab = "mu_j")
abline(lm(mu_j_valid ~ p_j, weights = n_j), col = "red")

plot(p_j, mu_j_viol, pch = 19, cex = sqrt(n_j) / 5,
     main = "Exclusion violation", xlab = "p_j", ylab = "mu_j")
abline(lm(mu_j_viol ~ p_j, weights = n_j), col = "red")
par(oldpar)
```

Under a valid design, the points lie on the line. Under a violation, they wander off it in a pattern related to the direct judge effect.

## With covariates

If the design is only plausible conditional on some covariates (e.g. case characteristics that correlate with judge assignment by chance), pass them via `x`. The function residualises `y` and `d` on `x` before computing the per-judge means:

```{r}
set.seed(1)
K <- 15
n <- 2000
judge <- sample.int(K, n, replace = TRUE)
x_case <- rnorm(n)
p_by_j <- seq(0.2, 0.7, length.out = K)
d <- rbinom(n, 1, p_by_j[judge])
y <- rnorm(n, mean = d + 0.5 * x_case)

r_x <- iv_testjfe(y, d, judge, x = x_case,
                  n_boot = 100, parallel = FALSE)
print(r_x)
```

## Caveats

This is the simplified v0.1.0 implementation: it tests the linearity *implication* of the FLL null but not the full restricted-LS formulation of Frandsen, Lefgren, and Leslie (2023). The Stata `testjfe` module (Frandsen, BYU, 2020) implements the full published test. `ivcheck` v0.2.0 will port it; until then, treat `iv_testjfe()` as a fast necessary-condition check and run the Stata module for the final paper if you need the full test.

## References

Frandsen, B. R., Lefgren, L. J., and Leslie, E. C. (2023). Judging Judge Fixed Effects. *American Economic Review* 113(1): 253-277.