---
title: "diagFDR: DIA-NN diagnostics from report.parquet"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{diagFDR: DIA-NN diagnostics from report.parquet}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4.5
)
```

This vignette demonstrates how to run **diagFDR** on DIA-NN exports and interpret the
key diagnostics in terms of **scope**, **calibration**, and **stability**.

The typical workflow is:

1. Export DIA-NN results with decoys and a permissive q-value ceiling.
2. Read `report.parquet`.
3. Construct one or more *universes* (global precursor list, run×precursor, etc.).
4. Run diagnostics and inspect tables/plots.
5. (Optional) write tables/plots and a human-readable report to disk.

## Recommended DIA-NN export settings

To enable all diagnostics, export with:

- decoys included: `--report-decoys`
- a permissive q-value ceiling: `--qvalue 0.5` (or higher)

The q-value ceiling matters because some diagnostics operate in low-confidence regions
(e.g. equal-chance plausibility checks or local-window support around cutoffs).
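A small simulation (illustrative only, not diagFDR internals) makes the point: with a strict export ceiling, almost no decoys survive into the export, so band-level diagnostics in the 0.2–0.5 region have nothing to work with.

```{r why-ceiling}
# Simulated q-values and decoy labels; numbers are arbitrary.
set.seed(42)
q <- runif(5000)
is_decoy <- runif(5000) < 0.05

# Decoys retained under a strict vs. a permissive export ceiling:
sum(is_decoy & q <= 0.01)  # strict ceiling: a handful at best
sum(is_decoy & q <= 0.50)  # permissive ceiling: enough for band diagnostics
```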

## Runnable toy example (no DIA-NN files required)

We start with a small simulated dataset that exercises the diagFDR functions.
Any workflow producing outputs that can be mapped to the columns
`id`, `is_decoy`, `q`, `pep`, `run`, and `score` can be handled similarly.

```{r toy-data}
library(diagFDR)

set.seed(1)

n <- 3000
toy_global <- data.frame(
  id = paste0("P", seq_len(n)),
  is_decoy = sample(c(FALSE, TRUE), n, replace = TRUE, prob = c(0.97, 0.03)),
  q = pmin(1, runif(n)^3),       # skew toward small q-values
  pep = NA_real_,
  run = NA_character_,
  score = NA_real_
)

x_global <- as_dfdr_tbl(
  toy_global,
  unit = "precursor",
  scope = "global",
  q_source = "toy",
  q_max_export = 0.5
)

diag <- dfdr_run_all(
  xs = list(global = x_global),
  alpha_main = 0.01,
  alphas = c(1e-3, 2e-3, 5e-3, 1e-2, 2e-2, 5e-2, 1e-1, 2e-1),
  low_conf = c(0.2, 0.5)
)
```
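Before inspecting the packaged diagnostics, it can help to see what the headline quantities are built from. The following base-R sketch (illustrative only, not the package internals) counts accepted targets and boundary-supporting decoys below a few cutoffs, using the `toy_global` data frame from the chunk above:

```{r toy-counts}
alphas <- c(1e-3, 1e-2, 1e-1)
t(sapply(alphas, function(a) c(
  alpha   = a,
  targets = sum(!toy_global$is_decoy & toy_global$q <= a),  # accepted targets
  decoys  = sum(toy_global$is_decoy  & toy_global$q <= a)   # decoys near/below the cutoff
)))
```

When the decoy column is near zero at a cutoff, the boundary is supported by only a few decoys, which is exactly the granular regime the stability diagnostics flag.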

### Headline stability at 1%

```{r headline}
diag$tables$headline
```

### Tail support and stability versus threshold

```{r plots-stability}
diag$plots$dalpha
diag$plots$cv
```

### Local boundary support

```{r plot-dwin}
diag$plots$dwin
```

### Threshold elasticity (list sensitivity to changing alpha)

```{r plot-elasticity}
diag$plots$elasticity
```

### Equal-chance plausibility by q-band

```{r equal-chance}
diag$tables$equal_chance_pooled
diag$plots$equal_chance__global
```

## Real DIA-NN parquet workflow

The following code shows how to run the pipeline on a real DIA-NN `report.parquet`.

```{r real-diann, eval=FALSE}
# Requires the arrow package to read parquet files
rep <- read_diann_parquet("path/to/report.parquet")

# (A) Global precursor list using Global.Q.Value
# Recommended for experiment-wide (pooled) lists.
x_global_gq <- diann_global_precursor(
  rep,
  q_col = "Global.Q.Value",
  q_max_export = 0.5,
  unit = "precursor",
  scope = "global",
  q_source = "Global.Q.Value"
)

# (B) Run×precursor universe using run-wise Q.Value
# Recommended for per-run decisions / QC.
x_runx <- diann_runxprecursor(
  rep,
  q_col = "Q.Value",
  q_max_export = 0.5,
  id_mode = "runxid",
  unit = "runxprecursor",
  scope = "runwise",
  q_source = "Q.Value"
)

# (C) Scope misuse comparator: min run-wise q over runs per precursor (anti-pattern)
# Useful for demonstrating/diagnosing scope mismatch.
x_minrun <- diann_global_minrunq(
  rep,
  q_col = "Q.Value",
  q_max_export = 0.5,
  unit = "precursor",
  scope = "aggregated",
  q_source = "min_run(Q.Value)"
)

diag <- dfdr_run_all(
  xs = list(global = x_global_gq, runx = x_runx, minrun = x_minrun),
  alpha_main = 0.01,
  compute_pseudo_pvalues = TRUE  # adds pseudo-p-value diagnostics
)

# Compare accepted lists across scopes (Jaccard overlap across alpha)
scope_tbl <- dfdr_scope_disagreement(
  x1 = x_global_gq,
  x2 = x_minrun,
  alphas = c(1e-3, 2e-3, 5e-3, 1e-2, 2e-2, 5e-2),
  label1 = "Global.Q.Value",
  label2 = "min_run(Q.Value)"
)

# Write outputs to disk (tables + plots; optionally PPTX)
dfdr_write_report(
  diag,
  out_dir = "diagFDR_diann_out",
  formats = c("csv", "png", "manifest", "readme", "summary")
)

# Render a single HTML report (requires rmarkdown in Suggests)
dfdr_render_report(diag, out_dir = "diagFDR_diann_out")
```

## Interpretation notes

- **Scope**: run-wise q-values (`Q.Value`) and global q-values (`Global.Q.Value`)
  do not control the same multiple-testing universe. Constructing experiment-wide lists by
  aggregating run-wise q-values (e.g., taking `min(Q.Value)` across runs) is generally anti-conservative.

- **Stability**: stringent cutoffs can enter a *granular regime* where only a few decoys
  support the boundary. Inspect `D_alpha`, `CV_hat`, and the local boundary support `D_alpha_win`
  before making strong claims at very small alpha.

- **Equal-chance diagnostics** (decoy fractions in low-confidence q-bands) and **PEP reliability**
  are internal consistency checks under target-decoy assumptions; they do not replace external
  validation (e.g. entrapment) when decoy representativeness is uncertain.
