---
title: "OnlineSurr: Fitting marginal/conditional models and computing LPTE/CPTE"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
bibliography: '`r system.file("REFERENCES.bib", package="OnlineSurr")`'
csl: '`r system.file("apalike.csl", package="OnlineSurr")`'
vignette: >
  %\VignetteIndexEntry{OnlineSurr: Fitting marginal/conditional models and computing LPTE/CPTE}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE
)
```

<style>
body {
text-align: justify}
</style>

<script type="text/x-mathjax-config">
MathJax.Hub.Config({
  TeX: { equationNumbers: { autoNumber: "AMS" } }
});
</script>

This vignette demonstrates the main workflow of the `OnlineSurr` package:

1. Prepare a longitudinal dataset with equally-spaced measurement times.
2. Fit the marginal and conditional models with `fit.surr()`.
3. Summarize results with `summary()` and visualize them with `plot()`.
4. Test time-homogeneity with `time_homo_test()`.

`fit.surr()` returns an object of class `fitted_onlinesurr` that stores point estimates and bootstrap draws for the treatment-effect trajectories and the PTE-based summaries.

# Data requirements and conventions

`fit.surr()` expects data in **long format** with one row per subject-time measurement. Key requirements enforced by the code:

- `id` identifies subjects; there must be **at most one observation per subject-time** combination.
- `treat` indicates treatment assignment; it is coerced to a factor and is intended to represent **two treatment levels**.
- `time` must be **numeric and equally spaced** across observed time points. If `time` is omitted, the function creates a within-subject index `Time` assuming the data are already ordered and equally spaced.
- The surrogate design must not make treatment a linear combination of surrogate terms; otherwise the conditional model is not identifiable.
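
The first three requirements can be checked before calling `fit.surr()`. The following is a minimal sketch in base R on a made-up data frame; the object `toy` and its column names are hypothetical, and the checks are an illustration rather than the validation code used inside the package.

```r
# Toy long-format data: 3 subjects, 4 equally spaced visits (hypothetical values).
toy <- data.frame(
  id   = rep(1:3, each = 4),
  time = rep(c(0, 2, 4, 6), times = 3),
  trt  = rep(c(0, 1, 0), each = 4)
)

# (a) At most one row per subject-time combination.
has_duplicates <- anyDuplicated(toy[, c("id", "time")]) > 0

# (b) Observed time points are numeric and equally spaced.
time_points <- sort(unique(toy$time))
gaps <- diff(time_points)
equally_spaced <- is.numeric(toy$time) && all(abs(gaps - gaps[1]) < 1e-8)

# (c) Exactly two treatment levels.
n_levels <- nlevels(factor(toy$trt))

# Stops with an error if any requirement is violated.
stopifnot(!has_duplicates, equally_spaced, n_levels == 2)
```

The spacing check compares gaps with a small tolerance rather than with `==`, since time values stored as decimals may differ by floating-point rounding.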

# Package functions used in this vignette

- `fit.surr()` fits two models and stores bootstrap draws for their fixed-effect parameters:
  - a *marginal* model producing total treatment effects $\Delta(t)$
  - a *conditional* model (given the surrogate) producing residual treatment effects $\Delta_R(t)$
- `plot.fitted_onlinesurr()` plots:
  - Local PTE: $\text{LPTE}(t) = 1 - \Delta_R(t)/\Delta(t)$
  - Cumulative PTE: $\text{CPTE}(t) = 1 - \sum_{h\le t} \Delta_R(h) / \sum_{h\le t} \Delta(h)$
  - Treatment effects $\Delta(t)$ and $\Delta_R(t)$
- `time_homo_test()` tests the hypothesis that the PTE is constant over time (implemented via a max-type statistic and Monte Carlo approximation of the null).
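
To make the two PTE definitions concrete, the short sketch below evaluates them on made-up effect trajectories. The vectors `delta` and `delta_r` are hypothetical numbers chosen for illustration, not output from the package.

```r
# Hypothetical effect trajectories over four time points (made-up numbers).
delta   <- c(2.0, 2.0, 4.0, 2.0) # total treatment effect, Delta(t)
delta_r <- c(1.0, 0.5, 1.0, 0.5) # residual treatment effect, Delta_R(t)

# Local PTE: one ratio per time point.
lpte <- 1 - delta_r / delta

# Cumulative PTE: ratio of running sums up to each time point.
cpte <- 1 - cumsum(delta_r) / cumsum(delta)

lpte
cpte
```

In this toy example the LPTE is 0.5 at the first time point and 0.75 afterwards, while the CPTE starts at the same value of 0.5 and rises more slowly, reaching 0.7 at the last time point, because it pools all effects accumulated so far.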

## Fitting the models with `fit.surr()`

`fit.surr()` takes the following main arguments:

- `formula`: outcome mean model. The function will internally add treatment-by-time fixed effects.
- `id`: subject identifier (unquoted).
- `treat`: treatment variable (unquoted).
- `surrogate`: surrogate structure (as a formula or a string).
- `data`: the long-format data frame containing the variables above.
- `time`: numeric time variable (unquoted).

```{r eval=TRUE}
library(OnlineSurr)
head(sim_onlinesurr)

fit <- fit.surr(
  formula   = y ~ 1, # baseline fixed effects; trt*time terms added internally
  id        = id,
  surrogate = ~s, # surrogate structure
  treat     = trt,
  data      = sim_onlinesurr,
  time      = time,
  N.boots   = 2000, # bootstrap draws stored in the fitted object
  verbose   = 0 # hide progress
)
```

The formulas for the fixed effects and the surrogate structures accept any temporal structure available in the `kDGLM` package (see its vignette for details). Functions that transform the data are also supported.

In particular, we provide the `lagged` function, which computes lagged values of its arguments and can be included in a model formula to account for delayed or lingering effects of a predictor over time. We also provide the `s` function, which generates a spline basis for a numeric variable and can be used to model smooth, potentially non-linear effects without having to specify the basis expansion manually.

```{r eval=FALSE}
library(OnlineSurr)

fit <- fit.surr(
  formula   = y ~ 1, # baseline fixed effects; trt*time terms added internally
  id        = id,
  surrogate = ~ s(s) + s(lagged(s, 1)) + s(lagged(s, 2)), # surrogate structure
  treat     = trt,
  data      = sim_onlinesurr,
  time      = time,
  verbose   = 0 # hide progress
)
```

### What `fit.surr()` stores

The returned object has class `fitted_onlinesurr` and is a list with (at least):

- `fit$T`: number of time points
- `fit$N`: number of subjects
- `fit$n.fixed`: number of fixed-effect coefficients per subject design (reference size)
- `fit$Marginal$point`: point estimates (vector) from the marginal model
- `fit$Marginal$smp`: bootstrap draws (matrix) from the marginal model
- `fit$Conditional$point`: point estimates (vector) from the conditional model
- `fit$Conditional$smp`: bootstrap draws (matrix) from the conditional model

The first `T` (in practice, the first `n.fixed`) elements used by plotting/testing methods correspond to the time-indexed treatment-effect parameters.
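
As an illustration of how stored draws can be turned into interval estimates, the sketch below computes percentile intervals for the LPTE from simulated stand-ins for the bootstrap matrices. Everything here is an assumption made for the example: the matrices are generated with `rnorm()`, the layout (one row per time point, one column per draw) is hypothetical, and percentile intervals are one common choice that may differ from what the package methods do. Inspect `dim(fit$Marginal$smp)` before adapting this to a real fitted object.

```r
set.seed(1)
n_time  <- 4    # number of time points (toy value)
n_draws <- 1000 # number of bootstrap draws (toy value)

# Simulated stand-ins for bootstrap draws of Delta(t) and Delta_R(t).
# ASSUMPTION: one row per time point, one column per draw.
delta_smp   <- matrix(rnorm(n_time * n_draws, mean = 2.0, sd = 0.1), nrow = n_time)
delta_r_smp <- matrix(rnorm(n_time * n_draws, mean = 0.5, sd = 0.1), nrow = n_time)

# LPTE computed draw by draw, then summarized with 95% percentile intervals.
lpte_smp <- 1 - delta_r_smp / delta_smp
lpte_ci  <- t(apply(lpte_smp, 1, quantile, probs = c(0.025, 0.975)))
colnames(lpte_ci) <- c("lower", "upper")
lpte_ci
```

Computing the ratio within each draw, rather than dividing two separately summarized quantities, keeps the uncertainty of numerator and denominator together in the resulting interval.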

# Summaries and inference

## Printing a summary

The package provides an S3 summary method `summary.fitted_onlinesurr()`.

- `t` selects the time index.
- `cumulative=TRUE` reports cumulative effects up to time `t` (when implemented by the method).
- `cumulative=FALSE` reports time-specific quantities at time `t` only.

```{r eval=TRUE}
summary(fit, t = 6, cumulative = TRUE)
```

## Plotting LPTE, CPTE, and treatment effects

`plot()` dispatches to `plot.fitted_onlinesurr()`.

```{r eval=TRUE}
plot(fit, type = "LPTE") # Local PTE over time
plot(fit, type = "CPTE") # Cumulative PTE over time
plot(fit, type = "Delta") # Delta and Delta_R over time
```

Interpretation notes:

- LPTE measures, at each time, the proportion of the total treatment effect explained by the surrogate, using the ratio $1 - \Delta_R(t)/\Delta(t)$.
- CPTE aggregates effects up to time $t$, using cumulative sums.

## Testing time-homogeneity

`time_homo_test()` provides a max-type test, using a Monte Carlo approximation of the null distribution.

```{r eval=TRUE}
test <- time_homo_test(fit, signif.level = 0.05, N.boots = 50000)
test
```

Returned components:

- `T`: observed test statistic
- `T.crit`: critical value at the requested significance level
- `p.value`: Monte Carlo p-value
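
For readers unfamiliar with max-type tests, the sketch below shows the general idea on made-up numbers. It is a generic illustration, not the implementation behind `time_homo_test()`: the estimates, the standard errors, the independence assumption, and the Gaussian null draws are all hypothetical choices made for this example.

```r
set.seed(42)
theta_hat <- c(0.70, 0.72, 0.69, 0.71) # hypothetical PTE estimates over time
se        <- rep(0.05, 4)              # hypothetical standard errors
n_time    <- length(theta_hat)
n_mc      <- 5000                      # Monte Carlo draws for the null

# Centering matrix: contrasts each time point against the overall mean.
C <- diag(n_time) - matrix(1 / n_time, n_time, n_time)
Sigma <- diag(se^2)                    # ASSUMPTION: independent estimates
se_centered <- sqrt(diag(C %*% Sigma %*% t(C)))

# Max-type statistic: largest standardized deviation from the mean.
max_stat <- function(x) max(abs(drop(C %*% x)) / se_centered)
T_obs <- max_stat(theta_hat)

# Null distribution: the same statistic applied to mean-zero Gaussian noise.
null_draws <- matrix(rnorm(n_time * n_mc, sd = se), nrow = n_time)
T_null <- apply(null_draws, 2, max_stat)

T_crit  <- unname(quantile(T_null, 0.95)) # critical value at the 5% level
p_value <- mean(T_null >= T_obs)          # Monte Carlo p-value
```

Because the toy estimates barely vary over time, the observed statistic falls well below the critical value and the hypothesis of a constant PTE is not rejected.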

# Practical tips and common pitfalls

1. **Time index must be numeric and equally spaced.**  
   If you have missing measurements, include the missing time points with `NA` outcomes rather than dropping those times, so spacing remains consistent.

2. **One row per subject-time.**  
   If you have duplicates, aggregate first (e.g., average within a time window) or decide which measurement to keep.

3. **Bootstrap size tradeoff.**  
   `fit.surr(N.boots=...)` controls the number of stored bootstrap draws used for the confidence intervals of the treatment effects, LPTE, and CPTE; `time_homo_test(N.boots=...)` controls the number of Monte Carlo draws for the null distribution of the time-homogeneity test. In both cases, larger values reduce Monte Carlo error at the cost of longer run times.
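
The first two tips can be sketched in base R as follows. The data frame `raw` and its columns are hypothetical, and averaging duplicates is only one possible way to resolve them.

```r
# Toy data: subject 1 has two measurements at time 2; subject 2 misses time 2.
raw <- data.frame(
  id   = c(1, 1, 1, 1, 2, 2),
  time = c(1, 2, 2, 3, 1, 3),
  trt  = c(0, 0, 0, 0, 1, 1),
  y    = c(1.0, 2.0, 4.0, 3.0, 1.5, 2.5)
)

# Tip 2: collapse duplicates to one row per subject-time (here, by averaging y).
dedup <- aggregate(y ~ id + time + trt, data = raw, FUN = mean)

# Tip 1: build the full id-by-time grid and merge, so missing visits become NA rows.
grid <- expand.grid(id = unique(dedup$id), time = sort(unique(dedup$time)))
trt_by_id <- unique(dedup[, c("id", "trt")])
full <- merge(
  merge(grid, trt_by_id, by = "id"),
  dedup[, c("id", "time", "y")],
  by = c("id", "time"), all.x = TRUE
)
full <- full[order(full$id, full$time), ]
full
```

The grid here is built from the observed time points, so a time point that is missing for every subject would still be absent; in that case, replace `sort(unique(dedup$time))` with an explicit `seq()` over the intended schedule.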
   
See @santos2026causalframeworkevaluatingjointly for details about the theoretical aspects of the package.

# Session info

```{r eval=TRUE}
sessionInfo()
```
