---
title: "Getting Started with hierNest"
author: "Ziren Jiang, Lingfeng Huo, Jue Hou, and Jared D. Huling"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
    number_sections: true
vignette: >
  %\VignetteIndexEntry{Getting Started with hierNest}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse  = TRUE,
  comment   = "#>",
  fig.width = 7,
  fig.height = 5,
  warning   = FALSE,
  message   = FALSE
)
```

# Introduction

The **hierNest** package implements penalized regression with a hierarchical nested parameterization designed for settings in which observations belong to a two-level group hierarchy — for example, Diagnosis Related Groups (DRGs) nested within Major Diagnostic Categories (MDCs) in electronic health record (EHR) data.

## Motivation

In health systems, patient populations are highly heterogeneous. A patient hospitalized for heart failure has very different clinical risk factors from one admitted for a traumatic injury. Fitting a single pooled model ignores this heterogeneity, while fitting fully separate models per DRG is statistically infeasible when subgroup sample sizes are small and the number of predictors is large.

**hierNest** resolves this tension by decomposing each covariate's effect into three interpretable, hierarchically nested components:

$$\beta_d = \mu + \eta^{M(d)} + \delta_d$$

where:

- $\mu$ is the **overall common effect** shared across all subgroups,
- $\eta^{M(d)}$ is the **MDC-specific deviation** for the Major Diagnostic Category $M(d)$ that DRG $d$ belongs to,
- $\delta_d$ is the **DRG-specific deviation** unique to subgroup $d$.
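
As a toy numeric illustration (values invented purely for exposition), the total effect for each DRG is recovered by summing its three components:

```{r decomposition_toy}
# Toy example: one covariate, 2 MDCs, with DRGs 1-2 in MDC 1 and
# DRGs 3-4 in MDC 2 (numbers are made up for illustration)
mu    <- 0.50                        # overall common effect
eta   <- c(0.20, -0.30)              # MDC-specific deviations
delta <- c(0.05, -0.05, 0.00, 0.10)  # DRG-specific deviations
mdc_of_drg <- c(1, 1, 2, 2)          # M(d) for each DRG d

beta_d <- mu + eta[mdc_of_drg] + delta
beta_d  # 0.75 0.65 0.20 0.30
```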

Pairing this reparameterization with structured regularization (lasso or overlapping group lasso) allows the model to borrow strength across related subgroups while remaining flexible enough to capture genuine heterogeneity.

## Two penalization strategies

The package provides two penalization methods:

1. **hierNest-Lasso** — applies a standard lasso penalty to the reparameterized coefficients, with penalty weights adjusted for subgroup sample size. This is implemented via a modified design matrix that can be passed directly to `glmnet` (a sketch follows this list). It is fast and serves as a useful baseline.

2. **hierNest-OGLasso** — applies an overlapping group lasso penalty that additionally enforces *effect hierarchy*: an MDC-specific effect can be non-zero only if the corresponding overall effect is non-zero, and a DRG-specific effect can be non-zero only if the corresponding MDC effect is non-zero. This is the recommended method when the hierarchical structure is plausible.
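
To make the first strategy concrete, here is a minimal sketch, assuming `X_H` is the modified design matrix described under *Methodological background* below and `w` is a vector of subgroup-size-adjusted penalty weights; both are placeholders here, not objects the package exposes under these names:

```{r lasso_sketch, eval = FALSE}
# Sketch only: X_H (modified design matrix) and w (subgroup-size
# penalty weights) are assumed to exist already; this is not the
# package's internal implementation.
library(glmnet)
fit_lasso <- glmnet(
  x              = X_H,
  y              = example_data$Y,
  family         = "binomial",
  penalty.factor = w   # per-coefficient penalty weights
)
```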

## Package scope

This vignette covers:

- The structure of the required `hier_info` input.
- Simulating or loading data suitable for `hierNest`.
- Fitting a model with `hierNest()` at a fixed hyperparameter pair.
- Selecting tuning parameters with `cv.hierNest()`.
- Extracting and interpreting coefficients.
- Generating predictions on new data with `predict_hierNest()`.
- Visualizing results with `plot()`.

---

# Installation

Install the package from CRAN (when available) or from source:

```{r install, eval = FALSE}
# From CRAN
install.packages("hierNest")

# From source (development version)
# install.packages("devtools")
devtools::install_github("ZirenJiang/hierNest")
```

Load the package:

```{r load}
library(hierNest)
```

---

# The `hier_info` matrix

All `hierNest` functions require a `hier_info` argument that encodes the two-level hierarchy. This is an $n \times 2$ integer matrix where:

- Column 1 gives the **group index** (e.g., MDC) for each observation, taking values in $\{1, \ldots, n_{\text{MDC}}\}$.
- Column 2 gives the **subgroup index** (e.g., DRG) for each observation, taking values in $\{1, \ldots, n_{\text{DRG}}\}$.

Subgroup indices must be *globally unique* across groups — two DRGs in different MDCs must have different integer codes.

```{r hier_info_example, eval = FALSE}
# Illustration: 3 MDCs with 2 DRGs each (6 DRGs total)
#
# MDC 1 -> DRG 1, DRG 2
# MDC 2 -> DRG 3, DRG 4
# MDC 3 -> DRG 5, DRG 6
#
# For an observation in MDC 2 / DRG 4:
#   hier_info[i, ] = c(2, 4)
```
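
A toy matrix matching this illustration can be constructed directly:

```{r hier_info_build}
# One observation per DRG, following the 3-MDC / 6-DRG illustration above
hier_info_toy <- cbind(
  mdc = rep(1:3, each = 2),  # column 1: group (MDC) index
  drg = 1:6                  # column 2: globally unique subgroup (DRG) index
)
hier_info_toy
```

In real data there is typically more than one observation per DRG; `hier_info` simply repeats the appropriate (MDC, DRG) pair on each observation's row.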

---

# Example data

The package ships with a built-in example dataset that illustrates the required data structure.

```{r load_data}
data("example_data")

# Dimensions
cat("X dimensions:", dim(example_data$X), "\n")
cat("Y length:     ", length(example_data$Y), "\n")
cat("hier_info dimensions:", dim(example_data$hier_info), "\n")

# Peek at hier_info
head(example_data$hier_info)

# Group sizes
table_mdc <- table(example_data$hier_info[, 1])
cat("\nObservations per MDC group:\n")
print(table_mdc)

table_drg <- table(example_data$hier_info[, 2])
cat("\nObservations per DRG subgroup (first 10):\n")
print(head(table_drg, 10))

# Outcome distribution (binary)
cat("\nOutcome distribution:\n")
print(table(example_data$Y))
```

---

# Fitting the model

## `hierNest()` — fit at a fixed hyperparameter pair

`hierNest()` fits the full regularization path for a single pair of hyperparameters $(\alpha_1, \alpha_2)$. It is useful for exploration or when tuning parameters have already been selected.

```{r hierNest_fit, eval = FALSE}
fit <- hierNest(
  x         = example_data$X,
  y         = example_data$Y,
  hier_info = example_data$hier_info,
  family    = "binomial",
  asparse1  = 1,    # weight for MDC-level group penalty
  asparse2  = 1     # weight for DRG-level subgroup penalty
)

# The fit contains a lambda sequence and corresponding coefficient paths
length(fit$lambda)
dim(fit$beta)   # rows = reparameterized coefficients, cols = lambda values
```

Key arguments:

| Argument | Description |
|---|---|
| `x` | $n \times p$ predictor matrix |
| `y` | Response (numeric for Gaussian, 0/1 or factor for binomial) |
| `hier_info` | $n \times 2$ group/subgroup index matrix |
| `family` | `"gaussian"` or `"binomial"` |
| `asparse1` | $\alpha_1$: relative weight for the MDC-level overlapping group penalty |
| `asparse2` | $\alpha_2$: relative weight for the DRG-level overlapping group penalty |
| `nlambda` | Length of the $\lambda$ sequence (default 100) |
| `standardize` | Whether to standardize predictors before fitting (default `TRUE`) |
| `method` | For now, only `"overlapping"` (hierNest-OGLasso, default) |

---

## `cv.hierNest()` — cross-validated tuning parameter selection

In practice, the penalty hyperparameters $\lambda$, $\alpha_1$, and $\alpha_2$ need to be chosen by cross-validation. `cv.hierNest()` wraps the fitting procedure with cross-validation and returns the optimal parameter combination.

### Cross-validation methods

The `cvmethod` argument controls how $\alpha_1$ and $\alpha_2$ are selected:

- `"general"` — uses a single pair of $\alpha_1$ and $\alpha_2$ (supplied as scalars) and runs standard cross-validation over the $\lambda$ sequence only.
- `"grid_search"` — searches a grid of $\alpha_1 \times \alpha_2$ pairs. The ranges are specified via `asparse1` and `asparse2` (each a length-2 vector giving the lower and upper bound), with `asparse1_num` and `asparse2_num` controlling the grid resolution.
- `"user_supply"` — the user provides explicit paired vectors of $(\alpha_1, \alpha_2)$ values to evaluate.

### Recommended: grid search

```{r cv_fit, eval = FALSE}
cv.fit <- cv.hierNest(
  x           = example_data$X,
  y           = example_data$Y,
  method      = "overlapping",
  hier_info   = example_data$hier_info,
  family      = "binomial",
  partition   = "subgroup",    # stratify CV folds within subgroups
  cvmethod    = "grid_search",
  asparse1    = c(0.5, 20),    # search alpha_1 over [0.5, 20]
  asparse2    = c(0.05, 0.20), # search alpha_2 over [0.05, 0.20]
  asparse1_num = 3,            # 3 x 3 = 9 grid points
  asparse2_num = 3,
  nlambda     = 50,            # lambda values per grid point
  pred.loss   = "ROC"          # maximize AUROC
)
```

> **Note on `partition = "subgroup"`:** This stratifies the cross-validation folds so that each fold contains observations from all DRG subgroups (to the extent possible). This avoids degenerate folds where a small subgroup is entirely absent from the training data.
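
The sketch below illustrates the idea of subgroup-stratified fold assignment. It is an illustration only, with the hypothetical helper `make_stratified_folds()` written for this vignette; it is not the package's internal fold-building code:

```{r folds_sketch, eval = FALSE}
# Illustrative only: assign each observation to one of nfolds folds,
# sampling within each DRG subgroup so that every fold covers every
# subgroup (to the extent subgroup sizes allow)
make_stratified_folds <- function(subgroup, nfolds = 5) {
  foldid <- integer(length(subgroup))
  for (d in unique(subgroup)) {
    idx <- which(subgroup == d)
    foldid[idx] <- sample(rep_len(seq_len(nfolds), length(idx)))
  }
  foldid
}

foldid <- make_stratified_folds(example_data$hier_info[, 2])
table(foldid, example_data$hier_info[, 1])  # fold coverage by MDC
```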

### Single hyperparameter pair (fast)

```{r cv_fit_general, eval = FALSE}
cv.fit.simple <- cv.hierNest(
  x         = example_data$X,
  y         = example_data$Y,
  method    = "overlapping",
  hier_info = example_data$hier_info,
  family    = "binomial",
  partition = "subgroup",
  cvmethod  = "general",
  asparse1  = 1,
  asparse2  = 0.1,
  nlambda   = 100,
  pred.loss = "ROC"
)
```

### User-supplied grid

```{r cv_fit_user, eval = FALSE}
cv.fit.user <- cv.hierNest(
  x         = example_data$X,
  y         = example_data$Y,
  method    = "overlapping",
  hier_info = example_data$hier_info,
  family    = "binomial",
  partition = "subgroup",
  cvmethod  = "user_supply",
  asparse1  = c(0.5, 1, 5, 10),   # explicit (alpha1, alpha2) pairs
  asparse2  = c(0.05, 0.1, 0.1, 0.2),
  nlambda   = 50
)
```

---

# Inspecting the cross-validated fit

After running `cv.hierNest()`, several components of the returned object are of interest.

```{r inspect_cv, eval = FALSE}
# Optimal tuning parameters selected by cross-validation
cv.fit$lambda.min   # lambda optimizing the CV criterion
cv.fit$min_alpha1   # alpha_1 selected
cv.fit$min_alpha2   # alpha_2 selected

# Number of non-zero coefficients in the reparameterized model at the
# optimal lambda (match lambda by position rather than by floating-point
# equality)
lam.idx <- which.min(abs(cv.fit$hierNest.fit$lambda - cv.fit$lambda.min))
nnz <- sum(cv.fit$hierNest.fit$beta[, lam.idx] != 0)
```

---
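
# Generating predictions on new data

Because the fitted coefficients are subgroup-specific, new observations must be accompanied by their own hierarchy information so that each row can be mapped to its MDC and DRG. The call below is a hedged sketch: the argument names (`newx`, `new_hier_info`, `type`) are illustrative assumptions rather than the documented signature, so consult `?predict_hierNest` for the authoritative interface.

```{r predict_sketch, eval = FALSE}
# Hedged sketch: the argument names below are assumptions, not the
# documented signature; see ?predict_hierNest
pred <- predict_hierNest(
  cv.fit,
  newx          = example_data$X,          # new predictor matrix
  new_hier_info = example_data$hier_info,  # hierarchy of the new rows
  type          = "response"               # predicted probabilities
)
head(pred)
```

---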

# Visualizing the estimated effects

The `plot()` method for `cv.hierNest` objects provides two heatmap visualizations.

## Coefficient heatmap

Plotting with `type = "coefficients"` shows the estimated overall, MDC-specific, and DRG-specific components for each selected predictor, at the cross-validation optimal $\lambda$.

```{r plot_coef, eval = FALSE}
# Returns a plotly interactive heatmap
p_coef <- plot(cv.fit, type = "coefficients")
p_coef
```

Each row of the heatmap corresponds to one level in the hierarchy (overall mean, then each MDC, then each DRG within that MDC). Each column is a selected predictor. Blue cells indicate a positive contribution; red cells indicate negative. Rows with all zeros are suppressed.

## Subgroup effect heatmap

Plotting with `type = "Subgroup effects"` reconstructs the *total* effect for each DRG subgroup: $\hat{\beta}_d = \hat{\mu} + \hat{\eta}^{M(d)} + \hat{\delta}_d$.

```{r plot_subgroup, eval = FALSE}
p_sub <- plot(cv.fit, type = "Subgroup effects")
p_sub
```

This visualization is useful for comparing risk profiles across subgroups — subgroups with similar colors for a given predictor have similar estimated total effects for that predictor.

---

# Methodological background

The statistical framework underlying **hierNest** is described in detail in:

> Jiang, Z., Huo, L., Hou, J., Vaughan-Sarrazin, M., Smith, M. A., & Huling, J. D. (2026+). *Heterogeneous readmission prediction with hierarchical effect decomposition and regularization*. 

## Hierarchical reparameterization

For participant $i$ with DRG $D_i = d$, the logistic regression model is:

$$\text{logit}(\Pr(Y_i = 1 \mid X_i, D_i = d)) = X_i^\top \beta_d$$

The hierNest reparameterization decomposes $\beta_d$ as:

$$\beta_d = \mu + \eta^{M(d)} + \delta_d$$

This is encoded via a modified design matrix $X_H$ constructed as the Khatri–Rao (column-wise Kronecker) product of $X$ and the hierarchy indicator matrix $H = [\mathbf{1}_n \;;\; H_M \;;\; H_D]$.
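
A minimal sketch of this construction for a toy hierarchy (the package itself stores $X_H$ as a sparse matrix, so this dense version is for illustration only):

```{r khatri_rao_sketch}
# Toy Khatri-Rao construction: n = 4 observations, p = 2 predictors,
# 2 MDCs with 2 DRGs each
set.seed(1)
mdc <- factor(c(1, 1, 2, 2))       # M(d) per observation
drg <- factor(c(1, 2, 3, 4))       # d per observation
X   <- matrix(rnorm(8), nrow = 4)  # raw predictor matrix

H_M <- model.matrix(~ mdc - 1)     # MDC indicator columns
H_D <- model.matrix(~ drg - 1)     # DRG indicator columns
H   <- cbind(1, H_M, H_D)          # H = [1_n ; H_M ; H_D]

# Column-wise Kronecker (Khatri-Rao) product: multiply X elementwise
# by each column of H and bind the results together
X_H <- do.call(cbind, lapply(seq_len(ncol(H)), function(k) H[, k] * X))
dim(X_H)  # n x (p * (1 + n_MDC + n_DRG)) = 4 x 14
```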

## Overlapping group lasso penalty

The overlapping group lasso penalty acts on the grouped coefficient vector $\theta_j = (\mu_j, \{\eta_{Mj}\}_{M}, \{\delta_{dj}\}_d)$ for the $j$-th predictor. Writing $\theta_{Mj} = (\eta_{Mj}, \{\delta_{dj}\}_{d \in M})$ for the MDC-level deviation together with its nested DRG deviations, the penalty is:

$$P_{\text{og}}(\theta) = \lambda \sum_{j=1}^{p} \left[ \|\theta_j\|_2 + \sum_{M \in \mathcal{M}} \left( \alpha_1 \|\theta_{Mj}\|_2 + \alpha_2 \sum_{d \in M} |\delta_{dj}| \right) \right]$$

Because every group contains a coefficient together with all of its descendants in the hierarchy, the penalty enforces a hierarchical zero pattern: $\mu_j = 0$ implies all $\eta_{Mj} = 0$ and $\delta_{dj} = 0$, and $\eta_{Mj} = 0$ implies $\delta_{dj} = 0$ for all $d \in M$.

## Computation

For `method = "overlapping"`, estimation uses the majorization-minimization (MM) algorithm described in Algorithm 1 of the paper, with sequential strong rules to skip inactive predictor groups and accelerate computation. The design matrix is stored as a sparse matrix to minimize memory requirements.

---

# Session information

```{r session_info}
sessionInfo()
```
