---
title: "Panels, harmonisation, reconciliation, real terms, per-capita"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Panels, harmonisation, reconciliation, real terms, per-capita}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", eval = FALSE)
```

This vignette walks through the four steps that take raw ATO fetches
to a panel fit for defensible longitudinal analysis:

1. Stack multiple years by passing a vector to `year =`.
2. Harmonise column names across releases with `ato_harmonise()`.
3. Reconcile totals against the Final Budget Outcome with
   `ato_reconcile()`.
4. Express in real terms and per capita with `ato_deflate()` and
   `ato_per_capita()`.

## Build a multi-year panel

```{r}
library(ato)

pc <- ato_individuals_postcode(
  year = c("2018-19", "2019-20", "2020-21",
           "2021-22", "2022-23"),
  state = "NSW"
)

nrow(pc)
unique(pc$year)
```
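
For intuition about the long format this produces, the following is a
minimal base-R sketch on toy data. It assumes that a vector `year` is
equivalent to fetching each year separately and row-binding the
results, which is an assumption about the package's behaviour rather
than a description of its internals. The column names and values are
invented for illustration.

```{r}
# Toy per-year tables standing in for separate fetches (values invented).
by_year <- list(
  "2021-22" = data.frame(postcode = c("2000", "2010"),
                         taxable_income = c(100, 200)),
  "2022-23" = data.frame(postcode = c("2000", "2010"),
                         taxable_income = c(110, 220))
)

# Tag each table with its financial year, then row-bind into one panel.
stacked <- do.call(rbind, lapply(names(by_year), function(y) {
  data.frame(year = y, by_year[[y]], stringsAsFactors = FALSE)
}))
rownames(stacked) <- NULL

nrow(stacked)
unique(stacked$year)
```

Each row is one postcode in one year, so the row count is the number of
postcodes times the number of years when every postcode appears in
every release.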

## Harmonise column names

Column names drift: `total_income` in some years, `total_income_or_loss`
in others; `state` vs `state_territory`. `ato_harmonise()` renames
columns to canonical names from `ATO_COL_VARIANTS`.

```{r}
pc <- ato_harmonise(pc)
names(pc)
```
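
The idea behind a variant lookup can be sketched in a few lines of base
R. The `variants` list and the `rename_to_canonical()` helper below are
hypothetical and exist only for illustration. They are not the
package's `ATO_COL_VARIANTS` or the implementation of
`ato_harmonise()`.

```{r}
# Hypothetical map: canonical name -> names seen in some releases.
# Illustration only; not the package's ATO_COL_VARIANTS.
variants <- list(
  total_income = c("total_income", "total_income_or_loss"),
  state        = c("state", "state_territory")
)

# Rename any column that matches a known variant to its canonical name.
rename_to_canonical <- function(df, variants) {
  for (canonical in names(variants)) {
    hit <- names(df) %in% variants[[canonical]]
    names(df)[hit] <- canonical
  }
  df
}

old_release <- data.frame(state_territory = "NSW",
                          total_income_or_loss = 1000)
names(rename_to_canonical(old_release, variants))
```

This sketch does not guard against a table that contains two variants
of the same column, which would yield duplicate names.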

## Reconcile against Commonwealth totals

Before reporting a panel sum in a paper, check it against the Final
Budget Outcome. A 1-3 per cent accrual-vs-cash gap is expected;
larger gaps warrant investigation.

```{r}
ind_2223 <- ato_individuals(year = "2022-23")
total_tax <- sum(ind_2223$tax_payable, na.rm = TRUE)

ato_reconcile(
  value   = total_tax,
  year    = "2022-23",
  measure = "individuals_income_tax_net"
)
```
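
To make the 1-3 per cent rule of thumb concrete, the gap can be
computed by hand. The figures below are illustrative placeholders, not
real ATO or Budget numbers, and the sign convention (panel minus
benchmark) is an assumption. How `ato_reconcile()` defines and reports
the gap is not shown here.

```{r}
# Illustrative figures only; neither is a real ATO or Budget number.
panel_total <- 98    # sum from the panel, in $bn
benchmark   <- 100   # published benchmark, in $bn

# Percentage gap relative to the benchmark (assumed sign convention).
gap_pct <- 100 * (panel_total - benchmark) / benchmark
gap_pct

# Is the gap inside the 3 per cent upper bound?
abs(gap_pct) <= 3
```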

## Real-terms comparison

ATO values are in nominal Australian dollars of the reporting year.
To compare across years, deflate them to a common base year using the
bundled ABS CPI series.

```{r}
panel_annual <- aggregate(taxable_income ~ year, data = pc, FUN = sum,
                          na.rm = TRUE)
panel_annual$real_2022_23 <- ato_deflate(
  panel_annual$taxable_income,
  year = panel_annual$year,
  base = "2022-23"
)
panel_annual
```
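
The standard deflation arithmetic is real = nominal x CPI(base) /
CPI(year). The sketch below applies it with a made-up index. The index
values are not ABS CPI figures, `deflate_sketch()` is a hypothetical
helper, and whether `ato_deflate()` uses exactly this formula is an
assumption.

```{r}
# Made-up index values for illustration; NOT ABS CPI figures.
cpi <- c("2020-21" = 100, "2021-22" = 105, "2022-23" = 112)

# Rescale nominal values to base-year prices:
# real = nominal * index[base] / index[year].
deflate_sketch <- function(nominal, year, base, index) {
  nominal * index[[base]] / unname(index[year])
}

real <- deflate_sketch(c(500, 500, 500),
                       year  = names(cpi),
                       base  = "2022-23",
                       index = cpi)
real
```

Values from the base year are unchanged, and earlier years are scaled
up whenever the index has risen since then.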

## Per-capita normalisation

```{r}
panel_annual$per_capita <- ato_per_capita(
  panel_annual$real_2022_23,
  year = panel_annual$year
)
panel_annual
```

The resulting four-column data frame (`year`, nominal
`taxable_income`, `real_2022_23`, and `per_capita`, which is computed
from the real series) is the canonical shape for distributional and
time-series tax papers.
