---
title: "Visualization with ggplot2"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Visualization with ggplot2}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5,
  fig.alt = "Example ggplot2 visualization of Bayesian surprise values."
)
```

```{r setup}
library(bayesiansurpriser)
library(sf)
library(ggplot2)
```

## Overview

The `bayesiansurpriser` package works with ggplot2 in two ways: `surprise()` adds computed surprise values that can be mapped to aesthetics, and custom fill scales (sequential, diverging, and binned) color those values.

## Loading Example Data

```{r data}
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
```

## Basic Workflow: Compute then Plot

The recommended workflow is to compute surprise first, then use ggplot2:

```{r basic}
# Compute surprise
result <- surprise(nc, observed = SID74, expected = BIR74)

# Plot with ggplot2 using geom_sf
ggplot(result) +
  geom_sf(aes(fill = surprise)) +
  scale_fill_surprise() +
  labs(title = "Bayesian Surprise Map")
```

## Color Scales

### Sequential Scale: scale_fill_surprise()

For absolute surprise values (always non-negative):

```{r scale-sequential}
ggplot(result) +
  geom_sf(aes(fill = surprise)) +
  scale_fill_surprise(option = "inferno") +
  labs(title = "Inferno Palette")
```

Available viridis options: `"viridis"`, `"magma"`, `"plasma"`, `"inferno"`, `"cividis"`, `"rocket"`, `"mako"`, `"turbo"`.

```{r scale-options, fig.show='hold', out.width='50%'}
p1 <- ggplot(result) +
  geom_sf(aes(fill = surprise)) +
  scale_fill_surprise(option = "viridis") +
  labs(title = "Viridis")

p2 <- ggplot(result) +
  geom_sf(aes(fill = surprise)) +
  scale_fill_surprise(option = "plasma") +
  labs(title = "Plasma")

p1
p2
```
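
The option names listed above match the palette names in the viridisLite package, which is typically installed alongside ggplot2. As a rough way to preview the colors without drawing a map, the sketch below generates five colors per option. It assumes (this is not confirmed by the package documentation) that `scale_fill_surprise()` forwards `option` to a viridis palette; the object names `opts` and `palettes` are illustrative only.

```r
# Assumption: the `option` values correspond to viridisLite palette names.
opts <- c("viridis", "magma", "plasma", "inferno",
          "cividis", "rocket", "mako", "turbo")

# Five hex colors per palette, from the low end to the high end
palettes <- lapply(opts, function(o) viridisLite::viridis(5, option = o))
names(palettes) <- opts

# Inspect one palette as a character vector of hex codes
palettes$inferno
```

Printing a palette this way makes it easy to check whether the light or the dark end of a palette corresponds to high values before committing to it.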

### Diverging Scale: scale_fill_surprise_diverging()

For signed surprise (positive = over-representation, negative = under-representation):

```{r scale-diverging}
ggplot(result) +
  geom_sf(aes(fill = signed_surprise)) +
  scale_fill_surprise_diverging() +
  labs(title = "Diverging Scale for Signed Surprise")
```

Custom colors:

```{r scale-diverging-custom}
ggplot(result) +
  geom_sf(aes(fill = signed_surprise)) +
  scale_fill_surprise_diverging(
    low = "#2166AC",   # Blue
    mid = "#F7F7F7",   # Light gray
    high = "#B2182B"   # Red
  ) +
  labs(title = "Custom Diverging Colors")
```
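
A diverging legend is often easier to read when it covers the same range on both sides of zero. The sketch below shows the idea with ggplot2's own `scale_fill_gradient2()` on a small hypothetical data frame. Whether `scale_fill_surprise_diverging()` accepts `limits` or `midpoint` is an assumption that should be checked against its help page.

```r
library(ggplot2)

# Hypothetical signed values standing in for result$signed_surprise
df <- data.frame(
  region = LETTERS[1:6],
  signed_surprise = c(-0.8, -0.2, 0.1, 0.4, 1.5, 2.0)
)

# Largest magnitude in either direction, used for symmetric limits
lim <- max(abs(df$signed_surprise))

p <- ggplot(df, aes(x = region, y = signed_surprise, fill = signed_surprise)) +
  geom_col() +
  scale_fill_gradient2(
    low = "#2166AC", mid = "#F7F7F7", high = "#B2182B",
    midpoint = 0,
    limits = c(-lim, lim)  # legend spans -lim to +lim
  )
p
```

With symmetric limits, the legend shows as much blue as red even when the data are skewed toward one sign, as they are here.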

### Binned Scale: scale_fill_surprise_binned()

To group continuous surprise values into a small number of discrete color classes:

```{r scale-binned}
ggplot(result) +
  geom_sf(aes(fill = surprise)) +
  scale_fill_surprise_binned(n.breaks = 5) +
  labs(title = "Binned Surprise Scale")
```
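
To see what binning does to the underlying values, the following base R sketch cuts a hypothetical vector of surprise values into five equal-width classes. It is only an illustration of the general idea; the breaks that `scale_fill_surprise_binned()` actually chooses may differ.

```r
# Hypothetical surprise values; in practice these would be result$surprise
surprise_vals <- c(0.02, 0.05, 0.10, 0.30, 0.45, 0.80, 1.20, 2.50, 3.10, 4.00)

# Six boundaries define five equal-width bins across the data range
breaks <- seq(min(surprise_vals), max(surprise_vals), length.out = 6)

# include.lowest = TRUE keeps the minimum value in the first bin
bins <- cut(surprise_vals, breaks = breaks, include.lowest = TRUE)

# Count of observations per bin
table(bins)
```

Skewed values like these pile up in the lowest class under equal-width bins, which is worth checking before choosing a binned scale for a map.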

## Combining with Other ggplot2 Elements

### Adding Labels

```{r labels}
# Top 5 most surprising counties
top5 <- result[order(-result$surprise), ][1:5, ]

ggplot(result) +
  geom_sf(aes(fill = surprise)) +
  geom_sf_text(data = top5, aes(label = NAME), size = 3) +
  scale_fill_surprise() +
  labs(title = "Top 5 Most Surprising Counties Labeled")
```
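
The row selection used for the labels can be wrapped in a small helper. The sketch below uses a plain data frame with hypothetical names and values; `top_surprise()` is an illustrative function, not part of `bayesiansurpriser`.

```r
# Hypothetical names and values standing in for the columns of `result`
df <- data.frame(
  NAME = c("County A", "County B", "County C", "County D",
           "County E", "County F", "County G"),
  surprise = c(1.8, 0.3, 0.9, 2.4, 0.1, 1.1, 0.6)
)

# Illustrative helper: the n rows with the largest surprise, in descending order
top_surprise <- function(x, n) {
  head(x[order(x$surprise, decreasing = TRUE), ], n)
}

top3 <- top_surprise(df, 3)
top3$NAME
```

Using `head()` rather than indexing with `1:n` avoids rows of `NA` when the data has fewer than `n` rows.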

### Faceting

```{r facet}
# Compare two time periods
result74 <- surprise(nc, observed = SID74, expected = BIR74)
result79 <- surprise(nc, observed = SID79, expected = BIR79)

result74$period <- "1974-78"
result79$period <- "1979-84"

combined <- rbind(result74, result79)

ggplot(combined) +
  geom_sf(aes(fill = surprise)) +
  scale_fill_surprise() +
  facet_wrap(~period) +
  labs(title = "Surprise by Time Period")
```

### Theme Customization

```{r theme}
ggplot(result) +
  geom_sf(aes(fill = surprise)) +
  scale_fill_surprise(name = "Surprise\n(bits)") +
  labs(
    title = "Bayesian Surprise: NC SIDS Data",
    subtitle = "Identifying unexpectedly high/low SIDS rates",
    caption = "Data: NC SIDS 1974-78"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    legend.key.width = unit(2, "cm")
  )
```

## Non-Spatial Data

For non-spatial data, use standard ggplot2 geoms after computing surprise:

```{r non-spatial}
# Create example data
df <- data.frame(
  region = LETTERS[1:10],
  observed = c(50, 120, 80, 200, 45, 150, 90, 180, 60, 110),
  expected = rep(1000, 10)
)

result_df <- surprise(df, observed = observed, expected = expected)

ggplot(result_df, aes(x = reorder(region, -surprise), y = surprise)) +
  geom_col(aes(fill = surprise)) +
  scale_fill_surprise() +
  labs(x = "Region", y = "Surprise (bits)",
       title = "Surprise by Region") +
  theme_minimal()
```
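
The bar order in the chart above comes from `reorder(region, -surprise)`, which sorts the factor levels by increasing value of its second argument; negating `surprise` therefore puts the largest values first. A minimal base R sketch with hypothetical values:

```r
# Hypothetical regions and surprise values
df <- data.frame(
  region = c("A", "B", "C", "D"),
  surprise = c(0.4, 2.1, 0.9, 1.5)
)

# Levels are sorted by -surprise ascending, i.e. by surprise descending
ordered_regions <- reorder(df$region, -df$surprise)
levels(ordered_regions)
#> [1] "B" "D" "C" "A"
```

ggplot2 draws discrete axis categories in level order, so the most surprising region appears leftmost.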

## Best Practices

1. **Use diverging scales for signed surprise**: Opposing colors separate over-representation from under-representation at a glance
2. **Consider binned scales for communication**: A few discrete classes are easier to match to the legend than a continuous gradient
3. **Label notable regions**: Help viewers identify specific areas
4. **Include a legend title with units**: "Surprise (bits)" clarifies the measure
5. **Use minimal themes for maps**: Reduce visual clutter