---
title: "Getting Started with rurality"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with rurality}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Overview

The `rurality` package provides rurality classification data for all U.S.
counties and ZIP codes. It bundles USDA Rural-Urban Continuum Codes (RUCC 2023),
Rural-Urban Commuting Area codes (RUCA 2020), and a composite rurality score
that combines RUCC, population density, and distance to the nearest metro area
into a single 0--100 measure, where higher scores indicate more rural places.

The package is designed for researchers who need to classify locations by
rurality without manually downloading and reshaping USDA spreadsheets.

```{r setup}
library(rurality)
library(dplyr)
```

## Looking up a county

The simplest use case is looking up rurality data for a county by its 5-digit
FIPS code:

```{r}
get_rurality("05031")
```

If you just need the score or the RUCC code:

```{r}
rurality_score("05031")
get_rucc("05031")
```

Multiple FIPS codes work too:

```{r}
rurality_score(c("05031", "06037", "48453"))
```
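
County FIPS codes are identifiers rather than numbers, and the leading zero is
part of the code (`"05031"`, not `5031`). Spreadsheets and CSV readers often
drop it. The sketch below uses only base R to restore the zero before a lookup.
`pad_fips()` is a hypothetical helper written for this vignette, not a function
exported by `rurality`, and whether the package pads codes on its own is not
covered here.

```{r}
# Hypothetical helper (not part of rurality): restore leading zeros on
# county FIPS codes that were read in as numbers.
pad_fips <- function(x) {
  sprintf("%05d", as.integer(x))
}

pad_fips(c(5031, 6037, 48453))
```

The padded strings can then be passed to `rurality_score()` or `get_rurality()`
as in the examples above.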

## Looking up a ZIP code

RUCA codes are available at the ZIP/ZCTA level. Pass one or more 5-digit ZIP
codes to `get_ruca()`:

```{r}
get_ruca("72401")
get_ruca(c("72401", "90210", "59801"))
```

## Merging onto your data

The most common research workflow is merging rurality data onto an existing
dataset. The `add_rurality()` function handles this:

```{r}
my_data <- data.frame(
  fips = c("05031", "06037", "48453", "30063"),
  outcome = c(0.72, 0.41, 0.58, 0.89)
)

my_data |> add_rurality()
```

By default, three columns are added: `rurality_score`, `rurality_classification`,
and `rucc_2023`. Use `vars = "all"` for the full set:

```{r}
my_data |> add_rurality(vars = "all") |> glimpse()
```

If your FIPS column is not named `fips`, pass its name with `fips_col`:

```{r}
other_data <- data.frame(county_fips = c("05031", "06037"), y = 1:2)
other_data |> add_rurality(fips_col = "county_fips")
```
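
After a merge it is worth checking that every row found a match, since a
mistyped or outdated FIPS code could otherwise slip through unnoticed. The
sketch below illustrates the idea with base R's `match()` and a small made-up
lookup table. The scores are placeholders, not package data, and the example
assumes unmatched codes surface as `NA` the way they do in a left join, which
this vignette does not confirm for `add_rurality()`.

```{r}
# Toy lookup table: the scores are made-up placeholders, not rurality data.
toy_lookup <- data.frame(
  fips = c("05031", "06037"),
  toy_score = c(50, 10)
)

toy_data <- data.frame(
  fips = c("05031", "06037", "99999"),  # "99999" is deliberately invalid
  outcome = c(0.72, 0.41, 0.58)
)

# Left-join by lookup: match() returns NA where a code has no partner.
toy_data$toy_score <- toy_lookup$toy_score[match(toy_data$fips, toy_lookup$fips)]

# FIPS codes that failed to match
toy_data[is.na(toy_data$toy_score), "fips"]
```

The same `is.na()` check can be applied to the `rurality_score` column of a
merged data frame, if missing matches are indeed returned as `NA`.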

## Classifying scores

The `classify_rurality()` function converts numeric scores to labels:

```{r}
classify_rurality(c(10, 30, 50, 70, 90))
```

The thresholds are:

| Score | Classification |
|-------|---------------|
| 80--100 | Very Rural |
| 60--79  | Rural |
| 40--59  | Mixed |
| 20--39  | Suburban |
| 0--19   | Urban |
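
The table can be expressed as a handful of cut points. The sketch below
reproduces the thresholds with base R's `cut()`. It is an illustration rather
than the package's implementation, and `classify_sketch()` is a hypothetical
name. It assumes each band is closed on the left, so a fractional score such as
79.5 falls in the lower band, a case the table leaves unspecified.

```{r}
# Illustration only: classify_rurality() may be implemented differently.
classify_sketch <- function(score) {
  cut(
    score,
    breaks = c(0, 20, 40, 60, 80, 100),
    labels = c("Urban", "Suburban", "Mixed", "Rural", "Very Rural"),
    right = FALSE,         # bands are [lower, upper)
    include.lowest = TRUE  # except the top band, which also includes 100
  )
}

as.character(classify_sketch(c(0, 19.9, 20, 79.5, 80, 100)))
```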

## Browsing the full dataset

The `county_rurality` dataset contains all 3,235 U.S. counties:

```{r}
county_rurality
```

Filter to a state:

```{r}
county_rurality |>
  filter(state_abbr == "AR") |>
  select(county_name, rurality_score, rurality_classification, rucc_2023) |>
  arrange(desc(rurality_score)) |>
  head(10)
```

## Score distribution

The histogram below shows how scores are distributed across counties. The chunk
runs only if `ggplot2` is installed:

```{r, fig.width=6, fig.height=4}
if (requireNamespace("ggplot2", quietly = TRUE)) {
  ggplot2::ggplot(county_rurality, ggplot2::aes(x = rurality_score)) +
    ggplot2::geom_histogram(binwidth = 5, fill = "#15803d", color = "white") +
    ggplot2::labs(
      title = "Distribution of Rurality Scores Across U.S. Counties",
      x = "Rurality Score (0-100)",
      y = "Number of Counties"
    ) +
    ggplot2::theme_minimal()
}
```

## Methodology

The composite rurality score is a weighted average of three components:

| Component | Weight | Source or method |
|-----------|--------|------------------|
| RUCC score | 55% | USDA Economic Research Service, 2023 |
| Population density | 28% | Census ACS 2022 5-year estimates |
| Distance to metro | 17% | Haversine distance to nearest metro area |
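
As a worked example of the weighting, the sketch below combines three component
values for a hypothetical county. It assumes each component has already been
rescaled to 0--100 with higher values meaning more rural. How the package
performs that rescaling is not described here, and the component values are
invented.

```{r}
# Weights from the table above
weights <- c(rucc = 0.55, density = 0.28, distance = 0.17)

# Invented component values for a hypothetical county, each assumed to be
# on a 0-100 scale where higher means more rural.
components <- c(rucc = 80, density = 60, distance = 40)

# Weighted average: 0.55 * 80 + 0.28 * 60 + 0.17 * 40
composite <- sum(weights * components[names(weights)])
composite
```

Because the weights sum to 1, the composite stays on the same 0--100 scale as
its inputs.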

For full details, see [rurality.app](https://rurality.app).

## Citation

```{r}
citation("rurality")
```