---
title: "Ingesting NASCAR Data"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Ingesting NASCAR Data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE
)
```

The **nascaR.data** package hosts its canonical race results datasets publicly on Cloudflare R2. While R users can leverage `load_series()`, non-R users or developers building external pipelines can access these datasets directly in both Parquet and CSV formats.

## Direct Download URLs

All datasets are updated every Monday at 5:00 AM EST during the racing season.

### Parquet Format (Recommended)

Parquet is highly recommended as it preserves exact column data types and has a significantly smaller file size.

* **Cup Series**: `https://nascar.kylegrealis.com/cup_series.parquet`
* **NXS Series**: `https://nascar.kylegrealis.com/nxs_series.parquet`
* **Truck Series**: `https://nascar.kylegrealis.com/truck_series.parquet`

### CSV Format

* **Cup Series**: `https://nascar.kylegrealis.com/cup_series.csv`
* **NXS Series**: `https://nascar.kylegrealis.com/nxs_series.csv`
* **Truck Series**: `https://nascar.kylegrealis.com/truck_series.csv`

---

## Command-Line Download

You can download any of these files directly using `curl` or `wget`:

```bash
# Download Cup Series CSV results
curl -O https://nascar.kylegrealis.com/cup_series.csv

# Download NXS Series Parquet results
curl -O https://nascar.kylegrealis.com/nxs_series.parquet
```

---

## Critical Ingestion Note: NASCAR Car Numbers

When ingesting the CSV files, pay special attention to the `Car` (car number) column. 

In NASCAR, car numbers can contain leading zeros (e.g., `"08"`, `"09"`), which represent entirely different teams and entries from single-digit numbers (e.g., `"8"`, `"9"`).

By default, most CSV parsers (including R's `read.csv()` and Python's `pandas.read_csv()`) guess column types based on initial rows and will parse the `Car` column as an integer. This strips leading zeros, incorrectly converting `"08"` to `8`.

### Correct Ingestion in R

When reading the CSV file in R, explicitly specify that the `Car` column should be parsed as a character string:

```{r, eval = FALSE}
# Base R
cup <- read.csv(
  "cup_series.csv",
  colClasses = c(Car = "character"),
  stringsAsFactors = FALSE
)

# tidyverse / readr
cup <- readr::read_csv(
  "cup_series.csv",
  col_types = readr::cols(Car = readr::col_character())
)
```

### Correct Ingestion in Python (pandas)

When reading the CSV file in Python, use the `dtype` argument to treat the `Car` column as a string:

```python
import pandas as pd

# Load CSV and preserve leading zeros
cup = pd.read_csv("cup_series.csv", dtype={"Car": str})
```

By specifying the type as a string/character, you ensure the car numbers remain accurate for historical roster matching.
