cr_outline – Costa Rica full outline (continental
landmass + Isla del Coco and all minor oceanic islands), derived from
GADM 4.1.
h3sdm_filter_outliers() removes environmental
outliers from presence records prior to model training using Mahalanobis
distance (D2) in environmental space. Only presences
(presence == "1") are evaluated; pseudo-absences are always
retained unchanged. The outlier threshold is derived from the
chi-squared distribution (qchisq(threshold, df = k),
default threshold = 0.975). Returns a list with the cleaned
PA dataset, a data frame of removed records with their D2 values, the
count of removed records, and the threshold value used. Complements
h3sdm_aoa(): while the AOA evaluates prediction reliability
after training, this function improves input data quality before
training.
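As a rough illustration of the filtering rule described above, the following base R sketch flags presences whose squared Mahalanobis distance exceeds the chi-squared cutoff. It is not the package's internal code: the data are simulated, the column names temp and precip are hypothetical, and the centroid and covariance are assumed to be estimated from the presence records themselves.

```r
# Illustrative sketch only (base R); not the implementation inside h3sdm.
set.seed(42)

# Hypothetical presence/pseudo-absence table with two environmental predictors.
pa <- data.frame(
  presence = factor(rep(c("1", "0"), each = 50)),
  temp     = c(rnorm(50, mean = 24, sd = 1.5), runif(50, 10, 35)),
  precip   = c(rnorm(50, mean = 2500, sd = 300), runif(50, 500, 5000))
)

# Inject one clearly anomalous presence record so there is something to remove.
pa$temp[1]   <- 40
pa$precip[1] <- 200

env_cols  <- c("temp", "precip")  # hypothetical predictor names
threshold <- 0.975                # default quoted in the entry above

# Only presences are evaluated; pseudo-absences are left untouched.
is_pres <- pa$presence == "1"
env     <- as.matrix(pa[is_pres, env_cols])

# Squared Mahalanobis distance of each presence to the presence centroid
# (assumption: centroid and covariance come from the presences themselves).
d2 <- mahalanobis(env, center = colMeans(env), cov = cov(env))

# Chi-squared cutoff with k = number of predictors as degrees of freedom.
cutoff <- qchisq(threshold, df = length(env_cols))

# Split the table into removed and retained records.
outlier_rows <- which(is_pres)[d2 > cutoff]
removed <- cbind(pa[outlier_rows, ], D2 = d2[d2 > cutoff])
cleaned <- if (length(outlier_rows) > 0) pa[-outlier_rows, ] else pa
```

Here cleaned, removed, nrow(removed) and cutoff correspond loosely to the four elements the entry says are returned; the actual element names in the package may differ.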
h3sdm_pres() assigns species occurrence records to
H3 hexagons and returns only hexagons with at least one presence record.
This is the first step of a two-stage workflow where pseudo-absences are
generated after environmental variables have been extracted.
h3sdm_pa() has been redesigned to generate
pseudo-absences stratified in environmental space using k-means
clustering. Pseudo-absences now cover the full range of environmental
conditions available in the AOI, reducing environmental bias introduced
by spatially clustered occurrence records. The function now receives
presence hexagons from h3sdm_pres() and the full hexagonal
grid with extracted variables from
h3sdm_predictors().
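The idea of stratifying pseudo-absences in environmental space can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the code used by h3sdm_pa(): the grid is simulated, the column names h3_address, temp and precip are hypothetical, and the number of clusters and the equal-share allocation per cluster are arbitrary choices made for the example.

```r
# Illustrative sketch only (base R); not the implementation inside h3sdm.
set.seed(1)

# Hypothetical hexagonal grid with extracted predictors.
grid <- data.frame(
  h3_address = sprintf("hex_%03d", 1:300),
  temp       = runif(300, 10, 35),
  precip     = runif(300, 500, 5000)
)

# Hypothetical presence hexagons; these are excluded from the candidate pool.
pres_ids   <- grid$h3_address[1:20]
candidates <- grid[!grid$h3_address %in% pres_ids, ]

n_pa <- 40  # pseudo-absences wanted (assumed value)
k    <- 8   # number of environmental strata (assumed value)

# Cluster candidates on standardized predictors so that no single
# variable dominates the distance calculation.
km <- kmeans(scale(candidates[, c("temp", "precip")]), centers = k, nstart = 10)

# Draw an equal share of hexagons from every cluster, so that each part of
# environmental space contributes pseudo-absences.
per_cluster <- ceiling(n_pa / k)
picked <- unlist(lapply(
  split(seq_len(nrow(candidates)), km$cluster),
  function(idx) idx[sample.int(length(idx), min(per_cluster, length(idx)))]
))

pseudo_abs <- candidates[picked, ]
pseudo_abs$presence <- factor("0", levels = c("0", "1"))
```

Sampling per cluster rather than uniformly over the grid is what spreads pseudo-absences across the available environmental conditions, even when the presence records are spatially clustered.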
cr_outline_c dataset regenerated from GADM 4.1 with
a fully reproducible script in data-raw/cr_outline.R.
Source attribution updated to GADM 4.1. Geometry is now consistent with
the cr_outline_c dataset in paisaje.
h3sdm_pa() and h3sdm_pa_from_records()
now accept a buffer_k argument (default 1).
Hexagons within buffer_k H3 rings of any presence hexagon
are excluded from the pseudo-absence candidate pool, preventing
pseudo-absences from being placed in areas likely occupied but not yet
recorded. Set to 0 to disable.
h3sdm_pa_from_records() now accepts an optional
predictors_sf argument. When provided, pseudo-absences are
selected by stratified sampling in environmental space using k-means
clustering. If NULL (default), the previous random
geographic sampling behaviour is preserved.
h3sdm_aoa() now extracts predictor variable names
from the model recipe instead of the model formula, fixing an error with
GLM and other engines where parsnip stores a generic formula
internally.
h3sdm_aoa() now uses na.rm = TRUE when
computing the inside/outside AOA summary, avoiding NA in
the progress message when hexagons have missing values.
h3sdm_pa() now transforms presence records to the CRS
of the H3 grid before joining, fixing an error when the grid is in a
projected CRS.
h3sdm_aoa() estimates the Dissimilarity Index (DI) and
the Area of Applicability (AOA) for spatial prediction models, based on
Meyer & Pebesma (2021).
h3sdm_get_grid() now preserves the CRS of the input
sf_object. Previously, the function always returned the
grid in WGS84 (EPSG:4326) regardless of the input CRS. Now, if the AOI
is in a projected CRS, the output grid will be reprojected to match it.
The internal H3 computation still uses WGS84 as required by the H3
system.
h3sdm_predict() internal comments translated to English
and @seealso updated to include
h3sdm_aoa().
h3sdm_fit_model() now automatically detects model
mode (classification or regression), enabling count-based models
(Poisson, Negative Binomial) with appropriate metrics (RMSE, R2, MAE)
without requiring manual configuration.
h3sdm_fit_model() and h3sdm_predict()
now automatically detect model mode (classification or regression),
enabling count-based models (Poisson, Negative Binomial) without manual
configuration. Full backward compatibility maintained.
h3sdm_get_records() now supports
"biodatacr" as an optional provider, querying occurrence
records from BiodataCR (Costa Rica) via the rbiodatacr
package. h3sdm_pa() inherits this support automatically
through its providers argument. rbiodatacr is
listed as a suggested dependency.
h3sdm_pa_from_records(): generates a
presence/pseudo-absence dataset from user-provided records. Accepts a
data.frame or sf object with coordinates in
WGS84 (EPSG:4326). Supports optional filtering by a
geospatialKosher column to remove records with questionable
spatial quality.
h3sdm_count_from_records(): generates a hexagonal
grid with count-based response variables (species richness, total
detections, or individual abundance) from user-provided records. Accepts
a data.frame or sf object. Supports optional
filtering by presence column, confidence threshold, and date
range.
h3sdm_recipe(): added response_col
parameter (default "presence") to support count-based
response variables. Use response_col = "count" when working
with data generated by h3sdm_count_from_records().
h3sdm_recipe_gam(): added response_col
parameter (default "presence") with the same behavior as
h3sdm_recipe(). Also added documentation examples for both
presence/absence and count-based models.
h3sdm_workflow_gam(): updated documentation to
clarify the use of set_mode("classification") for
presence/absence models and set_mode("regression") with
family = poisson() for count-based models.
h3sdm_workflow(): updated documentation to clarify
model mode selection for presence/absence and count-based
models.
h3sdm_workflows(): updated documentation to clarify
model mode selection for presence/absence and count-based
models.