h3sdm 0.1.7

Improvements

h3sdm_aoa() no longer depends on vip. Variable importance for ranger and xgboost models is now extracted directly via ranger::importance() and xgboost::xgb.importance(), removing an external dependency after vip was archived from CRAN on 2026-07-13. Behavior is unchanged: importance weighting still uses each engine’s native (non-permutation) importance measure.
h3sdm_workflow() and h3sdm_workflows() now warn when a ranger model spec is created without an importance mode. Set one in the engine call, e.g. set_engine("ranger", importance = "impurity"). Without it, h3sdm_aoa() silently falls back to equal variable weights instead of weighting by native importance.
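A minimal sketch of a ranger spec that carries an importance mode, using only parsnip. The object name rf_spec and the trees value are illustrative assumptions; how the spec is then handed to h3sdm_workflow() is not shown here.

```r
library(parsnip)

# Random forest spec for a presence/absence (classification) model.
# `trees = 500` is an arbitrary illustrative value.
rf_spec <- rand_forest(mode = "classification", trees = 500)

# Passing `importance` to the ranger engine makes the fitted model store
# a native importance measure, which importance weighting can then use.
rf_spec <- set_engine(rf_spec, "ranger", importance = "impurity")

rf_spec
```

A spec built with set_engine("ranger") alone has no importance mode and is the case the new warning is meant to catch.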

h3sdm_filter_range() flags spatial predictions that fall outside the univariate range of the training data. Adds a range_filter column (1 = within range, 0 = outside range) to the prediction object. Complements h3sdm_aoa() and h3sdm_filter_outliers(): while AOA detects combinations without analogues in the training data and Mahalanobis distance identifies multivariate outliers, this function detects extrapolation at the margins of individual variables that multivariate methods may not capture.
h3sdm_pres_from_sf() assigns pre-downloaded species occurrence records (an sf object, typically from h3sdm_get_records()) to H3 hexagons, returning only hexagons with at least one presence record. This enables a clean two-stage workflow where records are downloaded once and reused: h3sdm_get_records() → h3sdm_pres_from_sf() → h3sdm_filter_outliers() → h3sdm_pa() with balanced pseudo-absences (n_pseudoabs = nrow(pres_clean)).
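The univariate range check described for h3sdm_filter_range() can be sketched in base R. This is not the package's implementation: the data frames train and pred, the variable names, and the rule that a row is flagged 0 as soon as any single variable is out of range are assumptions made for illustration. Only the range_filter column name and its 1/0 coding come from the entry above.

```r
# Hypothetical training data and prediction data (plain data frames here;
# the real objects are expected to be sf objects with more columns).
train <- data.frame(bio1 = c(18, 22, 25), bio12 = c(1500, 2200, 3000))
pred  <- data.frame(bio1 = c(20, 27, 19), bio12 = c(1800, 2000, 3500))
vars  <- c("bio1", "bio12")

# One logical vector per variable: is each prediction row inside the
# [min, max] interval observed in the training data?
within <- lapply(vars, function(v) {
  r <- range(train[[v]], na.rm = TRUE)
  pred[[v]] >= r[1] & pred[[v]] <= r[2]
})

# A row is within range only if every variable is within range.
pred$range_filter <- as.integer(Reduce(`&`, within))
pred
```

In this toy data the second row exceeds the training maximum of bio1 and the third exceeds the training maximum of bio12, so only the first row is kept as within range.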

cr_outline – Costa Rica full outline (continental landmass + Isla del Coco and all minor oceanic islands), derived from GADM 4.1.

h3sdm_filter_outliers() removes environmental outliers from presence records prior to model training using the squared Mahalanobis distance (D2) in environmental space. Only presences (presence == "1") are evaluated; pseudo-absences are always retained unchanged. The outlier threshold is derived from the chi-squared distribution (qchisq(threshold, df = k), where k is the number of environmental variables; default threshold = 0.975). Returns a list with the cleaned PA dataset, a data frame of removed records with their D2 values, the count of removed records, and the threshold value used. Complements h3sdm_aoa(): while the AOA evaluates prediction reliability after training, this function improves input data quality before training.
h3sdm_pres() assigns species occurrence records to H3 hexagons and returns only hexagons with at least one presence record. This is the first step of a two-stage workflow where pseudo-absences are generated after environmental variables have been extracted.
h3sdm_pa() has been redesigned to generate pseudo-absences stratified in environmental space using k-means clustering. Pseudo-absences now cover the full range of environmental conditions available in the AOI, reducing environmental bias introduced by spatially clustered occurrence records. The function now receives presence hexagons from h3sdm_pres() and the full hexagonal grid with extracted variables from h3sdm_predictors().
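The chi-squared rule described for h3sdm_filter_outliers() can be illustrated with base R. This is a sketch under stated assumptions, not the package's code: the simulated data, the variable names, the names of the list elements, and the choice to estimate the centre and covariance from the presences only are all illustrative.

```r
set.seed(1)

# Simulated presence/pseudo-absence table; row 50 is a presence with an
# implausibly high bio1 value.
pa <- data.frame(
  presence = factor(c(rep("1", 50), rep("0", 50))),
  bio1  = c(rnorm(49, 24, 1), 40, rnorm(50, 22, 3)),
  bio12 = c(rnorm(49, 2500, 200), 2500, rnorm(50, 2000, 500))
)
vars      <- c("bio1", "bio12")
threshold <- 0.975

# Evaluate presences only; pseudo-absences are never tested.
is_pres <- pa$presence == "1"
env     <- as.matrix(pa[is_pres, vars])

# stats::mahalanobis() returns the squared distance (D2).
d2      <- mahalanobis(env, center = colMeans(env), cov = cov(env))
cutoff  <- qchisq(threshold, df = length(vars))
outlier <- d2 > cutoff

# Keep every pseudo-absence and every presence that is not an outlier.
keep <- rep(TRUE, nrow(pa))
keep[is_pres] <- !outlier

result <- list(
  data      = pa[keep, ],
  removed   = cbind(pa[is_pres, ][outlier, ], D2 = d2[outlier]),
  n_removed = sum(outlier),
  threshold = cutoff
)
result$n_removed
```

Because the covariance is estimated from data that include the outliers themselves, extreme records inflate it and can partly mask each other; a robust covariance estimator is a common alternative when that matters.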

cr_outline_c dataset regenerated from GADM 4.1 with a fully reproducible script in data-raw/cr_outline.R. Source attribution updated to GADM 4.1. Geometry is now consistent with the cr_outline_c dataset in paisaje.
h3sdm_pa() and h3sdm_pa_from_records() now accept a buffer_k argument (default 1). Hexagons within buffer_k H3 rings of any presence hexagon are excluded from the pseudo-absence candidate pool, preventing pseudo-absences from being placed in areas that are likely occupied but not yet recorded. Set buffer_k = 0 to disable the buffer.
h3sdm_pa_from_records() now accepts an optional predictors_sf argument. When provided, pseudo-absences are selected by stratified sampling in environmental space using k-means clustering. If NULL (default), the previous random geographic sampling behaviour is preserved.
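Stratified sampling in environmental space with k-means, as mentioned for h3sdm_pa() and h3sdm_pa_from_records(), can be sketched as follows. The number of clusters, the equal allocation per cluster, and every object name are assumptions for illustration; the candidate pool is assumed to already exclude presence hexagons and their buffer.

```r
set.seed(123)

# Hypothetical candidate hexagons with extracted predictor values.
grid <- data.frame(
  h3    = sprintf("hex_%03d", 1:300),
  bio1  = runif(300, 10, 28),
  bio12 = runif(300, 1000, 4000)
)
vars        <- c("bio1", "bio12")
n_pseudoabs <- 30
k           <- 5   # number of environmental strata (illustrative)

# Cluster on standardised predictors so no variable dominates by scale.
km <- kmeans(scale(grid[, vars]), centers = k, nstart = 10)
grid$cluster <- km$cluster

# Draw the same number of hexagons from each environmental cluster.
per_cluster <- ceiling(n_pseudoabs / k)
picked <- unlist(lapply(split(seq_len(nrow(grid)), grid$cluster), function(idx) {
  idx[sample.int(length(idx), min(per_cluster, length(idx)))]
}))

pseudo_abs <- grid[picked, ]
pseudo_abs$presence <- factor("0", levels = c("0", "1"))
table(pseudo_abs$cluster)
```

Sampling per cluster rather than uniformly over the grid is what spreads pseudo-absences across the available environmental conditions instead of concentrating them in the most common ones.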

h3sdm_aoa() now extracts predictor variable names from the model recipe instead of the model formula, fixing an error with GLM and other engines where parsnip stores a generic formula internally.
h3sdm_aoa() now uses na.rm = TRUE when computing the inside/outside AOA summary, avoiding NA in the progress message when hexagons have missing values.

h3sdm_pa() now transforms presence records to the CRS of the H3 grid before joining, fixing an error when the grid is in a projected CRS.

h3sdm_aoa() estimates the Dissimilarity Index (DI) and the Area of Applicability (AOA) for spatial prediction models, based on Meyer & Pebesma (2021).
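A simplified base-R sketch of the Dissimilarity Index idea follows. It is not the code used by h3sdm_aoa(): it assumes equal variable weights, uses a leave-one-out nearest neighbour for the training DI instead of cross-validation folds, and takes the 75th percentile plus 1.5 times the IQR of the training DI as the AOA threshold. All object names are hypothetical.

```r
set.seed(42)

# Hypothetical training predictors and two new locations: one in the
# centre of the training cloud, one far outside it.
train <- matrix(rnorm(200), ncol = 2, dimnames = list(NULL, c("bio1", "bio12")))
new   <- rbind(c(0, 0), c(6, 6))
colnames(new) <- colnames(train)

# Variable weights; equal here, importance-based in a weighted variant.
w <- c(bio1 = 1, bio12 = 1)

# Standardise with the training mean and sd, then apply the weights.
mu <- colMeans(train)
s  <- apply(train, 2, sd)
train_s <- scale(train, center = mu, scale = s / w)
new_s   <- scale(new,   center = mu, scale = s / w)

# Mean pairwise distance among training points (the DI denominator).
d_train <- as.matrix(dist(train_s))
d_mean  <- mean(d_train[upper.tri(d_train)])

# DI of new points: distance to the nearest training point over d_mean.
n_new  <- nrow(new_s)
cross  <- as.matrix(dist(rbind(new_s, train_s)))[seq_len(n_new), -seq_len(n_new), drop = FALSE]
di_new <- apply(cross, 1, min) / d_mean

# DI of training points, ignoring each point's zero distance to itself.
diag(d_train) <- Inf
di_train <- apply(d_train, 1, min) / d_mean

# Upper-whisker style threshold; 1 = inside the AOA, 0 = outside.
aoa_threshold <- quantile(di_train, 0.75) + 1.5 * IQR(di_train)
aoa <- as.integer(di_new <= aoa_threshold)
data.frame(DI = di_new, AOA = aoa)
```

The location at the centre of the training cloud has a small DI and falls inside the AOA, while the distant location has a DI well above the threshold.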

h3sdm_get_grid() now preserves the CRS of the input sf_object. Previously, the grid was always returned in WGS84 (EPSG:4326) regardless of the input CRS. If the AOI is in a projected CRS, the output grid is now reprojected to match it. The internal H3 computation still uses WGS84, as required by the H3 system.
h3sdm_predict() internal comments translated to English and @seealso updated to include h3sdm_aoa().

h3sdm_fit_model() and h3sdm_predict() now automatically detect model mode (classification or regression), enabling count-based models (Poisson, Negative Binomial) without manual configuration. For regression models, h3sdm_fit_model() reports the appropriate metrics (RMSE, R2, MAE). Full backward compatibility is maintained.
h3sdm_get_records() now supports "biodatacr" as an optional provider, querying occurrence records from BiodataCR (Costa Rica) via the rbiodatacr package. h3sdm_pa() inherits this support automatically through its providers argument. rbiodatacr is listed as a suggested dependency.

h3sdm_pa_from_records(): generates a presence/pseudo-absence dataset from user-provided records. Accepts a data.frame or sf object with coordinates in WGS84 (EPSG:4326). Supports optional filtering by a geospatialKosher column to remove records with questionable spatial quality.
h3sdm_count_from_records(): generates a hexagonal grid with count-based response variables (species richness, total detections, or individual abundance) from user-provided records. Accepts a data.frame or sf object. Supports optional filtering by presence column, confidence threshold, and date range.

h3sdm_recipe(): added response_col parameter (default "presence") to support count-based response variables. Use response_col = "count" when working with data generated by h3sdm_count_from_records().
h3sdm_recipe_gam(): added response_col parameter (default "presence") with the same behavior as h3sdm_recipe(). Also added documentation examples for both presence/absence and count-based models.
h3sdm_workflow_gam(): updated documentation to clarify the use of set_mode("classification") for presence/absence models and set_mode("regression") with family = poisson() for count-based models.
h3sdm_workflow(): updated documentation to clarify model mode selection for presence/absence and count-based models.
h3sdm_workflows(): updated documentation to clarify model mode selection for presence/absence and count-based models.