Getting Started

Introduction

This package adds resampling methods to the {mlr3} framework that are suited to spatial, temporal and spatiotemporal data. These methods can help reduce the influence of autocorrelation on performance estimates obtained by cross-validation. While this article gives a rather technical introduction to the package, a more applied approach can be found in the mlr3book section on “Spatiotemporal Analysis”.

After loading the package via library("mlr3spatiotempcv"), the spatiotemporal resampling methods and example tasks provided by {mlr3spatiotempcv} are available to the user alongside the default {mlr3} resampling methods and tasks.
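As a quick check of the statement above, the following sketch looks up the dictionaries after loading the package. It assumes that the example task is registered under the key "ecuador" and that the spatial resampling keys start with spcv_ or sptcv_ (optionally prefixed with repeated_); the variable names are only illustrative.

```r
library(mlr3)
library(mlr3spatiotempcv)

# Example tasks added by the package are reachable via the task dictionary
# (assumption: the example task key is "ecuador").
has_ecuador = "ecuador" %in% mlr_tasks$keys()

# Spatial resampling methods sit next to the default {mlr3} ones.
keys = mlr_resamplings$keys()
spatial_keys = grep("^(repeated_)?spt?cv_", keys, value = TRUE)
default_keys = intersect(c("cv", "holdout", "bootstrap"), keys)

print(spatial_keys)
print(default_keys)
```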

Creating a spatial Task

To make use of spatial resampling methods, an {mlr3} task that is aware of its spatial characteristics needs to be created. Two Task child classes exist in {mlr3spatiotempcv} for this purpose: TaskClassifST (classification) and TaskRegrST (regression).

To create one of these, you have two options:

  1. Use the constructor of the Task directly via $new(). Note that this only works for data.table backends.
  2. Use the as_task_* converters (e.g. if your data is stored in an sf object)

We recommend the latter, as the as_task_* converters aim to make task construction easier, e.g., by creating the DataBackend (which is required to create a Task in {mlr3}) automatically and setting the crs and coordinate_names fields. Let’s assume your (point) data is stored in an sf object, which is a common scenario for spatial analysis in R.

# create 'sf' object
data_sf = sf::st_as_sf(ecuador, coords = c("x", "y"), crs = 32717)

# create `TaskClassifST` from `sf` object
task = as_task_classif_st(data_sf, id = "ecuador_task", target = "slides", positive = "TRUE")

You can also use a plain data.frame. In this case, crs and coordinate_names need to be passed explicitly, because there is no sf object from which they could be inferred:

task = as_task_classif_st(ecuador, id = "ecuador_task", target = "slides",
  positive = "TRUE", coordinate_names = c("x", "y"), crs = 32717)

The *ST task family prints a subset of the coordinates by default:

print(task)
#> <TaskClassifST:ecuador_task> (751 x 11)
#> * Target: slides
#> * Properties: twoclass
#> * Features (10):
#>   - dbl (10): carea, cslope, dem, distdeforest, distroad,
#>     distslidespast, hcurv, log.carea, slope, vcurv
#> * Coordinates:
#>             x       y
#>         <num>   <num>
#>   1: 712882.5 9560002
#>   2: 715232.5 9559582
#>   3: 715392.5 9560172
#>   4: 715042.5 9559312
#>   5: 715382.5 9560142
#>  ---                 
#> 747: 714472.5 9558482
#> 748: 713142.5 9560992
#> 749: 713322.5 9560562
#> 750: 715392.5 9557932
#> 751: 713802.5 9560862

All *ST tasks can be treated as their super class equivalents TaskClassif or TaskRegr in subsequent {mlr3} modeling steps.
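A minimal sketch of what this means in practice, assuming the example task is available as tsk("ecuador") and using the featureless baseline learner from {mlr3} purely as a placeholder for any classification learner:

```r
library(mlr3)
library(mlr3spatiotempcv)

# Assumption: the package registers the example task under the key "ecuador".
task = tsk("ecuador")

# The spatial task inherits from the regular classification task class.
is_st = inherits(task, "TaskClassifST")
is_classif = inherits(task, "TaskClassif")

# Any classification learner accepts it; the featureless baseline is used
# here only to keep the example free of extra dependencies.
learner = lrn("classif.featureless")
learner$train(task)
prediction = learner$predict(task)

# Classification error of the baseline on the training data.
print(prediction$score(msr("classif.ce")))
```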

Contributed reflections by {mlr3spatiotempcv}

In {mlr3}, dictionaries and reflections provide an overview of the available task types, column roles and methods. The following sections show which of them receive new entries when {mlr3spatiotempcv} is loaded.

Task Type

mlr_reflections$task_types
#> Key: <type>
#>            type          package             task        learner
#>          <char>           <char>           <char>         <char>
#> 1:      classif             mlr3      TaskClassif LearnerClassif
#> 2:   classif_st mlr3spatiotempcv    TaskClassifST LearnerClassif
#> 3:         regr             mlr3         TaskRegr    LearnerRegr
#> 4:      regr_st mlr3spatiotempcv       TaskRegrST    LearnerRegr
#> 5: unsupervised             mlr3 TaskUnsupervised        Learner
#>           prediction       prediction_data        measure
#>               <char>                <char>         <char>
#> 1: PredictionClassif PredictionDataClassif MeasureClassif
#> 2: PredictionClassif PredictionDataClassif MeasureClassif
#> 3:    PredictionRegr    PredictionDataRegr    MeasureRegr
#> 4:    PredictionRegr    PredictionDataRegr    MeasureRegr
#> 5:              <NA>                  <NA>           <NA>

Task Column Roles

mlr_reflections$task_col_roles
#> $regr
#> [1] "feature" "target"  "name"    "order"   "stratum" "group"   "weight" 
#> 
#> $classif
#> [1] "feature" "target"  "name"    "order"   "stratum" "group"   "weight" 
#> 
#> $unsupervised
#> [1] "feature" "name"    "order"  
#> 
#> $classif_st
#>  [1] "feature"    "target"     "name"       "order"      "stratum"   
#>  [6] "group"      "weight"     "coordinate" "space"      "time"      
#> 
#> $regr_st
#>  [1] "feature"    "target"     "name"       "order"      "stratum"   
#>  [6] "group"      "weight"     "coordinate" "space"      "time"
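The roles contributed by the package can also be extracted programmatically. This is a small sketch based on the output above; the exact set of default roles may differ between {mlr3} versions, so the result is printed rather than hard-coded.

```r
library(mlr3)
library(mlr3spatiotempcv)

roles = mlr_reflections$task_col_roles

# Column roles that exist for spatiotemporal classification tasks
# but not for plain classification tasks.
extra_roles = setdiff(roles$classif_st, roles$classif)
print(extra_roles)
```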

Resampling Methods

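{mlr3spatiotempcv} adds the spatial and spatiotemporal resampling methods listed in the table under “Upstream Packages and Scientific References” (keys prefixed with spcv_ and sptcv_) and, where available, their respective repeated versions. See as.data.table(mlr_resamplings) for the full dictionary.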
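As an illustration, one of these methods can be used like any other {mlr3} resampling. The sketch below assumes the example task tsk("ecuador") and the key "spcv_coords" with a folds parameter; the fold count and the featureless learner are arbitrary choices for demonstration.

```r
library(mlr3)
library(mlr3spatiotempcv)

task = tsk("ecuador")

# Spatial CV based on clustering of the coordinates, with five folds.
resampling = rsmp("spcv_coords", folds = 5)
resampling$instantiate(task)

# Every observation ends up in exactly one test set.
test_sets = lapply(seq_len(resampling$iters), function(i) resampling$test_set(i))
all_test_ids = unlist(test_sets)

# Use the instantiated resampling for a performance estimate.
rr = resample(task, lrn("classif.featureless"), resampling)
print(rr$aggregate(msr("classif.ce")))
```

Because the partitioning involves clustering, the folds (and therefore the score) can vary between runs unless a seed is set.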

Example Tasks

Upstream Packages and Scientific References

The following table lists all spatiotemporal methods implemented in {mlr3spatiotempcv} (or {mlr3}), their upstream R package and scientific references. All methods besides "spcv_buffer" also have a corresponding “repeated” method.

Category                   (Package) Method Name               Reference                  mlr3 Notation
-------------------------  ----------------------------------  -------------------------  -------------
Buffering, spatial         (blockCV) Spatial Buffering         Valavi et al. (2018)       mlr_resamplings_spcv_buffer
Buffering, spatial         (sperrorest) Spatial Disc           Brenning (2012)            mlr_resamplings_spcv_disc
Blocking, spatial          (blockCV) Spatial Blocking          Valavi et al. (2018)       mlr_resamplings_spcv_block
Blocking, spatial          (sperrorest) Spatial Tiles          Valavi et al. (2018)       mlr_resamplings_spcv_tiles
Clustering, spatial        (sperrorest) Spatial CV             Brenning (2012)            mlr_resamplings_spcv_coords
Clustering, spatial        (CAST) KNNDM                        Linnenbrink et al. (2023)  mlr_resamplings_spcv_knndm
Clustering, feature-space  (blockCV) Environmental Blocking    Valavi et al. (2018)       mlr_resamplings_spcv_env
Grouping, predefined inds  (mlr3) Predefined partitions        -                          mlr_resamplings_custom_cv
Grouping, spatiotemporal   (mlr3) via col_roles "group"        -                          mlr_resamplings_cv, Task$set_col_roles(<variable>, "group")
Grouping, spatiotemporal   (CAST) Leave-Location-and-Time-Out  Meyer et al. (2018)        mlr_resamplings_sptcv_cstf, Task$set_col_roles(<variable>, "space|time")
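The grouping approach via the "group" column role can be sketched as follows. The quadrant variable is hypothetical and is derived here from the coordinates only to have something to group by; in practice it would be a column that already exists in your data (e.g. a site or plot identifier). The sketch assumes the example task tsk("ecuador") and attaches the column with Task$cbind().

```r
library(mlr3)
library(mlr3spatiotempcv)

task = tsk("ecuador")

# Hypothetical grouping variable: split the study area into quadrants
# at the median x and y coordinates.
coords = task$coordinates()
quadrant = paste0(
  ifelse(coords$y > median(coords$y), "north", "south"),
  ifelse(coords$x > median(coords$x), "east", "west")
)

# Attach the column to the task and give it the "group" role, so that
# observations of one quadrant are never split across train and test.
task$cbind(data.frame(quadrant = factor(quadrant)))
task$set_col_roles("quadrant", roles = "group")

# Ordinary cross-validation now respects the groups.
resampling = rsmp("cv", folds = 2)
resampling$instantiate(task)

groups = task$groups
test_groups = unique(groups$group[groups$row_id %in% resampling$test_set(1)])
train_groups = unique(groups$group[groups$row_id %in% resampling$train_set(1)])
print(test_groups)
print(train_groups)
```

For sptcv_cstf, the same set_col_roles() call would be used with the roles "space" and/or "time" instead of "group", as noted in the table.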

References

Brenning, Alexander. 2012. “Spatial cross-validation and bootstrap for the assessment of prediction rules in remote sensing: The R package sperrorest.” In 2012 IEEE International Geoscience and Remote Sensing Symposium. IEEE. https://doi.org/10.1109/igarss.2012.6352393.
Linnenbrink, Jan, Carles Milà, Marvin Ludwig, and Hanna Meyer. 2023. “kNNDM: K-Fold Nearest Neighbour Distance Matching Cross-Validation for Map Accuracy Estimation.” EGUsphere, July, 1–16. https://doi.org/10.5194/egusphere-2023-1308.
Meyer, Hanna, Christoph Reudenbach, Tomislav Hengl, Marwan Katurji, and Thomas Nauss. 2018. “Improving Performance of Spatio-Temporal Machine Learning Models Using Forward Feature Selection and Target-Oriented Validation.” Environmental Modelling & Software 101 (March): 1–9. https://doi.org/10.1016/j.envsoft.2017.12.001.
Valavi, Roozbeh, Jane Elith, Jose J. Lahoz-Monfort, and Gurutzeta Guillera-Arroita. 2018. “blockCV: an R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models.” bioRxiv, June. https://doi.org/10.1101/357798.