Introduction to surveysd

2026-03-16

The goal of surveysd is to combine all steps needed to use calibrated bootstrapping with custom estimation functions. This vignette covers the usage of the most important functions. For insights into the theory used in this package, refer to vignette("methodology").

Load dummy data

A test data set based on data(eusilc, package = "laeken") can be created with demo.eusilc().

library(surveysd)

set.seed(1234)
# n = 2 generates two survey years; prettyNames = TRUE gives readable column names
eusilc <- demo.eusilc(n = 2, prettyNames = TRUE)

eusilc[1:5, .(year, povertyRisk, gender, pWeight)]
year povertyRisk gender pWeight
2010 FALSE female 504.5696
2010 FALSE male 504.5696
2010 FALSE male 504.5696
2010 FALSE female 493.3824
2010 FALSE male 493.3824

Draw bootstrap replicates

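Use stratified resampling without replacement, with region as strata, to draw 10 bootstrap replicates. The replicates are consistent across reference periods (years): a household is treated the same way in every period in which it appears.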

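# 10 replicates; households identified by "hid", strata given by "region",
# and replicates kept consistent across the periods in "year"
dat_boot <- draw.bootstrap(eusilc, REP = 10, hid = "hid", weights = "pWeight",
                           strata = "region", period = "year")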
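To make the resampling step concrete, here is a minimal sketch of a rescaled bootstrap without replacement, one common way to implement stratified resampling without replacement. It is not the package's implementation; see vignette("methodology") for the method draw.bootstrap() actually uses. Within each stratum of n units, m = floor(n/2) units are drawn, and each design weight is multiplied by 1 - λ + λ(n/m)δ, where δ is 1 for drawn units, λ = sqrt(m(1 - f)/(n - m)), and f is the sampling fraction. The function rescaled_boot_factors(), the toy data frames hh and panel, and the fixed weight of 100 are hypothetical.

```r
# Hypothetical sketch of a rescaled bootstrap without replacement for one
# stratum; it illustrates the technique and is not draw.bootstrap()'s code.
rescaled_boot_factors <- function(n, f = 0) {
  stopifnot(n >= 2)
  m <- floor(n / 2)                              # units drawn without replacement
  lambda <- sqrt(m * (1 - f) / (n - m))          # rescaling constant
  selected <- seq_len(n) %in% sample.int(n, m)   # TRUE for drawn units
  1 - lambda + lambda * (n / m) * selected       # multiplicative weight factor
}

set.seed(1)
# Toy household frame: 8 households in 2 strata (regions)
hh <- data.frame(hid = 1:8, region = rep(c("A", "B"), each = 4))
hh$factor <- NA_real_
for (s in unique(hh$region)) {
  idx <- which(hh$region == s)
  hh$factor[idx] <- rescaled_boot_factors(length(idx))  # resample within stratum
}

# Period consistency (simplified): the same households are observed in both
# years, and each household keeps its bootstrap factor in every year.
panel <- data.frame(hid = rep(1:8, times = 2),
                    year = rep(c(2010, 2011), each = 8),
                    w = 100)
panel <- merge(panel, hh, by = "hid")
panel$w_boot <- panel$w * panel$factor
```

With f = 0 and an even number of units per stratum, the factors are exactly 0 or 2, so every stratum total is preserved in each year; with f > 0, non-drawn units keep a small positive factor 1 - λ instead of 0.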

Calibrate bootstrap replicates

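Calibrate each bootstrap replicate to the distribution of gender (at the person level) and region (at the household level). The arguments epsP and epsH set the convergence limits for the person-level and household-level constraints.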

dat_boot_calib <- recalib(dat_boot, conP.var = "gender", conH.var = "region",
                          epsP = 1e-2, epsH = 2.5e-2, verbose = FALSE)
dat_boot_calib[1:5, .(year, povertyRisk, gender, pWeight, w1, w2, w3, w4)]
year povertyRisk gender pWeight w1 w2 w3 w4
2010 FALSE female 504.5696 1008.6905620 0.4468999 0.4486785 0.4539311
2010 FALSE male 504.5696 1008.6905620 0.4468999 0.4486785 0.4539311
2010 FALSE male 504.5696 1008.6905620 0.4468999 0.4486785 0.4539311
2010 FALSE female 493.3824 0.4387304 0.4373870 0.4387304 0.4439256
2010 FALSE male 493.3824 0.4387304 0.4373870 0.4387304 0.4439256
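Calibration of this kind is commonly done by raking (iterative proportional fitting): the weights are rescaled so that their totals match the targets of one variable, then the next, until all margins agree. The sketch below shows that idea for two person-level margins. It is a simplified, hypothetical stand-in for recalib() (the function rake(), the toy data and the target totals are made up), and it ignores household-level constraints, under which all members of a household keep a common weight.

```r
# Hypothetical sketch of raking (iterative proportional fitting) to known
# margins. It is a simplified stand-in for recalib(): it calibrates person
# weights only and does not keep weights constant within households.
rake <- function(w, margins, targets, eps = 1e-6, max_iter = 100) {
  # largest relative deviation of any margin total from its target
  max_rel_dev <- function(w) {
    max(vapply(names(margins), function(v) {
      current <- tapply(w, margins[[v]], sum)
      max(abs(current / targets[[v]][names(current)] - 1))
    }, numeric(1)))
  }
  for (iter in seq_len(max_iter)) {
    for (v in names(margins)) {
      current <- tapply(w, margins[[v]], sum)
      adj <- targets[[v]][names(current)] / current   # target / current total
      w <- w * unname(adj[as.character(margins[[v]])])
    }
    if (max_rel_dev(w) < eps) break                   # every margin within eps
  }
  w
}

set.seed(42)
d <- data.frame(gender = sample(c("female", "male"), 200, replace = TRUE),
                region = sample(c("A", "B", "C"), 200, replace = TRUE))
w0 <- runif(200, min = 50, max = 150)
targets <- list(gender = c(female = 10500, male = 9500),
                region = c(A = 6000, B = 7000, C = 7000))
w_cal <- rake(w0, d[, c("gender", "region")], targets)
```

In this sketch eps is a relative tolerance, roughly analogous to the convergence limits epsP and epsH passed to recalib() above.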

Estimate with respect to a grouping variable

Estimate the share of persons at risk of poverty per period and gender.

err.est <- calc.stError(dat_boot_calib, var = "povertyRisk", fun = weightedRatio, group = "gender")
err.est$Estimates
year n N gender estimate_type val_povertyRisk stE_povertyRisk
2010 7267 3979572 male direct 12.02660 0.3191754
2010 7560 4202650 female direct 16.73351 0.4988794
2010 14827 8182222 NA direct 14.44422 0.3712952
2011 7267 3979572 male direct 12.81921 0.3387604
2011 7560 4202650 female direct 16.62488 0.5202533
2011 14827 8182222 NA direct 14.77393 0.4025330

The output contains estimates (val_povertyRisk) and standard errors (stE_povertyRisk), both in percent. The columns n and N give the number of sampled persons and the weighted population size of each group. Rows with gender = NA denote the aggregate over all genders for the corresponding year.
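The two value columns can be reproduced in miniature. The sketch below computes a weighted share in percent, in the spirit of weightedRatio, and approximates its standard error by the standard deviation of the estimates obtained with each set of replicate weights, the usual bootstrap variance estimator. The helper names weighted_ratio_pct() and boot_se() and the toy weights are hypothetical, and we assume rather than know that calc.stError() follows exactly this scheme.

```r
# Hypothetical re-implementation for illustration; surveysd's weightedRatio()
# may differ in details such as NA handling.
weighted_ratio_pct <- function(x, w) {
  ok <- !is.na(x)
  100 * sum(w[ok & x]) / sum(w[ok])   # weighted share of TRUE values, in percent
}

# Bootstrap standard error: standard deviation of the estimates obtained with
# each column of replicate weights.
boot_se <- function(x, rep_weights) {
  est <- apply(rep_weights, 2, function(w) weighted_ratio_pct(x, w))
  sd(est)
}

poor <- c(TRUE, FALSE, FALSE, TRUE)
w <- c(1, 1, 2, 4)
rep_w <- cbind(w1 = c(1, 1, 2, 4), w2 = c(2, 0, 2, 4), w3 = c(1, 1, 1, 1))
weighted_ratio_pct(poor, w)   # 62.5
boot_se(poor, rep_w)          # 12.5
```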

Estimate with respect to several variables

Estimate the share of persons at risk of poverty per period for each gender, each region, and each combination of gender and region.

group <- list("gender", "region", c("gender", "region"))
err.est <- calc.stError(dat_boot_calib, var = "povertyRisk", fun = weightedRatio, group = group)
head(err.est$Estimates)
year n N gender region estimate_type val_povertyRisk stE_povertyRisk
2010 261 122741.8 male Burgenland direct 17.414524 3.537922
2010 288 137822.2 female Burgenland direct 21.432598 2.607941
2010 359 182732.9 male Vorarlberg direct 12.973259 2.565622
2010 374 194622.1 female Vorarlberg direct 19.883637 3.457225
2010 440 253143.7 male Salzburg direct 9.156964 0.831592
2010 484 282307.3 female Salzburg direct 17.939382 1.954899
## skipping 54 more rows
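Passing a list to group amounts to repeating the same estimation for each grouping. The hypothetical helper estimate_groups() below sketches that for point estimates only (no standard errors and no split by year), using base R split() with a list of grouping columns so that c("gender", "region") yields one row per combination of levels. The toy data frame d is made up.

```r
# Hypothetical helper: estimate the weighted share for every grouping in a
# list, mirroring the idea of passing group = list(...) to calc.stError().
weighted_ratio_pct <- function(x, w) {
  ok <- !is.na(x)
  100 * sum(w[ok & x]) / sum(w[ok])   # weighted share of TRUE values, in percent
}

estimate_groups <- function(data, var, weight, groups) {
  out <- lapply(groups, function(g) {
    # one subset per observed combination of the grouping columns in g
    parts <- split(data, data[g], drop = TRUE)
    data.frame(group = paste(g, collapse = "+"),
               level = names(parts),
               estimate = vapply(parts, function(p)
                 weighted_ratio_pct(p[[var]], p[[weight]]), numeric(1)),
               row.names = NULL)
  })
  do.call(rbind, out)
}

d <- data.frame(gender = c("female", "female", "male", "male"),
                region = c("A", "B", "A", "B"),
                poor = c(TRUE, FALSE, FALSE, TRUE),
                w = c(1, 3, 2, 2))
res <- estimate_groups(d, "poor", "w", list("gender", "region", c("gender", "region")))
```

Combined groupings are labeled like "female.A", following the default separator of interaction().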