Usage of the sdcHierarchies-Package

Bernhard Meindl

2026-03-19

Introduction

The sdcHierarchies package lets users create, modify and export nested hierarchies, which are used, for example, to define tables in statistical disclosure control (SDC) software such as sdcTable.

Usage

Before use, the package needs to be loaded:

library(sdcHierarchies)

Create and modify a hierarchy from scratch

hier_create() creates a hierarchy. Argument root specifies the name of the root node. Optionally, nodes can be added directly below the root by listing their names in argument nodes. hier_display() shows the hierarchical structure of the current tree:

h <- hier_create(root = "Total", nodes = LETTERS[1:5])
hier_display(h)
## Total
## ├─A
## ├─B
## ├─C
## ├─D
## └─E

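Once such an object is created, it can be modified by the following functions:

  • hier_add(): adds nodes below an existing node
  • hier_delete(): deletes nodes from the hierarchy
  • hier_rename(): renames nodes

These functions can be applied as shown below: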

# adding nodes below the node specified in argument `root`
h <- hier_add(h, root = "A", nodes = c("a1", "a2"))
h <- hier_add(h, root = "B", nodes = c("b1", "b2"))
h <- hier_add(h, root = "b1", nodes = c("b1_a", "b1_b"))

# deleting one or more nodes from the hierarchy
h <- hier_delete(h, nodes = c("a1", "b2"))
h <- hier_delete(h, nodes = c("a2"))

# rename nodes
h <- hier_rename(h, nodes = c("C" = "X", "D" = "Y"))
hier_display(h)
## Total
## ├─A
## ├─B
## │ └─b1
## │   ├─b1_a
## │   └─b1_b
## ├─X
## ├─Y
## └─E

Note that the underlying data.tree package allows objects to be modified by reference, so explicit assignment is not strictly required.

Information about nodes

Function hier_info() returns metadata for specific nodes provided in the nodes argument. If this argument is omitted, the function returns information for all nodes in the hierarchy.

# about a specific node
info <- hier_info(h, nodes = c("b1", "E"))

info is a named list in which each element refers to a queried node. The results for node b1 can be extracted as shown below:

info$b1
## $name
## [1] "b1"
## 
## $is_rootnode
## [1] FALSE
## 
## $level
## [1] 3
## 
## $is_leaf
## [1] FALSE
## 
## $siblings
## character(0)
## 
## $contributing_codes
## [1] "b1_a" "b1_b"
## 
## $children
## [1] "b1_a" "b1_b"
## 
## $parent
## [1] "B"
## 
## $is_bogus
## [1] TRUE
## 
## $parent_bogus
## [1] "B"

Convert to other formats

Function hier_convert() converts the network-based structure of a hierarchy to a different format, while hier_export() performs the conversion and writes the result to a file on disk. The examples below use the "df", "code" and "argus" formats:

# conversion to a "@;label"-based format
res_df <- hier_convert(h, as = "df")
print(res_df)
##   level  name
## 1     @ Total
## 2    @@     A
## 3    @@     B
## 4   @@@    b1
## 5  @@@@  b1_a
## 6  @@@@  b1_b
## 7    @@     X
## 8    @@     Y
## 9    @@     E
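In this format the number of "@" characters encodes a node's depth, and a node's parent is the closest preceding row that is one level shallower. The base-R sketch below makes that explicit. The data.frame is a hand-typed copy of the output above, and the parent column is added purely for illustration (it is not something hier_convert() returns):

```r
# Hand-typed copy of the data.frame printed above (assumption: same row order)
res <- data.frame(
  level = c("@", "@@", "@@", "@@@", "@@@@", "@@@@", "@@", "@@", "@@"),
  name  = c("Total", "A", "B", "b1", "b1_a", "b1_b", "X", "Y", "E"),
  stringsAsFactors = FALSE
)

depth <- nchar(res$level)                 # number of "@" = depth in the tree
parent <- rep(NA_character_, nrow(res))   # the root keeps NA
last_at_depth <- character(0)             # most recent node seen at each depth

for (i in seq_len(nrow(res))) {
  d <- depth[i]
  if (d > 1) parent[i] <- last_at_depth[d - 1]
  last_at_depth[d] <- res$name[i]
}

res$parent <- parent
print(res)
```

The loop relies only on row order, which is why the order of rows matters in this format.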

The code required to create this hierarchy can be generated using:

code <- hier_convert(h, as = "code"); cat(code, sep = "\n")
## library(sdcHierarchies)
## tree <- hier_create(root = 'Total', nodes = c('A', 'B', 'X', 'Y', 'E'))
## tree <- hier_add(tree = tree, root = 'B', nodes = 'b1')
## tree <- hier_add(tree = tree, root = 'b1', nodes = c('b1_a', 'b1_b'))
## print(tree)

Using hier_export(), one can write the results to a file. This is useful, for example, for creating hrc-files that can be used as input for τ-Argus:

hier_export(h, as = "argus", path = file.path(tempdir(), "hierarchy.hrc"))

Create a hierarchy from different sources

hier_import() returns a network-based hierarchy from a data.frame (in @;label format), json, code or a τ-Argus compatible hrc-file. For example, to create a hierarchy based on res_df:

n_df <- hier_import(inp = res_df, from = "df")
hier_display(n_df)
## Total
## ├─A
## ├─B
## │ └─b1
## │   ├─b1_a
## │   └─b1_b
## ├─X
## ├─Y
## └─E

Using hier_import(inp = "hierarchy.hrc", from = "argus"), one could create an sdc hierarchy object directly from an hrc-file.

Create/Compute hierarchies from a string

Often, nested hierarchy information is encoded in a string. Function hier_compute() transforms such strings into hierarchy objects. Two cases can be distinguished: in the first, all input codes have the same length; in the second, the lengths of the codes differ. Let’s assume we have a geographic code given in geo_m where digits 1-2 refer to the first level, digit 3 to the second and digits 4-5 to the third level of the hierarchy.

geo_m <- c(
  "01051", "01053", "01054", "01055", "01056", "01057", "01058", "01059", "01060", "01061", "01062",
  "02000",
  "03151", "03152", "03153", "03154", "03155", "03156", "03157", "03158", "03251", "03252", "03254", "03255",
  "03256", "03257", "03351", "03352", "03353", "03354", "03355", "03356", "03357", "03358", "03359", "03360",
  "03361", "03451", "03452", "03453", "03454", "03455", "03456",
  "10155")

For such character vectors (e.g., geographic or sector codes), the method argument of hier_compute() provides two ways to define how the levels are encoded in dim_spec:

  • "endpos": dim_spec contains the end position of each level within the string.
  • "len": dim_spec contains the number of characters of each level.

If the overall total is not explicitly encoded in the input strings, the root argument can be used to provide a name for the top-level node. Additionally, the as parameter specifies the output format. For example, setting as = "df" returns the result as a data.frame in the @; label format.
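The two dim_spec conventions are related by a cumulative sum: the end positions are the running totals of the level lengths. A minimal base-R sketch on three of the codes from geo_m shows this. It only illustrates the encoding and makes no claim about how hier_compute() is implemented internally:

```r
codes  <- c("01051", "03151", "10155")
len    <- c(2, 1, 2)     # "len" convention: width of each level
endpos <- cumsum(len)    # "endpos" convention: last character of each level

# Prefix of every code up to each end position, one column per level
prefixes <- sapply(endpos, function(p) substr(codes, 1, p))
colnames(prefixes) <- paste0("level_", seq_along(endpos))

print(endpos)
print(prefixes)
```

Each row of prefixes lists the node names on the path from the first level down to the full code, for example "03", "031" and "03151".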

As shown below, these two methods are interchangeable and yield identical hierarchies:

# Using end positions (e.g., level 1 ends at index 2, level 2 at 3, level 3 at 5)
v1 <- hier_compute(
  inp = geo_m, 
  dim_spec = c(2, 3, 5), 
  root = "Tot", 
  method = "endpos", 
  as = "df"
)

# Using lengths (e.g., level 1 is 2 chars, level 2 is 1 char, level 3 is 2 chars)
v2 <- hier_compute(
  inp = geo_m, 
  dim_spec = c(2, 1, 2), 
  root = "Tot", 
  method = "len",
  as = "df"
)

identical(v1, v2)
## [1] TRUE
hier_display(v1)
## Tot
## ├─01
## │ └─010
## │   ├─01051
## │   ├─01053
## │   ├─01054
## │   ├─01055
## │   ├─01056
## │   ├─01057
## │   ├─01058
## │   ├─01059
## │   ├─01060
## │   ├─01061
## │   └─01062
## ├─02
## │ └─020
## │   └─02000
## ├─03
## │ ├─031
## │ │ ├─03151
## │ │ ├─03152
## │ │ ├─03153
## │ │ ├─03154
## │ │ ├─03155
## │ │ ├─03156
## │ │ ├─03157
## │ │ └─03158
## │ ├─032
## │ │ ├─03251
## │ │ ├─03252
## │ │ ├─03254
## │ │ ├─03255
## │ │ ├─03256
## │ │ └─03257
## │ ├─033
## │ │ ├─03351
## │ │ ├─03352
## │ │ ├─03353
## │ │ ├─03354
## │ │ ├─03355
## │ │ ├─03356
## │ │ ├─03357
## │ │ ├─03358
## │ │ ├─03359
## │ │ ├─03360
## │ │ └─03361
## │ └─034
## │   ├─03451
## │   ├─03452
## │   ├─03453
## │   ├─03454
## │   ├─03455
## │   └─03456
## └─10
##   └─101
##     └─10155

If the total is already contained within the string (for example, in the first 3 positions), the hierarchy can be computed by including that segment in the dim_spec and omitting the root argument:

geo_m_with_tot <- paste0("Tot", geo_m)
head(geo_m_with_tot)
## [1] "Tot01051" "Tot01053" "Tot01054" "Tot01055" "Tot01056" "Tot01057"
v3 <- hier_compute(
  inp = geo_m_with_tot, 
  dim_spec = c(3, 2, 1, 2), 
  method = "len"
)
hier_display(v3)
## Tot
## ├─01
## │ └─010
## │   ├─01051
## │   ├─01053
## │   ├─01054
## │   ├─01055
## │   ├─01056
## │   ├─01057
## │   ├─01058
## │   ├─01059
## │   ├─01060
## │   ├─01061
## │   └─01062
## ├─02
## │ └─020
## │   └─02000
## ├─03
## │ ├─031
## │ │ ├─03151
## │ │ ├─03152
## │ │ ├─03153
## │ │ ├─03154
## │ │ ├─03155
## │ │ ├─03156
## │ │ ├─03157
## │ │ └─03158
## │ ├─032
## │ │ ├─03251
## │ │ ├─03252
## │ │ ├─03254
## │ │ ├─03255
## │ │ ├─03256
## │ │ └─03257
## │ ├─033
## │ │ ├─03351
## │ │ ├─03352
## │ │ ├─03353
## │ │ ├─03354
## │ │ ├─03355
## │ │ ├─03356
## │ │ ├─03357
## │ │ ├─03358
## │ │ ├─03359
## │ │ ├─03360
## │ │ └─03361
## │ └─034
## │   ├─03451
## │   ├─03452
## │   ├─03453
## │   ├─03454
## │   ├─03455
## │   └─03456
## └─10
##   └─101
##     └─10155

The resulting hierarchy has the same structure as v1 and v2. hier_compute() can also handle input strings of varying lengths:

## Example with unequal string lengths; overall total provided via 'root'
yae_h <- c(
  "1.1.1.", "1.1.2.",
  "1.2.1.", "1.2.2.", "1.2.3.", "1.2.4.", "1.2.5.", "1.3.1.",
  "1.3.2.", "1.3.3.", "1.3.4.", "1.3.5.",
  "1.4.1.", "1.4.2.", "1.4.3.", "1.4.4.", "1.4.5.",
  "1.5.", "1.6.", "1.7.", "1.8.", "1.9.", "2.", "3.")

v1 <- hier_compute(
  inp = yae_h, 
  dim_spec = c(2, 2, 2), 
  root = "Tot", 
  method = "len"
)
hier_display(v1)
## Tot
## ├─1.
## │ ├─1.1.
## │ │ ├─1.1.1.
## │ │ └─1.1.2.
## │ ├─1.2.
## │ │ ├─1.2.1.
## │ │ ├─1.2.2.
## │ │ ├─1.2.3.
## │ │ ├─1.2.4.
## │ │ └─1.2.5.
## │ ├─1.3.
## │ │ ├─1.3.1.
## │ │ ├─1.3.2.
## │ │ ├─1.3.3.
## │ │ ├─1.3.4.
## │ │ └─1.3.5.
## │ ├─1.4.
## │ │ ├─1.4.1.
## │ │ ├─1.4.2.
## │ │ ├─1.4.3.
## │ │ ├─1.4.4.
## │ │ └─1.4.5.
## │ ├─1.5.
## │ ├─1.6.
## │ ├─1.7.
## │ ├─1.8.
## │ └─1.9.
## ├─2.
## └─3.
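With codes of varying length, a shorter code simply stops at a higher level. The following base-R sketch, which is an illustration only and not the package's internal algorithm, derives the path of node names for three of the codes by keeping just those end positions that fit into each string:

```r
codes  <- c("1.1.1.", "1.5.", "2.")
endpos <- cumsum(c(2, 2, 2))   # end positions 2, 4, 6

# For each code, keep only the end positions that fit into the string
node_path <- lapply(codes, function(x) {
  ep <- endpos[endpos <= nchar(x)]
  substring(x, 1, ep)
})
names(node_path) <- codes
print(node_path)
```

Here "1.5." yields only two levels ("1." and "1.5."), which matches its position in the tree displayed above.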

Creating hierarchies from a list

Alternatively, you can create a hierarchy by setting method = "list". In this mode, the input should be a named list where each element’s name is interpreted as a parent node, and the element’s content represents its child nodes.

yae_ll <- list()
yae_ll[["Total"]] <- c("1.", "2.", "3.")
yae_ll[["1."]]    <- paste0("1.", 1:9, ".")
yae_ll[["1.1."]]  <- paste0("1.1.", 1:2, ".")
yae_ll[["1.2."]]  <- paste0("1.2.", 1:5, ".")
yae_ll[["1.3."]]  <- paste0("1.3.", 1:5, ".")
yae_ll[["1.4."]]  <- paste0("1.4.", 1:6, ".")

d <- hier_compute(inp = yae_ll, root = "Total", method = "list") 
## Argument 'dim_spec' is ignored when constructing a hierarchy from a nested list.
hier_display(d)
## Total
## ├─1.
## │ ├─1.1.
## │ │ ├─1.1.1.
## │ │ └─1.1.2.
## │ ├─1.2.
## │ │ ├─1.2.1.
## │ │ ├─1.2.2.
## │ │ ├─1.2.3.
## │ │ ├─1.2.4.
## │ │ └─1.2.5.
## │ ├─1.3.
## │ │ ├─1.3.1.
## │ │ ├─1.3.2.
## │ │ ├─1.3.3.
## │ │ ├─1.3.4.
## │ │ └─1.3.5.
## │ ├─1.4.
## │ │ ├─1.4.1.
## │ │ ├─1.4.2.
## │ │ ├─1.4.3.
## │ │ ├─1.4.4.
## │ │ ├─1.4.5.
## │ │ └─1.4.6.
## │ ├─1.5.
## │ ├─1.6.
## │ ├─1.7.
## │ ├─1.8.
## │ └─1.9.
## ├─2.
## └─3.
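A named list of this kind is equivalent to a parent/child edge list. As a sketch in base R, using a shortened, hypothetical version of yae_ll, the list can be flattened and its leaves identified as the nodes that never occur as a parent:

```r
# Shortened, hypothetical version of the list used above
ll <- list(
  "Total" = c("1.", "2.", "3."),
  "1."    = paste0("1.", 1:9, "."),
  "1.1."  = paste0("1.1.", 1:2, ".")
)

# One row per parent/child relation
edges <- data.frame(
  parent = rep(names(ll), lengths(ll)),
  child  = unlist(ll, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Leaves are nodes that never appear as a parent
leaves <- setdiff(edges$child, edges$parent)

print(nrow(edges))
print(leaves)
```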

Grids and Indexing

The hier_grid() function computes all possible combinations of codes from multiple hierarchies. This is a crucial step in building complete tables for Statistical Disclosure Control (SDC).

Handling Bogus Codes

A “bogus” chain occurs when a parent node has only a single child. In such cases, the parent and the child represent the same set of underlying units, which can cause redundancies in SDC software. In the example below, both h1 and h2 contain bogus structures:

  • In h1, the node A has only one child a1, which in turn has only one child aa1.
  • In h2, the nodes b and d each have only a single child (b1 and d1 respectively).

h1 <- hier_create("Total", nodes = LETTERS[1:3])
h1 <- hier_add(h1, root = "A", nodes = "a1")
h1 <- hier_add(h1, root = "a1", nodes = "aa1")
hier_display(h1)
## Total
## ├─A
## │ └─a1
## │   └─aa1
## ├─B
## └─C
h2 <- hier_create("Total", letters[1:5])
h2 <- hier_add(h2, root = "b", nodes = "b1")
h2 <- hier_add(h2, root = "d", nodes = "d1")
hier_display(h2)
## Total
## ├─a
## ├─b
## │ └─b1
## ├─c
## ├─d
## │ └─d1
## └─e
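The single-child condition can be checked on a plain parent/child edge list. The base-R sketch below uses a hand-written edge list mirroring h1. The helper collapse() is hypothetical and only illustrates how a bogus chain leads to its most granular descendant:

```r
# Hand-written edge list mirroring h1 (assumption: not extracted from the package)
edges <- data.frame(
  parent = c("Total", "Total", "Total", "A", "a1"),
  child  = c("A", "B", "C", "a1", "aa1"),
  stringsAsFactors = FALSE
)

# Parents with exactly one child are "bogus"
n_children <- table(edges$parent)
bogus_parents <- names(n_children)[n_children == 1]

# Follow a single-child chain down to its most granular descendant
collapse <- function(node) {
  while (node %in% bogus_parents) {
    node <- edges$child[edges$parent == node]
  }
  node
}

print(sort(bogus_parents))
print(collapse("A"))   # "aa1"
print(collapse("B"))   # "B" (not part of a bogus chain)
```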

When calling hier_grid(), setting add_dups = FALSE automatically prunes these redundant parent nodes (like A, a1, b, and d). They are replaced by their most granular descendants (e.g., aa1, b1, and d1), ensuring the resulting grid aligns with the granularity of the raw microdata.

# cell_id is a unique string created by concatenating default codes
r <- hier_grid(h1, h2, add_dups = FALSE, add_levs = TRUE)
print(r)
##         v1     v2 cell_id levs_v1 levs_v2 leaf_id
##     <char> <char>  <char>   <int>   <int>   <int>
##  1:  Total  Total 0000000       1       1      NA
##  2:    aa1  Total 0111000       4       1      NA
##  3:      B  Total 0200000       2       1      NA
##  4:      C  Total 0300000       2       1      NA
##  5:  Total      a 0000010       1       2      NA
##  6:    aa1      a 0111010       4       2       3
##  7:      B      a 0200010       2       2       1
##  8:      C      a 0300010       2       2       2
##  9:  Total     b1 0000021       1       3      NA
## 10:    aa1     b1 0111021       4       3       6
## 11:      B     b1 0200021       2       3       4
## 12:      C     b1 0300021       2       3       5
## 13:  Total      c 0000030       1       2      NA
## 14:    aa1      c 0111030       4       2       9
## 15:      B      c 0200030       2       2       7
## 16:      C      c 0300030       2       2       8
## 17:  Total     d1 0000041       1       3      NA
## 18:    aa1     d1 0111041       4       3      12
## 19:      B     d1 0200041       2       3      10
## 20:      C     d1 0300041       2       3      11
## 21:  Total      e 0000050       1       2      NA
## 22:    aa1      e 0111050       4       2      15
## 23:      B      e 0200050       2       2      13
## 24:      C      e 0300050       2       2      14
##         v1     v2 cell_id levs_v1 levs_v2 leaf_id
##     <char> <char>  <char>   <int>   <int>   <int>
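The 24 rows above are the Cartesian product of the 4 remaining codes of the first hierarchy and the 6 remaining codes of the second. As a rough base-R illustration, with the code vectors typed by hand from the printed grid, expand.grid() reproduces the same combinations and the same split into aggregate and primary cells:

```r
# Codes left after bogus parents were replaced (typed from the grid above)
v1 <- c("Total", "aa1", "B", "C")
v2 <- c("Total", "a", "b1", "c", "d1", "e")

grid <- expand.grid(v1 = v1, v2 = v2, stringsAsFactors = FALSE)

# In this flat example every non-total code is a leaf, so a cell is a
# primary cell exactly when neither of its codes is "Total"
grid$is_primary <- grid$v1 != "Total" & grid$v2 != "Total"

print(nrow(grid))             # 24 cells in total
print(sum(grid$is_primary))   # 15 primary cells
```

The 15 primary cells correspond to the 15 non-missing values of leaf_id in the output above.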

High-Performance Indexing

For large datasets, mapping microdata strings to grid cells using character matching is computationally expensive. By setting add_contributing_cells = TRUE, sdcHierarchies generates an optimized integer-based indexing system:

  1. leaf_id: A unique integer assigned to every combination of base-level codes (the most granular codes in the hierarchies).
  2. contributing_leaf_ids: A list-column containing the integers of all base-level codes that contribute to a specific cell (e.g., all codes falling under a “Total” or “Sub-total”).

# Create an SDC-optimized grid
r_sdc <- hier_grid(h1, h2, add_dups = FALSE, add_contributing_cells = TRUE)

# Generate microdata using base-level codes for region and sector
# Note: 'aa1', 'b1', and 'd1' are the granular leaf nodes
library(data.table)  # provides data.table() and the := operator used below
microdata <- data.table(
  region = c("aa1", "B", "C", "aa1", "B"),
  sector = c("a", "b1", "c", "d1", "e"),
  val = c(10, 20, 30, 40, 50)
)

# Map microdata to base-level IDs using a named list
microdata[, leaf_id := hier_create_ids(
  data = microdata, 
  dims = list("region" = h1, "sector" = h2)
)]

print(microdata)
##    region sector   val leaf_id
##    <char> <char> <num>   <int>
## 1:    aa1      a    10       3
## 2:      B     b1    20       4
## 3:      C      c    30       8
## 4:    aa1     d1    40      12
## 5:      B      e    50      13
# Fast aggregation: Summing 'Total_Total' using integer lookups
total_ids <- r_sdc[v1 == "Total" & v2 == "Total", contributing_leaf_ids[[1]]]
print(total_ids)
##  [1]  1  2  3  7  8  9 13 14 15  4  5  6 10 11 12
sum(microdata[leaf_id %in% total_ids, val])
## [1] 150
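The same lookup works for any aggregate cell. The base-R sketch below is self-contained and does not call the package. The leaf_id values are copied from the microdata output above, and the contributing ids for the cell (aa1, Total) are an assumption read off the printed grid (the leaf_id values of all rows with v1 equal to "aa1"):

```r
# leaf_id and val copied from the microdata printed above
microdata <- data.frame(
  leaf_id = c(3L, 4L, 8L, 12L, 13L),
  val     = c(10, 20, 30, 40, 50)
)

# Contributing leaf ids per aggregate cell (assumed, read off the printed grid)
contributing <- list(
  Total_Total = 1:15,
  aa1_Total   = c(3L, 6L, 9L, 12L, 15L)
)

# Sum the values of all microdata records whose leaf_id contributes to a cell
cell_sums <- vapply(
  contributing,
  function(ids) sum(microdata$val[microdata$leaf_id %in% ids]),
  numeric(1)
)
print(cell_sums)
```

Only integer comparisons are involved, which is the point of the indexing scheme: no character matching against hierarchy codes is needed at aggregation time.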

Differentiating Totals and Primary Cells

The leaf_id column serves as a built-in classifier to distinguish between different cell types in the grid:

  • Primary Cells: If leaf_id contains an integer, the row represents a unique combination of base-level codes. These are the “internal” cells where microdata is directly mapped.
  • (Sub)-Totals: If leaf_id is NA, the row represents an aggregate cell.

This allows for extremely fast filtering during SDC tasks, such as isolating primary cells for sensitivity testing:

# Isolate primary cells for primary suppression
primary_cells <- r_sdc[!is.na(leaf_id)]

# Isolate aggregate cells for marginal consistency checks
sub_totals <- r_sdc[is.na(leaf_id)]

Interactively Create or Modify Hierarchies

The sdcHierarchies package includes a Shiny-based interactive application accessible via hier_app(). This interface is designed for users who prefer a visual approach to building or refining complex structures.

The app accepts either a raw character vector (to be converted using hier_compute() logic) or an existing hierarchy object. For example, to modify the hierarchy d that was created from a list above:

# Start the app and store the modified result upon closing
d_modified <- hier_app(d)

Key Features of the Interactive App

  • Visual Construction: If a character vector is passed, the app provides a guided interface to specify dim_spec and method arguments.
  • Drag-and-Drop Editing: Once the tree is loaded, you can dynamically restructure the hierarchy by dragging nodes to new parent locations.
  • Node Management: Easily add, remove, or rename nodes through the sidebar controls.
  • Live Code Generation: The R code required to reproduce the current state of the hierarchy is updated in real-time and can be copied or saved.
  • Export Options: Supports exporting the final hierarchy directly back to the R session, saving it as a JSON file, or generating a τ-Argus compatible hrc file.
  • Undo Functionality: A built-in history allows you to revert recent changes during the editing process.

Because hier_app() returns the modified hierarchy object upon closing, it is recommended to assign the function call to an object (as shown above) to capture your interactive changes for further use in your SDC pipeline.

Summary

The sdcHierarchies package provides a robust framework for hierarchical data management. If you have suggestions or improvements, please file an issue in our issue tracker or contribute a pull request.