Visualizing IP data

This vignette assumes an understanding of IP addresses and networks. Please consult vignette("ipaddress-classes", "ipaddress") for a very basic introduction.

Visualizing the IP address space is challenging because there are so many unique addresses (approximately 4.3 billion for IPv4 and \(3.4 \times 10^{38}\) for IPv6). Owing to the hierarchical nature of the address space, we must plot addresses on a discrete scale rather than a continuous one. However, it’s simply not possible to display (or interpret) such a large number of discrete levels simultaneously.

There are a few actions we can take to improve the situation:

  1. Visualize a reduced number of discrete levels by:
    1. Showing only a subnetwork of the full address space (i.e. filtering leading bits)
    2. Limiting the resolution by summarizing data within networks (i.e. neglecting trailing bits)
  2. Transform the one-dimensional address space onto the two-dimensional plane

These are handled by the canvas_network, pixel_prefix and curve arguments of coord_ip(), respectively. This vignette describes these actions in more detail.

Reducing visualized information

As an example, consider the 32-bit representation of the IPv4 address 192.168.0.124. If we wanted to visualize this single address within the full context of the IPv4 address space, we’d need to simultaneously display \(2^{32}\) discrete levels (roughly 4.3 billion).

To reduce the visualized information, we could show only a subnetwork of the full address space. In our example, we could display only the 192.0.0.0/8 network using coord_ip(canvas_network = ip_network("192.0.0.0/8")). This keeps only those addresses whose leading 8 bits match the specified network, thereby reducing the number of discrete levels to \(2^{24}\) (roughly 16.8 million).
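The effect of restricting the canvas can be checked with a little base R arithmetic. The sketch below is illustrative only: it does not call ggip, and the helper names ipv4_to_number() and in_canvas() are hypothetical.

```r
# Convert a dotted-quad IPv4 string to its 32-bit value. The result is stored
# as a double because R integers are signed and cannot hold the full range.
ipv4_to_number <- function(x) {
  octets <- as.numeric(strsplit(x, ".", fixed = TRUE)[[1]])
  sum(octets * 256^(3:0))
}

# An address lies inside a network when their leading `prefix` bits agree.
in_canvas <- function(address, network, prefix) {
  shift <- 2^(32 - prefix)
  ipv4_to_number(address) %/% shift == ipv4_to_number(network) %/% shift
}

in_canvas("192.168.0.124", "192.0.0.0", 8)  # TRUE
in_canvas("10.1.2.3", "192.0.0.0", 8)       # FALSE

# Number of discrete levels that remain on a /8 canvas
levels_on_canvas <- 2^(32 - 8)              # 16777216
```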

Alternatively, we could make each discrete level represent a network of addresses. To do this, we’d need a summary function that reduces the data within each network to a single value. In our example, we could make each discrete level represent a network with a prefix length of 24 using coord_ip(pixel_prefix = 24). This neglects the trailing 8 bits of the 32-bit address. Combined with the 192.0.0.0/8 canvas, only 32 - 8 - 8 = 16 bits remain, further reducing the number of discrete levels to \(2^{16}\) (65,536).
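As a rough worked example, the position of 192.168.0.124 among the remaining discrete levels can be computed by discarding the leading bits (fixed by the canvas) and the trailing bits (below the pixel resolution). This is plain arithmetic rather than ggip code, and the variable names are chosen here for illustration.

```r
# 32-bit value of 192.168.0.124, stored as a double
address_value <- 192 * 256^3 + 168 * 256^2 + 0 * 256 + 124

canvas_prefix <- 8   # canvas_network = 192.0.0.0/8
pixel_prefix <- 24   # each discrete level is a /24 network

# Drop the leading bits that are fixed by the canvas ...
offset_in_canvas <- address_value %% 2^(32 - canvas_prefix)
# ... then drop the trailing bits below the pixel resolution.
pixel_index <- offset_in_canvas %/% 2^(32 - pixel_prefix)

pixel_index                                   # 43008, the level for 192.168.0.0/24
n_levels <- 2^(pixel_prefix - canvas_prefix)  # 65536 levels in total
```

The index is zero-based, so it runs from 0 to 65,535 on this canvas.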

These two techniques become even more important in the IPv6 address space, which uses 128-bit addresses.

Note: To prevent accidentally plotting an unreasonably large number of discrete levels, ggip limits the number of plotted bits to 24. This means the coord_ip() arguments must satisfy:

pixel_prefix - prefix_length(canvas_network) <= 24
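The inequality can be tried out before building a plot. The helper below is hypothetical (it is not part of ggip) and simply mirrors the condition above. It additionally assumes that a pixel cannot be larger than the canvas.

```r
# Hypothetical helper mirroring the constraint above; not a ggip function.
check_plotted_bits <- function(canvas_prefix, pixel_prefix, max_bits = 24) {
  plotted_bits <- pixel_prefix - canvas_prefix
  # Assumption: a pixel cannot cover more addresses than the canvas itself
  plotted_bits >= 0 && plotted_bits <= max_bits
}

check_plotted_bits(0, 24)  # TRUE: whole IPv4 space at /24 resolution
check_plotted_bits(0, 32)  # FALSE: 32 plotted bits exceeds the limit
check_plotted_bits(8, 32)  # TRUE: a /8 canvas at single-address resolution
```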

Space-filling curves

Inspired by an xkcd comic originally published in December 2006, we use a space-filling curve to map IP data (one-dimensional) to Cartesian coordinates (two-dimensional). Each discrete level is then represented by a pixel. Two curves are commonly chosen for this task: the Hilbert curve and the Morton curve (also known as the Z curve). Compared to other space-filling curves, these are advantageous because they preserve locality (i.e. subnetworks remain close together).

The curve order represents how nested the curve is and therefore determines how many data points can be visualized: a curve of order \(n\) visits every cell of a \(2^n \times 2^n\) grid, so it can display \(2^{2n}\) discrete levels. Conversely, choosing the number of plotted bits (see above) determines the order of the curve. Since space-filling curves are fractal, increasing the curve order effectively improves the image resolution (plotted networks remain in the same overall location).
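The relationship between plotted bits and curve order can be sketched as follows. The function name curve_dimensions() is hypothetical, and the sketch assumes an even number of plotted bits so that the grid is square.

```r
# Hypothetical helper: derive the curve order and grid size from the
# number of plotted bits (assumed to be even, giving a square grid).
curve_dimensions <- function(plotted_bits) {
  stopifnot(plotted_bits %% 2 == 0)
  order <- plotted_bits / 2   # each order adds one bit per axis
  side <- 2^order             # number of pixels along each axis
  c(order = order, side = side, pixels = side^2)
}

curve_dimensions(16)  # order 8, 256 x 256 grid, 65536 pixels
curve_dimensions(24)  # order 12, 4096 x 4096 grid, 16777216 pixels
```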

Hilbert curve

IP data is most commonly displayed on a Hilbert curve because it preserves locality particularly well: the curve is continuous, so consecutive discrete levels are always mapped to adjacent pixels.

This curve starts in the top-left corner and ends in the top-right corner. It is chosen using coord_ip(curve = "hilbert").
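To make the mapping concrete, here is a minimal sketch of the widely used iterative index-to-coordinate conversion for a Hilbert curve, written in base R. The function name hilbert_xy() is hypothetical, and it is an assumption that ggip orients its curve exactly like this. Here x counts columns from the left and y counts rows from the top, which reproduces the start and end corners described above.

```r
# Map a zero-based position `d` along a Hilbert curve of the given order
# to grid coordinates (x = column from the left, y = row from the top).
hilbert_xy <- function(d, order) {
  n <- 2^order
  x <- 0
  y <- 0
  rem <- d
  s <- 1
  while (s < n) {
    rx <- (rem %/% 2) %% 2   # second-lowest bit of the remaining index
    ry <- (rem + rx) %% 2    # lowest bit XOR rx
    if (ry == 0) {
      if (rx == 1) {
        # Flip the sub-square
        x <- s - 1 - x
        y <- s - 1 - y
      }
      # Swap x and y (transpose the sub-square)
      tmp <- x
      x <- y
      y <- tmp
    }
    x <- x + s * rx
    y <- y + s * ry
    rem <- rem %/% 4
    s <- s * 2
  }
  c(x = x, y = y)
}

hilbert_xy(0, 2)   # x = 0, y = 0: top-left corner
hilbert_xy(1, 2)   # x = 1, y = 0: adjacent to the previous pixel
hilbert_xy(15, 2)  # x = 3, y = 0: top-right corner
```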

Morton curve

The Morton curve technically offers slightly poorer locality preservation than the Hilbert curve. However, the discontinuous jumps in the curve actually correspond to crossing IP network boundaries. In this sense, the Morton curve is a more natural representation of the IP network structure. For example, the start and end addresses of a network are always located diagonally across from each other.

This curve starts in the top-left corner and ends in the bottom-right corner. It is chosen using coord_ip(curve = "morton").
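A Morton curve position can be converted to coordinates by de-interleaving its bits: the even-numbered bits form x and the odd-numbered bits form y. The base R sketch below uses a hypothetical function name, morton_xy(), and it is an assumption that ggip assigns the bits to the axes in this way. As before, y counts rows from the top.

```r
# Map a zero-based position `d` along a Morton (Z) curve of the given order
# to grid coordinates by de-interleaving the bits of `d`.
morton_xy <- function(d, order) {
  x <- 0
  y <- 0
  for (i in seq_len(order) - 1) {
    x <- x + ((d %/% 4^i) %% 2) * 2^i        # bit 2i of d becomes bit i of x
    y <- y + ((d %/% (2 * 4^i)) %% 2) * 2^i  # bit 2i + 1 of d becomes bit i of y
  }
  c(x = x, y = y)
}

morton_xy(0, 2)   # x = 0, y = 0: top-left corner
morton_xy(15, 2)  # x = 3, y = 3: bottom-right corner

# Positions 4 to 7 form one aligned block of four pixels. Its first and last
# positions sit diagonally across from each other.
morton_xy(4, 2)   # x = 2, y = 0
morton_xy(7, 2)   # x = 3, y = 1
```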

Putting it all together

Finally, let’s consider a specific example.

coord_ip(
  canvas_network = ip_network("0.0.0.0/0"),
  pixel_prefix = 4,
  curve = "hilbert"
)

This coordinate system uses a 2nd-order Hilbert curve (a 4 × 4 grid of 16 pixels) to visualize the entire IPv4 address space, where each vertex represents a /4 network.
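The numbers behind this example can be reproduced with simple arithmetic. The sketch below is not ggip code. It derives the curve order from the two prefix lengths and locates the /4 network that contains the earlier example address, 192.168.0.124.

```r
canvas_prefix <- 0   # prefix length of 0.0.0.0/0
pixel_prefix <- 4    # each pixel is a /4 network

plotted_bits <- pixel_prefix - canvas_prefix  # 4
curve_order <- plotted_bits / 2               # 2
n_pixels <- 2^plotted_bits                    # 16, arranged as a 4 x 4 grid

# A /4 network is determined by the leading 4 bits of the first octet.
first_octet <- 192
pixel_index <- first_octet %/% 2^(8 - pixel_prefix)  # 12 (zero-based)
network_start <- pixel_index * 2^(8 - pixel_prefix)  # 192, i.e. 192.0.0.0/4
```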