Consider a network of movements or exchanges between places. Such networks are commonplace in socio-economic activities. For example, when you order something from Amazon, the movement of your package from one warehouse to the next is part of Amazon’s shipment network, or even part of the global shipment network. Our morning commutes to work can be considered a commute network between neighborhoods, cities, and offices. Each of these cases can be considered a specialized instance of the mathematical concept of a ‘graph’, called a spatial graph: a graph consisting of vertices with fixed locations, and arcs/edges connecting these vertices.
In the social and economic sciences, analyzing the relations in such networks is of great interest. With the recent availability and coverage of spatial network data, such analysis is also very useful for managerial planning in private firms and for policy decisions, such as urban planning, in public agencies. However, metrics concerning the spatial aspects of networks are almost always problem specific rather than general. The Spatial Dispersion Index (SDI) aims to address this problem: it is a generalized measurement index, or rather a family of indices, that evaluates the spatial distances of movements in a network in a problem-neutral way. rSDI computes and optionally visualizes this index with minimal hassle:
library(rSDI)
SDI(TurkiyeMigration.flows, TurkiyeMigration.nodes, variant="vow") %>% plotSDI(variant="vow")
The core idea of the SDI index was conceived as part of a large-scale, government-commissioned study report in Turkish (Gençer et al. 2020), whose results are updated yearly with new data and are available as a live analysis at https://yersis.gov.tr/web. The SDI index was later generalized, published on its own merit, and explained in detail in a separate paper (Gençer 2023).
The rSDI package provides functions to compute the SDI family of indices for spatial graphs, in accordance with their definition in the paper (Gençer 2023). rSDI also provides some convenience functions to visualize SDI index measurements. While visualization is not its primary purpose, it is often very practical for the user to have a preliminary visualization at arm’s length. In sections 2 and 3 below we first explain the concept of spatial networks and their data, then review the mathematical graph formalism used to represent spatial networks. Then we introduce the calculation of the SDI index family, its interpretation, and rules of thumb for choosing an index for your analyses. The last two sections walk through index calculation and then the visualization features of the rSDI package, using an example data set provided by the package on human migration between the provinces of Turkiye.
Spatial networks are represented as a particular type of graph in which the graph nodes (vertices) are fixed locations and each graph arc/edge represents a flow/relation between two of these nodes. In most real-life cases these networks represent varying flows of people (e.g. transportation), goods (e.g. trade, shopping), or information (e.g. Internet data transfer, phone calls). Thus the graph is weighted and directed, and has arcs rather than edges. Also, in most cases the network is geospatial. In geospatial networks the vertices of the representing graph are locations such as cities or airports, defined by their latitude and longitude. This is the case for most examples of movements related to trade, migration, education, services, etc. In other cases the spatial network may span a smaller space and is measured in its own Cartesian reference frame; for example, student movement on a campus, or the movement of parts in a production facility. In those latter cases vertices (e.g. the campus library, a welding station) have an x-y position defined with respect to a chosen corner or center of the campus, production facility, etc.
Spatial network data consists of two data frames: one representing the flows, and the other detailing the locations, and possibly the labels, of the nodes in the network. The following is a simple, imaginary spatial network dataset:
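from | to | weight |
---|---|---|
A | B | 10 |
B | A | 20 |
A | C | 5 |

id | x | y |
---|---|---|
A | 0 | 3 |
B | 4 | 0 |
C | 0 | 0 |
D | 4 | 3 |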
This spatial network is visualized below, showing node locations as well as flow amounts (weights) on lines representing edges:
A spatial network, \(N\), is represented with the mathematical concept of a graph, which consists of vertices, \(V\), representing the locations/nodes in the spatial network, and ties/edges, \(E\), representing the flows that tie them together into a network; thus \(N=(V,E)\). To capture the flow over an edge \(e_{ij}\) from vertex \(i\) to vertex \(j\), let us denote the amount of flow on the edge as the edge weight \(w_{i\rightarrow j}\). In graph-theoretic terms this corresponds to a directed and weighted graph.
To capture the spatial aspects of the network, let \(p_i=\langle x_i,y_i\rangle\) and \(p_j=\langle x_j,y_j\rangle\) denote the locations of vertices \(i\) and \(j\), respectively, in some two-dimensional space, such as Cartesian or geographic coordinates. In the latter case, the coordinates \(x\) and \(y\) denote the longitude and latitude of a geographical location, respectively. One can now speak of a spatial distance, \(\delta_{ij}\), between any two vertices. For geographical networks the Haversine formula is appropriate for computing great-circle (spherical) distances between two locations:
\[\begin{equation} \delta^{H}_{ij}=2R\arcsin\left(\sqrt{\sin^{2}\left(\frac{y_j-y_i}{2}\right)+\cos(y_i)\cos(y_j)\sin^{2}\left(\frac{x_j-x_i}{2}\right)}\right) \end{equation}\] where \(R\) is the radius of the Earth, roughly \(6,371\) km, and the coordinates are expressed in radians.
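The Haversine formula translates almost directly into R. The sketch below is a hypothetical helper, haversine_km(), written for illustration only; it is not part of rSDI and may differ from how the package computes distances internally. It assumes longitude/latitude given in decimal degrees, converts them to radians, and uses \(R = 6371\) km:

```r
# Hypothetical helper (not part of rSDI): great-circle distance in km between
# points given as longitude (x) and latitude (y) in decimal degrees.
haversine_km <- function(x_i, y_i, x_j, y_j, R = 6371) {
  to_rad <- pi / 180                       # degrees -> radians
  dx <- (x_j - x_i) * to_rad               # longitude difference
  dy <- (y_j - y_i) * to_rad               # latitude difference
  a <- sin(dy / 2)^2 +
    cos(y_i * to_rad) * cos(y_j * to_rad) * sin(dx / 2)^2
  2 * R * asin(sqrt(pmin(a, 1)))           # pmin() guards against rounding above 1
}

# İstanbul (TR100) to Tekirdağ (TR211), using the coordinates listed in
# TurkiyeMigration.nodes; the result is roughly 122 km.
haversine_km(28.96711, 41.00893, 27.51167, 40.97809)
```

The function is vectorized, so it can be applied to whole columns of origin and destination coordinates at once.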
For a more local spatial network we would probably have Cartesian coordinates, e.g. x-y coordinates of stations within a production plant in which we analyse the flows of parts between stations. In those cases the Euclidean distance can be used instead: \[\begin{equation} \delta^E_{ij}=\sqrt{(x_i-x_j)^2+(y_i-y_j)^2} \end{equation}\]
In our toy example from the previous section, the nodes form a simple 3-4-5 right triangle, so the Euclidean distance of each edge is easy to calculate (we default to Euclidean distance for Middle Earth, since we have no latitude/longitude information about it):
from | to | weight | distance |
---|---|---|---|
A | B | 10 | 5 |
B | A | 20 | 5 |
A | C | 5 | 3 |
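The distance column of this table can be reproduced in a few lines of base R. This is a minimal sketch that looks up each edge's endpoint coordinates by node id and applies the Euclidean formula; it uses the same toy data frames that are passed to rSDI later in this vignette and is not the package's own implementation:

```r
# Toy spatial network: directed, weighted flows and Cartesian node coordinates.
flows <- data.frame(from = c("A", "B", "A"), to = c("B", "A", "C"),
                    weight = c(10, 20, 5))
nodes <- data.frame(id = c("A", "B", "C", "D"),
                    x = c(0, 4, 0, 4), y = c(3, 0, 0, 3))

# Row index (in nodes) of each flow's origin and destination.
from_idx <- match(flows$from, nodes$id)
to_idx   <- match(flows$to, nodes$id)

# Euclidean distance of every edge, stored as a new column.
flows$distance <- sqrt((nodes$x[from_idx] - nodes$x[to_idx])^2 +
                       (nodes$y[from_idx] - nodes$y[to_idx])^2)
flows
```

Node D takes part in no flow, so it never appears in the lookup and simply has no edge distances.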
To quantify the spatial reach of the flows in a spatial network, the spatial distance between two nodes should be combined with the flow between them. The Spatial Dispersion Index is a direct translation of this idea and is broadly defined as the average distance spanned by the network flows, weighted by flow amounts. The key idea was conceived by the author, put into use in a broader field study report, and explained thoroughly in Gençer (2023). A brief discussion and definition is presented here.
SDI is a family of indices rather than a single index. Its variants exist because research interests differ when analyzing spatial networks. Here we explain these variations. Further below we introduce the three-letter (XXX) notation used to denote the corresponding SDI variants:
As an illustrative example, the network-level, weighted SDI index would be computed as follows (see footnote 1): \[\begin{equation} \textrm{SDI}^w(N)=\frac{\sum_{i \rightarrow j \in E}{(w_{i\rightarrow j} \cdot \delta_{ij})}}{\sum_{i\rightarrow j \in E}{w_{i\rightarrow j}}} \end{equation}\] For our toy problem this evaluates to \(\textrm{SDI}^w(N)=(10\cdot 5+20\cdot 5+5\cdot 3)/(10+20+5)=165/35\approx 4.714\).
A node-level, unweighted, out-flows-only index, on the other hand, is computed by replacing all weights with 1s: \[\begin{equation} \textrm{SDI}^u_{+}(i)=\frac{\sum_{i\rightarrow j \in E}{(1 \cdot \delta_{ij})}}{\sum_{i\rightarrow j \in E}{1}} \end{equation}\] This is simply the average distance of the flows originating from the focal node. For our toy problem’s node A, whose out-flows go to B (distance 5) and C (distance 3), this evaluates to \(\textrm{SDI}^u_{+}(A)=(5+3)/2=4\).
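Both toy-problem calculations can be checked with a minimal base-R sketch, assuming the edge distances from the table in the previous section. It mirrors the two formulas above (a flow-weighted mean over all edges, and a plain mean over each vertex's out-flows) and is meant as an illustration, not as rSDI's implementation:

```r
# Toy network edges with their Euclidean distances.
flows <- data.frame(from = c("A", "B", "A"), to = c("B", "A", "C"),
                    weight = c(10, 20, 5), distance = c(5, 5, 3))

# Network level, weighted: flow-weighted mean of all edge distances.
sdi_nw <- sum(flows$weight * flows$distance) / sum(flows$weight)
sdi_nw   # 165 / 35, approximately 4.714286

# Vertex level, unweighted, out-flows only: plain mean of the distances of the
# edges leaving each vertex (NA when a vertex has no out-flows).
sdi_vou <- sapply(c("A", "B", "C", "D"), function(v) {
  d <- flows$distance[flows$from == v]
  if (length(d) == 0) NA_real_ else mean(d)
})
sdi_vou  # A = 4, B = 5, C = NA, D = NA
```

Vertex C only receives a flow, so its out-flow index is undefined even though it is part of the network.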
Please consult the source paper, Gençer (2023), and help pages for an extensive description of index calculation for the above cases.
SDI computation uses a three-letter variant code to identify a variant of the index. This LDS code encodes the Level, Direction, and Strength of network ties, respectively. For example, an LDS code of “nuw” denotes a network-level, undirected, and weighted SDI variant. Each part of the LDS code can take the following values:
rSDI functions consume an igraph object and return their output as an igraph object with additional edge, vertex, and/or graph attributes. Let us start with an example involving the helper function dist_calc(). This function need not be called explicitly in a normal workflow; it is normally invoked by SDI(), the main entry point of SDI calculations. It computes the distance between the pair of nodes connected by each graph edge and returns the computed distances as edge attributes of the returned graph. Consider the following spatial network data frames for the fictional spatial network above:
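library(igraph)
# Flows (directed, weighted edges) of the toy network
flows<-data.frame(from=c("A","B","A"), to=c("B","A","C"), weight=c(10,20,5))
# Node ids and their Cartesian coordinates
nodes<-data.frame(id=c("A","B","C","D"),x=c(0,4,0,4),y=c(3,0,0,3))
# Build a directed igraph; node columns other than id (x, y) become vertex attributes
toyGraph <- graph_from_data_frame(flows, directed=TRUE, vertices=nodes)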
The edges of the graph have only the ‘weight’ attribute:
edge_attr_names(toyGraph)
#> [1] "weight"
rSDI’s main function is SDI(). It works in a similar fashion, adding its output as graph and/or vertex attributes (in addition to computing and adding edge distance attributes if they are missing, since distances are a prerequisite for all SDI metrics):
toyGraphWithSDI <- SDI(toyGraph) #same as SDI(toyGraph, level="vertex", directionality="undirected", weight.use="weighted")
edge_attr_names(toyGraphWithSDI)
#> [1] "weight" "distance"
vertex_attr_names(toyGraphWithSDI)
#> [1] "name" "x" "y" "SDI_vuw"
To help users follow the theoretical distinctions explained in the previous section, rSDI labels the index measurements it computes with letter codes in accordance with that classification. In the example above, the call to the SDI() function computes the (1) vertex-level, (2) undirected, and (3) weighted SDI index, which are the defaults. Thus it adds an attribute named ‘SDI_vuw’ to each vertex of its input graph. The attribute is added to each vertex even if the index cannot be computed. This is the case for vertex D, which has an NA value stored in its ‘SDI_vuw’ attribute.
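Why vertex D ends up with NA can be seen from a hand computation. The base-R sketch below assumes that the undirected, weighted vertex-level index is the flow-weighted mean distance over every flow that starts or ends at the vertex; this is an assumption made for illustration, so consult Gençer (2023) and the rSDI help pages for the authoritative definition. Vertex D has no incident flows, so both sums are empty and the index is undefined:

```r
# Toy flows with Euclidean edge distances, and the full vertex list
# (vertex D has no incident flows).
flows <- data.frame(from = c("A", "B", "A"), to = c("B", "A", "C"),
                    weight = c(10, 20, 5), distance = c(5, 5, 3))
vertices <- c("A", "B", "C", "D")

# Assumed definition of the undirected, weighted vertex-level SDI:
# weighted mean distance over every flow that starts OR ends at the vertex.
sdi_vuw <- sapply(vertices, function(v) {
  is_incident <- flows$from == v | flows$to == v
  if (!any(is_incident)) return(NA_real_)   # no flows: index is undefined
  sum(flows$weight[is_incident] * flows$distance[is_incident]) /
    sum(flows$weight[is_incident])
})
sdi_vuw
```

Under this assumption, A, B, and C get 165/35 (about 4.714), 5, and 3, respectively, while D gets NA.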
If the index is computed at the network level the vertices will not have additional attributes but the graph itself will, following the same convention:
toyGraphWithNetworkSDI <- SDI(toyGraph, level="network", directionality="undirected", weight.use="weighted")
graph_attr_names(toyGraphWithNetworkSDI)
#> [1] "SDI_nuw"
graph_attr(toyGraphWithNetworkSDI,"SDI_nuw")
#> [1] 4.714286
Once you are comfortable with this convention, you can shorten your calls to SDI() using the ‘variant’ parameter as follows, which is equivalent to the call in the example above:
toyGraphWithNetworkSDI <- SDI(toyGraph, variant="nuw")
SDI will leave previously computed indices untouched. Thus, for example, you can compute several indices in a pipe:
toyGraph %>%
SDI(variant="nuw") %>%
SDI(variant="nuu") %>%
SDI(variant="vuw") %>%
SDI(variant="vuu") -> toyGraphWithSeveralSDI
graph_attr_names(toyGraphWithSeveralSDI)
#> [1] "SDI_nuw" "SDI_nuu"
vertex_attr_names(toyGraphWithSeveralSDI)
#> [1] "name" "x" "y" "SDI_vuw" "SDI_vuu"
The same can be achieved by using a vector of variants in a single call:
toyGraphWithSeveralSDI <- SDI(toyGraph, variant=c("nuw","nuu","vuw","vuu"))
graph_attr_names(toyGraphWithSeveralSDI)
#> [1] "SDI_nuw" "SDI_nuu"
vertex_attr_names(toyGraphWithSeveralSDI)
#> [1] "name" "x" "y" "SDI_vuw" "SDI_vuu"
Note that for the generalized SDI variant you must provide the additional \(\alpha\) parameter:
toyGraphWithGeneralizedSDI <- SDI(toyGraph, variant="vug", alpha=0.5)
vertex_attr_names(toyGraphWithGeneralizedSDI)
#> [1] "name" "x" "y" "SDI_vug"
vertex_attr(toyGraphWithGeneralizedSDI,"SDI_vug")
#> [1] 4.252907 4.472136 3.464102 NA
Calling the dist_calc() helper function adds a distance attribute to an input graph. This is performed automatically when SDI() is called, but you may also call it separately if needed. For the example in the previous section, the call is made as follows:
toyGraphWithDistances <- dist_calc(toyGraph)
edge_attr_names(toyGraphWithDistances)
#> [1] "weight" "distance"
Because the coordinate attributes are named ‘x’ and ‘y’ (rather than ‘latitude’ and ‘longitude’), the function opts for a Euclidean distance calculation and returns the 3-4-5 triangle distances:
The rSDI package comes with a real-world data set consisting of two data frames. TurkiyeMigration.flows contains data on the migration of people between Türkiye’s provinces in the period 2016-2018, a consolidated version of raw data from the Turkish Statistical Institute. TurkiyeMigration.nodes contains the labels and geographic coordinates (latitude and longitude) of the provinces:
head(TurkiyeMigration.flows)
#> from to weight
#> 1 TRC12 TR621 737.0000
#> 2 TR332 TR621 319.6667
#> 3 TRA21 TR621 213.0000
#> 4 TR712 TR621 412.6667
#> 5 TR834 TR621 158.3333
#> 6 TR510 TR621 2594.6667
head(TurkiyeMigration.nodes)
#> id label longitude latitude
#> 1 TR100 İstanbul 28.96711 41.00893
#> 2 TR211 Tekirdağ 27.51167 40.97809
#> 3 TR212 Edirne 26.55596 41.67717
#> 4 TR213 Kırklareli 27.22437 41.73547
#> 5 TR221 Balıkesir 27.88834 39.65046
#> 6 TR222 Çanakkale 26.40859 40.14672
You may call the SDI() function either with an igraph object you compose yourself from flow and node data, or by passing the flow and node data frames directly to SDI(), as follows:
TMSDI <- SDI(TurkiyeMigration.flows, TurkiyeMigration.nodes, variant="vuw")
# -- OR --
library(igraph)
TMgraph <- graph_from_data_frame(TurkiyeMigration.flows, directed=TRUE, vertices=TurkiyeMigration.nodes)
TMSDI <- SDI(TMgraph, variant="vuw")
rSDI plotting functions make use of open map packages available in the R ecosystem to produce geographical plots of SDI measurements. The plotSDI() function produces a visualization in which the circle for each node has an area proportional to the node’s selected SDI measure. The function tries to choose suitable circle sizes, but you can customize circle sizes, fill colors, etc. by overriding its parameters. For example, you can scale the circle sizes relative to their default as:
Please refer to the documentation of plotSDI() for further fine-grained control of its plotting parameters.
You may want to visualize the network flows along with the SDI index measurements. This particular combination is provided as a convenience: you can turn on the display of network edges using the ‘edges’ argument to plotSDI():
Please note that this combination builds on several graph visualization and geospatial packages. If you need fine control over all of these underlying visualization layers, we recommend building your own solution with packages such as ggraph, sf, and rnaturalearth.
The visualization features of rSDI mainly leverage the fact that spatial graphs are often geospatial, so geospatial libraries in R can be combined with graph plotting to visualize these networks on a map. The current version of rSDI does not provide a way to use your own map, for example when working with a network of flows within a production plant or a schoolyard. The following example, which you can adapt to your use case, uses a custom image as the background of the network plot:
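library(rSDI)
library(igraph)    # V()
library(ggplot2)   # xlim(), ylim(), arrow(), unit()
library(ggraph)    # create_layout(), geom_edge_bend(), geom_node_*()
library(ggimage)   # ggbackground()
# Toy network; SDI() accepts flow and node data frames directly
flows<-data.frame(from=c("A","B","A"), to=c("B","A","C"), weight=c(10,20,5))
nodes<-data.frame(id=c("A","B","C","D"),x=c(0,4,0,4),y=c(3,0,0,3))
g <- SDI(flows, nodes, variant="vuw")
# Background image (a Middle-earth map)
url<-"https://static.wikia.nocookie.net/lotr/images/5/59/Middle-earth.jpg/revision/latest?cb=20060726004750"
# Place nodes at their own x-y coordinates instead of a computed layout
lay <- create_layout(g, 'manual', x=V(g)$x, y=V(g)$y)
p<-ggraph(lay) +
  # curved arrows for flows, labelled with the flow weight
  geom_edge_bend(aes(label=weight), label_size=10, strength=0.4, edge_width=3,
                 edge_alpha=0.3, arrow=arrow(length=unit(10, 'mm'))) +
  # node circles sized by vertex-level SDI (vertex D has NA and is not drawn)
  geom_node_point(aes(size=SDI_vuw), color="red") +
  geom_node_text(aes(label=name), size=10, vjust=-0.7, hjust=1) +
  xlim(-3,5) + ylim(-2,4)
p<-ggbackground(p, url)
p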
1. Please note that when computed over the whole network, directionality makes no difference, so we omit the \(\textrm{SDI}_{\pm}(\ldots)\) notation in this case.