tsna : Tools for Temporal Social Network Analysis

package version 0.3.5 built 2021-10-31

Skye Bender deMoll (skyebend@uw.edu) and the statnet team (https://statnet.org). Major contributions from James Moody, Martina Morris, Carter Butts, Steve Goodreau, Samuel Jenness, Li Wang and Kirk Li. This work was supported by grant R01HD68395 from the National Institute of Health.

Documentation compiled Sun Oct 31 08:32:29 2021


Overview

This package provides tools for working with dynamic networks (or processes on networks) in which the ordering and timing of ties is important. These approaches are needed when the phenomena of interest involve networks having densities and edge-turnover rates in a parameter space such that the time-collapsed aggregate network give too great a distortion of the toplology connectivity for useful analysis.

Most tsna functions accept as their input continuous- and discrete-time longitudinal networks having vertex, edge, and attribute dynamics stored in the networkDynamic format. tsna includes tools for applying "traditional" static Social Network Analysis (SNA) metrics at multiple time points, as well as temporal extensions of SNA metrics using forwards- and backwards-path routines. The initial version of tsna is primarily focused on working with networks of the type that might be generated from discrete time simulations of network evolution.

Research on formal methods for analysis of longitudinal networks is a relatively new area of study. Many of the measures included in this package may be formally identical to measures already published with different names in literature of diverse research fields. We will attempt to find and cite such previous work, but if you encounter any omissions, please let us know.

Requirements

This document assumes familiarity with general concepts of SNA, R, and the statnet suite of R packages. For more background on the network and networkDynamic data structures and functions, please see appropriate package documentation and tutorials.

library(tsna)
Loading required package: network

'network' 1.17.1 (2021-06-12), part of the Statnet Project
* 'news(package="network")' for changes since last version
* 'citation("network")' for citation information
* 'https://statnet.org' for help, support, and other information
Loading required package: networkDynamic

'networkDynamic' 0.11.0 (2021-06-12), part of the Statnet Project
* 'news(package="networkDynamic")' for changes since last version
* 'citation("networkDynamic")' for citation information
* 'https://statnet.org' for help, support, and other information

As always in R, a general help file for the package can be displayed, and the individual arguments for each function are documented in more detail in the function's help page.

?tsna
?tPath

This vignette also makes use of various example data sets provided by the networkDynamicData package, and optionally employs static Social Network Analysis measures provided by the sna package.

library(networkDynamicData)
library(sna)
Loading required package: statnet.common

Attaching package: 'statnet.common'
The following objects are masked from 'package:base':

    attr, order
sna: Tools for Social Network Analysis
Version 2.6 created on 2020-10-5.
copyright (c) 2005, Carter T. Butts, University of California-Irvine
 For citation information, type citation("sna").
 Type help(package="sna") to get started.

Introduction

Most of the tsna package function assume that their input is formatted as a networkDynamic data structure. The networkDynamic package provides utilities (networkDynamic()) for converting data from various formats (such as timed edge-lists, or lists of matrices) as well as functions for manipulating the data structures.

The data structure provided by networkDynamic objects assumes that the vertices and (directed or non-directed) edges of a network have multiple 'activity spells' associated with them indicating when they are 'active' or exist within the observation period. Each spell is an interval with an onset and terminus time. Each edge or vertex can activate and deactivate multiple times during the period over which the network is observed. Please see ?activate for additional details.

As an example, consider the included moodyContactSim example object (Moody 2008). We will first plot it as a time-aggregated network, ignoring the temporal info and showing all of the edges that are ever active.

data(moodyContactSim)
plot(moodyContactSim,displaylabels = TRUE,main='aggregate network')

We can view the activity spells associated with the network's edges

as.data.frame(moodyContactSim)
   onset terminus tail head onset.censored terminus.censored duration edge.id
1     40       72   10    4          FALSE             FALSE       32       1
2    214      247    1   11          FALSE             FALSE       33       2
3    224      256    7   10          FALSE             FALSE       32       3
4    453      479   13    4          FALSE             FALSE       26       4
5    494      524   13    2          FALSE             FALSE       30       5
6    575      599    2   16          FALSE             FALSE       24       6
7    583      615    1   16          FALSE             FALSE       32       7
8    621      651    1   12          FALSE             FALSE       30       8
9    634      660   13    3          FALSE             FALSE       26       9
10   665      692    4   14          FALSE             FALSE       27      10
11   674      701    1    9          FALSE             FALSE       27      11
12   701      733   16    6          FALSE             FALSE       32      12
13   709      740    2   15          FALSE             FALSE       31      13
14   712      739   13    5          FALSE             FALSE       27      14
15   719      745    8   13          FALSE             FALSE       26      15
16   748      782    4   16          FALSE             FALSE       34      16
17   749      793   11    8          FALSE             FALSE       44      17
18   769      795   13    7          FALSE             FALSE       26      18

Since this is a fairly simple network, we could plot the times on the aggregate network diagram as edge labels.

coords<-plot(moodyContactSim,
     displaylabels=TRUE,
     label.cex=0.8,
     label.pos=5,
     vertex.col='white',
     vertex.cex=3,
     edge.label=sapply(get.edge.activity(moodyContactSim),function(e){
       paste('(',e[,1],'-',e[,2],')',sep='')
     }),
     edge.label.col='blue',
     edge.label.cex=0.7
   )

In situations when we are considering transmission processes on a dynamic network, the aggregate network is not a good representation because it exaggerates the connectivity in the network by ignoring the importance of the order of edge events (Moody, 2002). For example, the apparent 8 to 11 to 1 path appears to be the shortest path traverseable in the static aggregate network. But when we consider the constraints of edge activity, the edge between 1 and 11 becomes inactive long before the edge between 8 and 11 activates, there is no such path in the dynamic network.

We could go to the other extreme, and instead of looking at the aggregate network we view only the network that is actually active at a single point in time. These networks will be much less connected than time-collapsed network. We see the edge between 1 and 11, and the edge between 11 and 8, but is difficult to determine any connectivity at all for the network as a whole.

par(mfcol=c(1,2))
plot(network.extract(moodyContactSim,at=215),
     main='network at time 215',
     displaylabels=TRUE,
     label.cex=0.6,
     label.pos=5,
     vertex.col='white',
     vertex.cex=3,
     coord=coords)
plot(network.extract(moodyContactSim,at=750),
     main='network at time 750',
     displaylabels=TRUE,
     label.cex=0.6,
     label.pos=5,
     vertex.col='white',
     vertex.cex=3,
     coord=coords)

par(mfcol=c(1,1))

A goal of the tsna package is to allow users to flexibly explore network data and apply metrics at multiple temporal resolutions. We also provide some techniques for extracting networks of potential temporal paths through the network as a means of examining transmission potential without introducing aggregation biases.

Temporal Paths and metrics

Explanation of paths in network

One of the key features of this package is the ability to calculate temporal paths in networks. A forward temporal path is a sequence of vertices and directed edges such that the start times of successive elements are greater than or equal than those of the previous. The path is a directed traversal of the network that respects the constraints of edge activity spells and permits 'waiting' at intermediate vertices for 'future' edges to form.

To give a concrete example, if a network describes the time-evolution of contacts between people during which disease transmission could occur, a sequence of infection events spreading out through network from a single person would be a temporal path. Likewise, a series of messages forwarding a cute cat picture would be a temporal path in an email network. In most cases we assume that once a path (message) reaches a vertex, that vertex remains 'infected' and can transmit to all of its future contacts.

Depending on network topology, there may be lots of possible temporal paths between a single pair of vertices. Bui Xuan, et al (2002) call a permissable time-respecting path between a pair of vertices a journey. The earliest arriving forward temporal path from vertex i to vertex j can be thought of as the soonest possible time a message sent from i to j could possibly arrive. Because it depends on the sequence of edge timing, earliest forward path is not usually equal to the shortest path geodesic in a static network and may not be the same as the shortest temporal path. However, we can calculate the earliest path with relative efficiency using algorithms similar to those used to calculate the shortest geodesic path (Bui Xuan, et al. 2002). Please see the package documentation for for further algorithm details.

The forward reachable set (FRS) from vertex v at time t is the set of vertices reachable by forward temporal paths from v, begining at time t. Also known as temporal out-component (Nicosia, et al 2013)

Highlite possible paths found on Moody's example.

For a more tangible example, we can use tPath to extract a forward temporal path starting from vertex 10 in the Moody example network.

v10path<-tPath(moodyContactSim,v=10,start=0)
print(v10path)
$tdist
 [1] 583 494 634  40 712 701 224 719 674   0 749 621 453 665 709 575

$previous
 [1] 16 13 13 10 13 16 10 13  1  0  8  1  4  4  2  2

$gsteps
 [1] 5 3 3 1 3 5 1 3 6 0 4 6 2 2 4 4

$start
[1] 0

$end
[1] Inf

$direction
[1] "fwd"

$type
[1] "earliest.arrive"

attr(,"class")
[1] "tPath" "list" 

The tPath function returns a tPath object which is a list with several components. The $tdist component indicates the time (elapsed from start) at which each vertex is reached from the starting vertex v. The $previous component gives, for each vertex index, the vertex id of the previous vertex along path. As we will see, this path information can be used to construct a new network object containing only the tPath tree. $gsteps gives the number of steps (graph hops) in the path from v for each vertex. The remaining elements store the query parameters for later use by other functions.

There is a special plot function for tPath objects, which essentially sets some useful presets for plot.network.

plot(v10path,coord=coords, displaylabels=TRUE)

The tsna package also provides a utility function to plot the path on top of the static aggregate network, along with the distance (transmission time) for each edge. We can use it to visually compare two paths.

Extract an alternate path, this time starting from vertex 1 instead of 10

v1path<-tPath(moodyContactSim,v=1,start=0)

Plot both paths side-by-side.

par(mfcol=c(1,2)) # set up side-by-side plot

plotPaths(moodyContactSim,v10path,coord=coords, main='fwd path from v10')
plotPaths(moodyContactSim,v1path,coord=coords, main = 'fwd path from v1')

par(mfcol=c(1,1)) # turn off side-by-side plots

More fun facts about paths

Notice that paths are directed (even if the underlying network is not) and not always symmetric. For example vertex 10 can reach 1, but 1 cannot reach 10.

It is also possible to pass in a list of multiple paths and draw them on the same network.

# or draw both plots on the  them both on the same network
plotPaths(moodyContactSim,coord=coords,list(v10path,v1path))

The forward reachable set is not the same as the backwards reachable set (BRS) : The set of vertices that v can reach along paths traveling backwards in time. Or perhaps more intuitively, the set of vertices that can reach v along forward paths.

plotPaths(moodyContactSim, list(
          tPath(moodyContactSim,v=10,direction='bkwd',type='latest.depart'),
          tPath(moodyContactSim,v=10)))

In most of the examples, we've been assuming that the the start of the tPath search begins at the earliest time observed in the network. But that doesn't have to be true, a tPath can begin at any time on the network, and the size of the reachable set will often depend on the time at which the search began.

par(mfcol=1:2)
plot(tPath(moodyContactSim,v=1,start=0),coord=coords,
     main='tPath from v1 @ t=0')
plot(tPath(moodyContactSim,v=1,start=500),coord=coords,
     main='tPath from v1 @ t=500')

par(mfcol=c(1,1))

In the plots above, both tPaths started at vertex 1, but the one that started later (at t=500) doesn't include vertices 11 and 8 because the 1-11 edge had already ended before the path search begain.

Other types of temporal paths

Here is a trivial example that illustrates several types of potential 2-step temporal paths from vertex A to vertex G.

First we construct a directed network with edge timing very specifically constructed to make only certain paths feasible.

pathCompare<-network.initialize(7)
network.vertex.names(pathCompare)<-LETTERS[1:7]
add.edges.active(pathCompare,tail=c(1,2),head=c(2,7),onset=c(1,4),terminus=c(2,5))
add.edges.active(pathCompare,tail=c(1,3),head=c(3,7),onset=c(0,6),terminus=c(2,7))
add.edges.active(pathCompare,tail=c(1,4),head=c(4,7),onset=c(4,5),terminus=c(5,6))
add.edges.active(pathCompare,tail=c(1,5),head=c(5,7),onset=c(6,9),terminus=c(7,10))
add.edges.active(pathCompare,tail=c(1,6),head=c(6,7),onset=c(4,10),terminus=c(5,11))
as.data.frame(pathCompare)
   onset terminus tail head onset.censored terminus.censored duration edge.id
1      1        2    1    2          FALSE             FALSE        1       1
2      4        5    2    7          FALSE             FALSE        1       2
3      0        2    1    3          FALSE             FALSE        2       3
4      6        7    3    7          FALSE             FALSE        1       4
5      4        5    1    4          FALSE             FALSE        1       5
6      5        6    4    7          FALSE             FALSE        1       6
7      6        7    1    5          FALSE             FALSE        1       7
8      9       10    5    7          FALSE             FALSE        1       8
9      4        5    1    6          FALSE             FALSE        1       9
10    10       11    6    7          FALSE             FALSE        1      10

Now plot it, with each edge labeled with onset and terminus times, and each path type in a different color.

# pre-define some coords for arbitrary positioning
coords<-cbind(c(0,0.5,0.5,0.5,0.5,0.5,1),c(0.3,0.15,0.3,0.45,0.65,0.8,0.7))
# do the plot
plot(pathCompare,
     coord=coords,jitter=FALSE,
     #mode='circle',
     displaylabels=TRUE, vertex.col='white',
     edge.label=get.edge.activity(pathCompare),edge.label.cex=0.8,
     edge.lwd=4,
     edge.col=c('blue','blue','red','red','green','green','orange','orange','pink','pink'),
    main='Comparison of fwd temporal path types from A to G')
# plot a legend
legend(-0.3,1,legend = c('earliest leaving (ACG @ 6)',
                         'earliest arriving (ABG @ 4)',
                         'latest leaving (AEG @ 10)',
                         'quickest (ADG @ 5)',
                         'latest arriving (AFG @ 11)'),
               fill=c('red','blue','orange','green','pink'),
       cex=0.8)

Most of these other path types would be useful for specific applications, but we do not yet have algorithms implemented for effiently calculating them on non-trivial networks.

Again, it is important to rember that the earliest temporal path is not always the same as the shortest (fewest step) temporal path.

Compare earliest forward set sizes

For transmission processes, the forward reachable set is a theoretical upper bound (for a perfectly infectious process) on the possible number of vertices that can be reached from a source vertex within the time bounds. Hence, if we calculate the forward reachable set from every vertex, we would know the distribution of maximum possible epidemic sizes in the dynamic network. Can we characterize networks by looking at the distribution of sizes of fwd reachable sets?

This example makes use of the ConcurrencyComparisonNets example data provided in the networkDynamicData package.

library(networkDynamicData)
data(concurrencyComparisonNets)

When loaded, this brings in three discrete time networks, simulated via a tergm process, each of which is 100 time steps in duration. All three networks were parameterized to have the same size, relationship duration distribution and cross-sectional mean degree, but different cross-sectional degree distributions.

The tReach function computes the sizes of forward reachable sets. Since these are large-ish networks (1000 vertices), we will only calculate the paths from a sample of 25 seed vertices to save time.

baseTrees<-tReach(base,sample=25)
monogTrees<-tReach(monog,sample=25)
middleTrees<-tReach(middle,sample=25)

If we peek at the forward reachable set sizes we observe in the base network (the network with a Poisson cross-sectional degree distribution) we see a large number of FRS reaching the majority of the network, and a few cases the didn't spread far at all.

baseTrees
 [1]   6 750 482 811 799 632 794 815 707 719 704 756 599 798 703 131 293  13 540
[20] 334 726   2 776 776 753

Contrast this with the monog network has Bernoulli (0,1) cross-section degree distribution, meaning nobody has more than one partner at a time. This greatly reduces the spreading potential of the network, so almost all of the FRS are quite small.

monogTrees
 [1] 54 57 35 11 14  9 20 12  6 41  8 30 60  8 50 11 21 35 63 19 45 13 11 37 31

We can compare the three distributions using box plots.

boxplot(cbind(baseTrees,monogTrees,middleTrees),
        main='fwd-reachable set size distributions for nets of varying concurrency')

For a bit more detail on the distribution of FRS sizes, we can plot some overlapping histograms.

hist(baseTrees, main='fwd-reach size distributions',
     ylim=c(0,50),xlim=c(0,1000),
     breaks=seq(from=0,to=1000,by=50),
     col='#55000033',xlab='reachable set size')
hist(monogTrees,ylim=c(0,50),xlim=c(0,1000),
     breaks=seq(from=0,to=1000,by=50),
     col='#00550033',add=TRUE)
hist(middleTrees,ylim=c(0,50),xlim=c(0,1000),
     breaks=seq(from=0,to=1000,by=50),
     col='#00005533',add=TRUE)
legend(800,50,legend=c('base','monog','middle'),
       fill=c('#55000033','#00550033','#00005533'))

These networks would clearly have very different spreading potential, despite having nearly identical aggregate mean degrees and densities -- properties which would often be used to predict the size of expected 'giant component' in a static network.

mean(degree(as.network(base)))
[1] 7.664
mean(degree(as.network(monog)))
[1] 7.688
mean(degree(as.network(middle)))
[1] 7.68

Static projections of temporal networks

Dynamic networks -- especially those that are constructed from multiple waves of static networks -- can be be converted into a time-projected "multi-slice" static network representation which preserves the potential paths the network. This type of network is conceptually similar to the representation used internally for the forward path calculations and be very helpful for testing and understanding flow through temporal networks.

Create four random 'slice' networks and plot them

nets4<-replicate(4,list(network(matrix(rbinom(16,5,0.1),ncol=4,nrow=4))))
par(mfcol=c(2,2))
plot(nets4[[1]],displaylabels=TRUE,main='t0')
plot(nets4[[2]],displaylabels=TRUE,main='t1')
plot(nets4[[3]],displaylabels=TRUE,main='t2')
plot(nets4[[4]],displaylabels=TRUE,main='t3')

par(mfcol=c(1,1))

Aggregate them into a networkDynamic object, and project it in time by creating "identity arcs" bentween the vertices' realizations in sucessive time slices.

nets4Dyn<-networkDynamic(network.list=nets4)
Neither start or onsets specified, assuming start=0
Onsets and termini not specified, assuming each network in network.list should have a discrete spell of length 1
Argument base.net not specified, using first element of network.list instead
Created net.obs.period to describe network
 Network observation period info:
  Number of observation spells: 1 
  Maximal time range observed: 0 until 4 
  Temporal mode: discrete 
  Time unit: step 
  Suggested time increment: 1 
nets4Projected<-timeProjectedNetwork(nets4Dyn)

The projected network will have size equal the the number of vertices in the original network times the number of time slices, with each vertex now appearing multiple times.

network.size(nets4Projected)
[1] 16
network.vertex.names(nets4Projected)
 [1] 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4

The new network will have an edge attribute named edge.type added to distinguish the original "within_slice" ties from the between-slice identity arcs. We can use this control edge color to make the plot a little clearer.

plot(nets4Projected,
     displaylabels=TRUE,
     mode='kamadakawai',
     edge.col=ifelse(nets4Projected%e%'edge.type'=='identity_arc','gray','black'))

For a temporally sparse network like moodyContactSim this can start to get a bit croweded on the plot.

changes<-get.change.times(moodyContactSim)
moodyProj<-timeProjectedNetwork(moodyContactSim,onsets=changes,termini=changes)
plot(moodyProj,
     mode='kamadakawai',
     vertex.cex=0.3,
     arrowhead.cex=0.1,
     edge.col=ifelse(moodyProj%e%'edge.type'=='identity_arc','gray','black'))

Note that we computed a network slice at every time point that the network changed. The produced quite a lote of "extra" vertices, but does make it so that we are sure that the temporal connectivity of the original dynamic network is completly represented in the projected one.

Rates of Change

The functions tEdgeFormation and tEdgeDissolution evaluate a network object at multiple time points and return counts of the number of edges forming (edge onset at time point) and dissolving (edge terminus at time point). These functions are intended to provide descriptive stats about momentary rates of change in the network.

plot(tEdgeDissolution(base),main="Edge dissolution counts for network 'base'")

plot(tEdgeFormation(base), main="Edge formation counts for network 'base'")

Both measures also include the ability to output the results as fractions instead of a raw counts. For edge dissolution, it then returns the the fraction of previously-active edges that dissolve at the specified time point. We can also use the start, end and time.interval parameters to adjust how frequently the evaluations are made.

tEdgeDissolution(base,result.type = 'fraction',time.interval = 10)
Time Series:
Start = 0 
End = 100 
Frequency = 0.1 
 [1] 0.00000000 0.04314721 0.01891892 0.04810127 0.05329949 0.06122449
 [7] 0.04282116 0.01955990 0.04018913 0.02689487 0.05822785

For formation, the fraction is defined as the number of edges that form divided by the number of possible empty dyads (pairs of vertices that do not have an edge between them) that could possibly form ties.

tEdgeFormation(base,result.type = 'fraction',time.interval = 10)
Time Series:
Start = 0 
End = 100 
Frequency = 0.1 
 [1] 0.000000e+00 2.203817e-05 2.804762e-05 3.606188e-05 3.405831e-05
 [6] 2.604454e-05 2.804858e-05 3.806715e-05 3.205706e-05 1.602853e-05
[11] 2.804813e-05

For sparse networks, these numbers will tend to be very, very small.

Static graph metrics as time series

Many authors have described useful generalizations of traditional network statistics by simply applying static graph metrics at multiple time points in order to characterize changes in the network over time. The tsna package provides utilities to draw on the static metrics offered by the sna (Butts 2014) and ergm (Handcock et. al. 2015) packages.

Using sna package metrics

data(harry_potter_support)

Compute graph transitivity for all time points

tSnaStats(harry_potter_support,snafun='gtrans')
Time Series:
Start = 0 
End = 6 
Frequency = 1 
      Series 1
[1,] 0.8591549
[2,] 0.6374046
[3,] 0.6666667
[4,] 0.6016949
[5,] 0.6141176
[6,] 0.4492754
[7,]        NA

The results are returned as a time series object, which is a special type of matrix in which the rows correspond to regularly spaced time increments and the columns are variables of interest.

Other measures, such as triad.census, return multiple columns, one for each statistic (counts of the the various triad types)

# compute triad census scores for each time point
tSnaStats(harry_potter_support,snafun='triad.census')
Time Series:
Start = 0 
End = 6 
Frequency = 1 
    003 012  102 021D 021U 021C 111D 111U 030T 030C 201 120D 120U 120C 210 300
0 40488 222  911    0    2    0   20    0    0    0   0    0    1    0   0  20
1 38582 942 1901    1   32    6   66    8    0    0  37    6   18    4  28  33
2 38726 879 1850    0   20    2   54   20    1    0  36    3   15    1  16  41
3 40378 556  668    0    5    5   21   11    0    0   4    1    4    1   0  10
4 37581 594 3145    0   22    8  106   16    0    0  93    2    7    0  12  78
5 39638 939  969    2    2   10   23   33    3    0  22    0    9    0   4  10
6    NA  NA   NA   NA   NA   NA   NA   NA   NA   NA  NA   NA   NA   NA  NA  NA

For vertex-level measures such as betweenness we will get back one column for each vertex indicating how its betweenness score changes over time. This doesn't fit well on the page, so we will not print it all out.

# compute degrees
bet<-tSnaStats(harry_potter_support,snafun='betweenness')
nrow(bet)
[1] 7
ncol(bet)
[1] 64
bet[,25,drop=FALSE]
Time Series:
Start = 0 
End = 6 
Frequency = 1 
     Harry James Potter
[1,]           15.00000
[2,]          127.75000
[3,]           50.66190
[4,]           42.16667
[5,]          252.00000
[6,]           50.08333
[7,]                 NA
class(bet)
[1] "mts"    "ts"     "matrix"

Since a time series (ts) object really is just a fancy matrix, we can still use matrix functions on the rows and columns.

Compute the mean (over time) betweenness of each vertex

colMeans(bet,na.rm = TRUE)
          Adrian Pucey         Alicia Spinnet       Angelina Johnson 
            0.00000000             1.03214286             0.78214286 
     Anthony Goldstein          Blaise Zabini          C. Warrington 
            0.00000000             0.00000000             0.00000000 
        Cedric Diggory              Cho Chang          Colin Creevey 
            1.33333333             0.00000000             0.00000000 
       Cormac McLaggen            Dean Thomas         Demelza Robins 
            0.00000000             6.79047619             0.00000000 
        Dennis Creevey           Draco Malfoy       Eddie Carmichael 
            0.00000000             0.00000000             0.00000000 
     Eleanor Branstone        Ernie Macmillan       Euan Abercrombie 
            0.00000000             5.66666667             0.00000000 
          Fred Weasley         George Weasley          Ginny Weasley 
            6.05714286             4.47380952             0.93055556 
      Graham Pritchard          Gregory Goyle          Hannah Abbott 
            0.00000000             0.00000000             0.00000000 
    Harry James Potter       Hermione Granger           Jimmy Peakes 
           89.61031746             8.77222222             0.00000000 
Justin Finch-Fletchley             Katie Bell           Kevin Whitby 
            5.66666667             0.00000000             0.00000000 
        Lavender Brown                 Leanne             Lee Jordan 
            0.00000000             0.00000000             0.66666667 
           Lucian Bole          Luna Lovegood        Malcolm Baddock 
            0.00000000             0.04166667             0.00000000 
    Mandy Brocklehurst           Marcus Belby           Marcus Flint 
            0.00000000             0.00000000             0.00000000 
        Michael Corner        Miles Bletchley    Millicent Bulstrode 
            0.00000000             0.00000000             0.00000000 
      Natalie McDonald     Neville Longbottom            Oliver Wood 
            0.00000000             1.70000000             0.00000000 
           Orla Quirke         Owen Cauldwell            Padma Patil 
            0.00000000             0.00000000             0.00000000 
       Pansy Parkinson          Parvati Patil    Penelope Clearwater 
            0.33333333             0.00000000             0.00000000 
         Percy Weasley      Peregrine Derrick           Roger Davies 
            0.00000000             0.00000000             0.00000000 
          Romilda Vane         Ronald Weasley            Rose Zeller 
            0.00000000            22.58015873             0.00000000 
       Seamus Finnigan       Stewart Ackerley            Susan Bones 
            2.64603175             0.00000000             0.00000000 
            Terry Boot          Theodore Nott         Vincent Crabbe 
            0.41666667             0.00000000             0.00000000 
       Zacharias Smith 
            0.00000000 

Or look at it the other way and compute mean betweenness of all vertices at each timepoint

rowMeans(bet)
[1] 0.312500 3.828125 2.546875 0.984375 5.781250 1.500000       NA

Not surprisingly, HP has the highest betweeness.

Since the sna metrics are usually not cheap to calculate, we may not want to evaluate them at every single time step for larger longer-duration networks. If we just need a sense of the trend in the network, we can use the time.interval parameter to sample.

prestScores<-tSnaStats(base,'prestige',time.interval=25,rescale=TRUE)

And notice that we also passed in the rescale=TRUE argument used by the prestige function. If the effective size of the network is changing due to vertex activity, this would allow the sna metric to renormalize with network size to reduce possible variations due to making comparisons between networks of varying size.

Using ergm terms as static metrics

Each of the model terms provided by the ergm package (and its various add-ons) provide a 'change statistic' for evaluating the effect of changing a single tie on network structure. These terms can be effectively used as descriptive statistic for static networks. For the static case, this is done by passing a formula naming the desired statistics to ergm's summary function.

The tsna package provides a wrapper to evaluate the statistics at multiple time points and return the result as a time series (ts) object with a value for each statistic at each time point.

 tErgmStats(base,'~edges+concurrent',
               start=0,end=100,time.interval = 10)
Time Series:
Start = 0 
End = 100 
Frequency = 0.1 
    edges concurrent
  0   371        186
 10   377        186
 20   363        154
 30   376        162
 40   373        166
 50   368        169
 60   380        181
 70   401        203
 80   406        191
 90   398        189
100   372        166

The ts object has a really handy print function that can display the time series for each statistic side-by-side or overlapped on a single plot.

 plot(
   tErgmStats(base,'~edges+concurrent',
                start=0,end=100,time.interval = 10)
    )

Note that just like for the sna functions, some ergm terms (such as degree) produce multiple statistics in the output. Also, there is some overlap with functionality appearing in the sna package i.e. (sna's degree vs ergm sociality).

Durations and densities

The tsna package currently contains several functions for reporting on the durations of the activity events (spells) associated with a network. Most of these functions have options for varying the units of aggregation or analysis. For example, the edge measures can be adjusted from focusing on an 'edge' (aggregating across all of the activity spells associated with that edge) or a 'spell' (a single interval of contiguous time during which an edge is activated).

Distributions of edge durations

The edgeDuration function returns the activity durations associated with the edges of the network.

edgeDuration(moodyContactSim)
 [1] 32 33 32 26 30 24 32 30 26 27 27 32 31 27 26 34 44 26

This output means that the first edge was active for 32 time units, the second for 33, etc. This useful when we want to find out what the distribution of edge durations is shaped like.

summary(edgeDuration(moodyContactSim))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  24.00   26.25   30.00   29.94   32.00   44.00 
hist(edgeDuration(moodyContactSim))

Durations for empty dyads (non existing edges) are not reported.

For a more realistic example, we can compare the distributions of several networks simulated using the tergm package. (Please see ?concurrencyComparisonNets for more background on these networks) We would expect that since the concurrency simulations all had the same dissolution parameter in the model, which was simply a uniform probability that an edge would be deactivated at each time step, they should all have similarly shaped distributions of edge durations.

data(concurrencyComparisonNets)
hist(edgeDuration(base),ylim=c(0,800))

hist(edgeDuration(middle),ylim=c(0,800))

hist(edgeDuration(monog),ylim=c(0,800))

Re-occuring edges

The durations functions also have the ability to count the number of spells instead of aggregating their durations. This is mostly useful for working with event-based continuous data (see below) but also makes it so we can quickly determine if the network has re-occurring ties. In other words, which edges have more than one activity spell associated with them?

which(edgeDuration(monog,mode='counts')>1)
[1] 105 366 888
which(edgeDuration(moodyContactSim,mode='counts')>1)
integer(0)

All of the edges in the moodyContactSim network are only active once (making many times of calculations much simpler), where in the monog network, there are a few edges that toggle on and off and then back on again. By default, the edgeDuration aggregates at the level of subject='edge' summing the durations together. Setting subject='spells' will return / count each activity spell independently. For example, we just query the duration of the 105th valid edge id which was reported as having multiple spells.

edgeDuration(monog,e=valid.eids(monog)[105],subject='edges')
[1] 31
edgeDuration(monog,e=valid.eids(monog)[105],subject='spells')
[1] 13 18

For networks in which edges rarely re-activate, there may be little difference. But we were trying to determine something like mean values of activity durations, we'd probably want to use subject='spells' to count each edges' spells separately.

mean(edgeDuration(base,subject = 'edges'))
[1] 20.15911
mean(edgeDuration(base,subject = 'spells'))
[1] 20.1485

Finding vertex activity durations

For networks with a changing set of active vertices, the durations can be summarized using the vertexDuration function. The windsurfers data set only records interactions between the surfers observed on the specific beach. Since most of the surfers don't make it to beach every day and can't be observed, they are considered inactive.

data(windsurfers)
vertexDuration(windsurfers)
 [1] 27 13 19 22  5  8  5  2  1  4  5  9  7  8  7 18  9  4  2  1  5  3  6  6  4
[26]  6  7  6 14  5  4  1  1  2  4  1  8  3 14  8  8  3  3  7  3  3  1  8  2  3
[51]  6  4  3  1  2  3  2  4  7  7  2  3  3  2  5  6  5  3  6  7  2  1  1 14  2
[76]  2  2  5  3  2  2  2  2  4  3  1  3  2  3  2  1  2  1  1  1
table(vertexDuration(windsurfers))

 1  2  3  4  5  6  7  8  9 13 14 18 19 22 27 
14 19 16  8  8  7  7  6  2  1  3  1  1  1  1 
hist(vertexDuration(windsurfers))

hist(vertexDuration(windsurfers,subject='spells'))

Again, setting subject='spells' changes the interpretation from "give me the total number of days each vertex was observed to be active in the dataset" to "give me the number of consecutive days in each spell of days the vertices were observed to be active"

Finding connected times of vertices

The tiedDuration function measures the total amount of time each vertex has ties during the network observation period.

Although it is possible to represent networks in which edges link to inactive vertices, we assume that the situations are rare enough that we have not implemented a specific measure to describe it. (For data cleaning purposes, such cases can be discovered using network.dynamic.check.)

The McFarland classroom interaction dataset is a collection of time-stamped speech acts among teachers and students during 40 minutes of classroom time. How much total talking does each person in the room do?

data(McFarland_cls33_10_16_96)
tiedDuration(cls33_10_16_96, mode='counts')
 [1]  29   0  27  63  25  23 164  19  18  32  19  39  20  81  20  26  32  29   0
[20]  25

Notice that since the edges in this network are coded has events with no duration, we actually asked it to consider counts instead of durations. So there is one person (7) who talks a lot more, and two people who don't talk at all (2 and 19). Could this be related to their roles?

cls33_10_16_96%v%'type'
 [1] "grade_11"   "grade_11"   "grade_11"   "grade_12"   "grade_12"  
 [6] "grade_11"   "instructor" "grade_11"   "grade_12"   "grade_11"  
[11] "grade_12"   "grade_11"   "grade_12"   "instructor" "grade_11"  
[16] "grade_11"   "grade_11"   "grade_11"   "grade_11"   "grade_11"  

Well, at least the talker is one of the teachers. Since this is a directed network, it might also be interesting to look at it in terms of who people talk to

tiedDuration(cls33_10_16_96, mode='counts',neighborhood = 'in')
 [1] 38 10 37 55 37 32 32 29 28 47 29 49 30 46 30 36 42 40 10 34

Or perhaps a ratio of speaking vs. being spoken to.

plot(tiedDuration(cls33_10_16_96, mode='counts',neighborhood = 'out'),
     tiedDuration(cls33_10_16_96, mode='counts',neighborhood = 'in'),
     xlab='# speaking events',ylab='# spoken to events',main='McFarland classroom network, speaking vs. spoken to' )
text(tiedDuration(cls33_10_16_96, mode='counts',neighborhood = 'out'),
     tiedDuration(cls33_10_16_96, mode='counts',neighborhood = 'in'),
     label=cls33_10_16_96%v%'type',cex=0.8,pos=4)

If we turn to the concurrency comparison networks, it is interesting to compare the distributions of connected duration for the three scenarios. We might assume that vertices that are connected longer are more likely to be reached (or infected) because, ignoring topology, they simply have more opertunities to transmit or recieve. Since this is an undirected network, tiedDuration totals the amount of time that each vertex appears as the tail or head of each edge.

plot(sort(tiedDuration(base)),type='l',ylim=c(0,400),
     main='sorted tiedDuration for concurrency scenearios',
     xlab='sorted vertices',ylab='duration that each vertex is connected', col ='#55000033',lwd=4)
points(sort(tiedDuration(monog)),type='l',col='#00550033',lwd=4)
points(sort(tiedDuration(middle)),type='l',col='#00005533',lwd=4)
legend(200,300,legend=c('base','monog','middle'),
       fill=c('#55000033','#00550033','#00005533'))

mean(tiedDuration(base))
[1] 76.524
mean(tiedDuration(monog))
[1] 76.102
mean(tiedDuration(middle))
[1] 73.804

The means for the networks are quite similar, as we would expect since they were simulated to have the same momentary mean degree, but the shape of the distributions are somewhat different. There is very little variation in the monog version, but the middle and base networks each have some vertices have much higher connection rates.

Difference between degree and tiedDuration

Like many of the temporal measures, for truly random graphs we might expect a strong correlation between the number of people contacted (degree) and the total contact time (tiedDuration). But of course there is always the possibility of someone having a very few long-duration ties, or very many short ties.

Indeed, plotting aggregate degree against connected duration in the simulated networks suggests they are pretty tightly correlated.

par(mfcol=c(1,3))
plot(degree(as.network(base)),tiedDuration(base),xlim=c(0,25),ylim=c(0,350),main='base')
plot(degree(as.network(middle)),tiedDuration(middle),xlim=c(0,25), ylim=c(0,350),main='middle')
plot(degree(as.network(monog)),tiedDuration(monog),xlim=c(0,25),ylim=c(0,350),main='monog')

par(mfcol=c(1,1))

We can compare the measures on observations for each person in the real-world observations of the hospital contact network.

data(hospital_contact)
plot(degree(as.network(hospital),gmode = 'graph'),tiedDuration(hospital),
     xlab='aggregate degree (total number of unique contacts)',
     ylab='total contact duration (seconds)',
     main='Vertices in hospital RFID proximity contact network')

Compare duration measures on various example networks

Lets put a bunch of the example networks together on a list so we apply the metrics to all of them at once

data(moodyContactSim)
data(harry_potter_support)
data(onlineNetwork)
data(vanDeBunt_students)
data(McFarland_cls33_10_16_96)
data(windsurfers)
data(hospital_contact)
data(concurrencyComparisonNets)
nets<-list(
  moodyContactSim=moodyContactSim,
  hospital=hospital,
  base=base,
  monog=monog,
  harry_potter=harry_potter_support,
  onlineNet=onlineNet,
  vanDeBunt=vanDeBunt_students,
  McFarland=cls33_10_16_96,
  windsurfers=windsurfers)

Plot histograms for the tie durations for 9 example networks

par(mfcol=c(3,3))
for (n in seq_along(nets)){
  hist(edgeDuration(nets[[n]]),main=names(nets)[n],xlab='duration')
}

par(mfcol=c(1,1))

Notice that onlineNet, vanDeBunt and McFarland's classroom all have momentary events, so durations are not a very useful metric. Hospital has very short durations (20 seconds) compared to the overall time (~350,000 seconds). Really should be corrected to measure as if it was discrete time with 20 sec time steps. And since we are not considering the actual units of time, doesn't really make sense to put them on the same plot

If we look at it by event counts per edge instead of durations, the momentary event networks rank much higher.

par(mfcol=c(3,3))
for (n in seq_along(nets)){
  hist(edgeDuration(nets[[n]],mode = 'counts'),main=names(nets)[n],xlab='duration')
}

par(mfcol=c(1,1))

Density of activity

The tEdgeDensity function is a network-level metric to compute the temporal density of a networkDynamic object. It can be thought of as a crude measure of the total relational activity in the network. It is expressed as a measure of the total amount of time vertices are tied by active edges in the network, divided by the amount of time they could possibly be tied.

For all the dyads which are ever observed to have edges, what fraction of the time are they tied?

tEdgeDensity(base)
[1] 0.1996973

So for the base network, if an edge is observed, it is likely to be active 19% of the time. This may be useful, but it doesn't really tell us how connected the overall network is because we could get this value for a network with only a single edge that is active for 19% of the observational period. We can change the aggregation unit from 'edge' to 'dyad' so that the denominator will be the total amount of time dyads could possibly be connected.

For all the possible dyads in the network, what fraction of time are they actually tied?

tEdgeDensity(base,agg.unit = 'dyad')
[1] 0.000766006

So, for the base network, if we peek at a randomly chosen dyad at randomly chosen time in the observation period there is only an 8 in 10,000 chance that they will be connected by an active edge! But of course, just like static density measures, this form of density is dependent on network size.

What fraction of time the observed edges in the example networks active?

edd<-sapply(nets,tEdgeDensity)
plot(edd,main='edge duration density',xaxt='n',xlab='networks')
text(edd,label=names(edd),pos=4)

If we look at it by events, the momentary event networks rank much higher.

How many events are there in a unit of time? (of course this depends on having sensible units of time)

eed<-sapply(nets,tEdgeDensity,mode='event')
plot(eed,main='edge event density',xaxt='n',xlab='networks')
text(eed,label=names(eed),pos=4)

Compute duration of active ties as fraction of the available dyads?

ddd<-sapply(nets,tEdgeDensity,agg.unit='dyad')
plot(ddd,main='dyad duration density',xaxt='n',xlab='networks')
text(ddd,label=names(ddd),pos=4)

Windsurfers should probably have a correction for dyad duration density, since the vertex set varies, so not not all dyads are always available.

The problem with these measures is that, like density, the values will be very, very low for real world networks so kind of hard to compare.

Measures of sequence

In addition to measures of momentary structure, dynamic or longitudinal networks also give us the ability to explore sequences or patterns of events. The package relevent (Butts 2008) includes some statistical techniques for modeling relational event sequences, and the tsna package borrows the participation shift summary statistics from it.

pShift

The function pShiftCount computes counts of dyadic turn-taking events using a typology outlined by Gibson (2003). Essentially, it looks at the sequence of events (rather than the exact timing) and counts up instances of various types of 'turn taking' speech acts. For example, two edges in sequence, 'i' talks to 'j', and then 'j' responds to 'i', is counted as one type of event, where 'i' talks to 'j' and then 'j' talks to 'k' would be another type of event. The set of events Gibson was interested in is certainly not an exhaustive list of potential sequences, but may still be an interesting approach for characterizing dynamic networks.

We can apply the the pShiftCount metric to the McFarland dataset of conversational speech acts in a classroom.

data(McFarland_cls33_10_16_96)
pShiftCount(cls33_10_16_96)
Loading required namespace: relevent
    AB-BA AB-B0 AB-BY A0-X0 A0-XA A0-XY AB-X0 AB-XA AB-XB AB-XY A0-AY AB-A0
691   247     2    45     3     2     5     4     7     8   155     0     1
    AB-AY
691    29

By default, it gives us aback a vector with the counts of each type participation shifts. After consulting the docs (?pShiftcount) we see that there is a lot of call and response 'Turn Receiving' ('AB-BA') as well as quite a bit of 'Turn Usurping' ('AB-XY') events.

Since we don't really know what these numbers mean in absolute terms, it would be interesting to see if the counts change at various segments of time. So divide the time up into 5-minute intervals and compute counts for each

sliceCounts<- lapply(seq(from = 0,to=45,by = 5),function(onset){
  pShiftCount(network.extract(cls33_10_16_96,onset,length = 5))
})
# convert to a matrix
sliceCounts<-do.call(rbind,sliceCounts)
sliceCounts
    AB-BA AB-B0 AB-BY A0-X0 A0-XA A0-XY AB-X0 AB-XA AB-XB AB-XY A0-AY AB-A0
67     10     1     1     1     0     1     0     3     0     2     0     0
134    21     0     2     1     1     3     3     0     0    11     0     1
46     23     0     4     0     0     0     0     0     1    13     0     0
47     23     0     4     0     0     0     0     3     0    15     0     0
103    31     0     5     1     0     1     1     0     1    23     0     0
71     36     0     6     0     0     0     0     1     2    24     0     0
74     37     0    11     0     0     0     0     0     0    22     0     0
76     38     0     7     0     0     0     0     0     1    26     0     0
73     25     1     5     0     1     0     0     0     3    15     0     0
    AB-AY
67     11
134     0
46      4
47      1
103     3
71      1
74      2
76      3
73      3
# plot
plot(sliceCounts[,'AB-BA'],type='b',col='blue',ylim=c(0,170),
     main='pShift counts for 5-min intervals of cls33',
     ylab='count of selected pShift',xlab='slice index')
points(sliceCounts[,'AB-AY'],type='b',col='red')
points(sliceCounts[,'AB-XA'],type='b',col='green')

Perhaps it is interesting that the 'AB-BA' call-and-response starts out low and increases for most of the class, in contrast to the where the 'AB-AY' counts? But I think we'd need to go back and look at the narrative accounts for the classroom.

Next steps

This package is far from complete. Most functions do not yet include ability to bin times necessary for working with certain types of continuous time networks. Include vertex activity functions, and correctly account for vertex activity in edge functions.

Citation

citation('tsna')

To cite package 'tsna' in publications use:

  Skye Bender-deMoll and Martina Morris (2021). tsna: Tools for
  Temporal Social Network Analysis. R package version 0.3.5.
  http://statnet.org/

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {tsna: Tools for Temporal Social Network Analysis},
    author = {Skye Bender-deMoll and Martina Morris},
    year = {2021},
    note = {R package version 0.3.5},
    url = {http://statnet.org/},
  }

Bibliography

Vincenzo Nicosia, John Tang, Cecilia Mascolo, Mirco Musolesi, Giovanni Russo, and Vito Latora (2013) Graph Metrics for Temporal Networks. https://arxiv.org/pdf/1306.0493v1.pdf

Skye Bender-deMoll (2014). networkDynamicData: dynamic network datasets. R package version 0.1.0. https://CRAN.R-project.org/package=networkDynamicData

B. Bui Xuan, Afonso Ferreira, Aubin Jarry. "Computing shortest, fastest, and foremost journeys in dynamic networks". RR-4589, 2002. https://hal.inria.fr/inria-00071996/document

Carter T. Butts, Ayn Leslie-Cook, Pavel N. Krivitsky and Skye Bender-deMoll (2015). networkDynamic: Dynamic Extensions for Network Objects. R package version 0.8. https://statnet.org

Carter T. Butts (2008). A Relational Event Framework for Social Action. Sociological Methodology, 38(1), 155-200.

Carter T. Butts (2014). sna: Tools for Social Network Analysis. R package version 2.3-2. https://CRAN.R-project.org/package=sna

Gibson, D.R. (2003) 'Participation Shifts: Order and Differentiation in Group Conversation' Social Forces 81 (4): 1335-1380 https://academic.oup.com/sf/article-abstract/81/4/1335/2234586

Handcock M, Hunter D, Butts C, Goodreau S, Krivitsky P and Morris M (2015). ergm: Fit, Simulate and Diagnose Exponential-Family Models for Networks. The Statnet Project (<URL: https://statnet.org>). R package version 3.4.0, <URL: https://CRAN.R-project.org/package=ergm>.

Moody, James. (2002) "The Importance of Relationship Timing for Diffusion." Social Forces 81:25-56

Moody, James (2008) "Static Representations of Dynamic Networks" Duke Population Research Institute On-line Working Paper Series. http://www.soc.duke.edu/~jmoody77/StatDyn_5.pdf