11 Transitivity, structural balance, and hierarchy
The main goal of this tutorial is to delve more deeply into the microfoundations of networks, such as dyads and tryads. We will learn the basic functions for measuring reciprocity, transitivity, the triad census, and for identifying cliques. We will also learn how to “ban” triads from a graph, and begin to think about whether our graphs deviate, in a statistical sense, from what we might expect by chance, one way of evaluating whether our results are “meaningful”.
Through this tutorial, we will rely on igraph to analyze the comm59 Add Health network that we made use of last class.
# read in the edge list from our github
<- read.table("https://raw.githubusercontent.com/mahoffman/stanford_networks/main/data/comm59.dat.txt", header = T)
el # Read in attributes from our github
<- read.table("https://raw.githubusercontent.com/mahoffman/stanford_networks/main/data/comm59_att.dat.txt", header = T)
attributes # add an ID column
$ID <- 1:nrow(attributes) attributes
Next let’s graph it as a network, ignoring the ranking of friendships for now.
# First read in igraph
library(igraph)
# Indexing data so that you only put in certain columns
<- el[,1:2] # We will ignore the ranking variable for now.
el_no_weight <- as.matrix(el_no_weight) # igraph requires a matrix
el_no_weight
# convert ids to characters so they are preserved as names
1] <- as.character(el_no_weight[,1])
el_no_weight[,2] <- as.character(el_no_weight[,2])
el_no_weight[,
# Graph the network
<- graph.edgelist(el_no_weight, directed = T)
net59
# Finally, add attributes
# First link vertex names to their place in the attribute dataset
<- match(V(net59)$name, attributes$ID)
linked_ids
# Then we can use that to assign a variable to each user in the network
V(net59)$race <- attributes$race[linked_ids]
V(net59)$sex <- attributes$sex[linked_ids]
V(net59)$grade <- attributes$grade[linked_ids]
V(net59)$school <- attributes$school[linked_ids]
# Great! net59
## IGRAPH d05711b DN-- 975 4160 --
## + attr: name (v/c), race (v/n), sex (v/n), grade (v/n), school (v/n)
## + edges from d05711b (vertex names):
## [1] 1 ->191 1 ->245 1 ->272 1 ->413 1 ->447 3 ->21 3 ->221 3 ->480 3 ->495
## [10] 3 ->574 5 ->96 5 ->258 5 ->335 6 ->271 6 ->374 6 ->400 6 ->489 6 ->491
## [19] 6 ->573 6 ->586 7 ->134 7 ->159 7 ->464 7 ->478 8 ->221 8 ->284 8 ->378
## [28] 8 ->557 9 ->137 9 ->442 9 ->473 9 ->498 10->20 10->22 10->64 10->75
## [37] 10->89 10->219 10->272 10->276 11->101 11->155 11->190 11->273 11->337
## [46] 11->339 11->353 11->616 12->188 12->475 14->20 14->151 14->597 15->106
## [55] 15->189 15->233 15->325 15->333 15->449 15->491 15->552 15->624 15->627
## [64] 16->30 16->201 16->217 16->267 16->268 16->466 16->569 17->625 19->45
## + ... omitted several edges
11.1 The Dyad
We can break large social networks down into their constituent parts. These constituent parts are referred to as “motifs”. The most basic motif consists of two nodes and is called a dyad. Edges in a network signify the presence or absence of dyadic relations. It follows that a dyad in an undirected network can have two unique configurations: connected or disconnected; and three unique configurations in a directed network (mutual, assymetric, and null)
Density captures, at the macro-level, the proportion of dyads that are present over the possible total number of dyads in the network. We are simply re-framing what we discussed last class, except we are focusing on the configuration of nodes as opposed to edges.
graph.density(net59)
## [1] 0.004380561
A related concept is that of reciprocity, a measure which pertains only to directed graphs. Reciprocity is the tendency with which affect, or network ties, sent out by egos are returned by alters. Edges are reciprocal when ego and alter both send each other ties; reciprocity is the graph-level analogue, evaluating the tendency for edges to be reciprocal across the whole network.
reciprocity(net59)
## [1] 0.39375
So our graph has a reciprocity score of 0.39. Is that high or low? It depends on your expectation. If you are from a society with a strong taboo against unrequited affect, then it might seem low. If you come from an individualistic society, it might seem high.
11.2 Generating a random graph for comparison
One way network scholars evaluate whether a given descriptive statistic is high or low is to compare it to the value that obtains under a random network of similar density. In random graphs, the chance that any two dyads are in a relation is determined by chance (i.e. the flip of a coin). This means that the likelihood of observing a given tie is independent from observing a tie between any other dyad.
There are a lot of reasons that this is unrealistic.. For example, if A and B are friends and A and C are friends, then we would expect the probability that B and C are friends to be higher. That is, you are more likely to be friends with your friends friends than with strangers. This is a basic feature of most social networks and one that was emphasized in our readings this week on transitivity. Even in dyads, if someone shows you affection you are more likely to return it. We therefore might want to factor in this human tendency towards triadic and dyadic closure into our null model - later in the class we will discuss ways of doing this. That said, random graphs have many properties that are mathematically and heuristically useful, which is why they are commonly used as null models.
igraph has a fast and easy function for generating random graphs.
?erdos.renyi.game
In an erdos.renyi.graph, each edge has the same probability of being created. We determine the probability and it returns a random graph with a density that equals (in expectation) this probability.
First, we need to calculate the density and number of nodes in our graph.
<- vcount(net59)
net59_n <- graph.density(net59) net59_density
Then we can input those into the erdos.renyi.game function provided by igraph to generate a network of the same size and density, but with edges that are random re-arranged.
<- erdos.renyi.game(n = net59_n, p.or.m = net59_density, directed = TRUE) # where n is the number of nodes, p.or.m is the probability of drawing an edge, directed is whether the network is directed or not random_graph
Let’s take a look at the graph.
plot(random_graph,
vertex.size = 2,
vertex.label = NA,
edge.curved = .1,
vertex.color = "tomato",
edge.arrow.size = .1,
edge.width = .5,
edge.color = "grey60")
It looks like a bowl of spaghetti - that is how you know you indeed have a random graph on your hands. What is its reciprocity?
reciprocity(random_graph)
## [1] 0.003703704
0.003! So our network exhibits far more reciprocity than we would expect if people were affiliating randomly.
11.3 The Triad
Triads consist of three nodes and are therefore more complex than dyads, with more possible arrangements.
This becomes clear when you realize an undirected triad consists of three dyads: A and B, B and C, and A and C.
In an undirected graph, there are eight possible triads (2^3… sort of obvious given that each dyad can be present or absent and there are three dyads). Of those eight possible triads, four are isomorphic, so that there are four unique triads.
In an directed graph, there will be sixteen unique triads (the unique number of motifs in directed graphs is not easily reduced to a functional form). These sixteen unique triads give rise to the MAN framework as well as the triad census, which we discussed in class.
The triad census calculates how many triads there are of each type (which, as I just mentioned, in a directed network amounts to 16). If we see a network with very few complete (003) triads, then we know something about the macro-level structure, just by looking at the frequencies of its constituent parts at the micro-level. By extension, if the whole distribution of triads is very different than the distribution that obtains under a random network, then we hopefully learn something about the macro-level structure that we couldn’t observe just by looking at a visualization of the network.
11.4 Calculating a triad census
igraph has a built in function for the triad census: triad.census()
It takes a network object as an argument and returns the number of each type of triad in the network.
triad.census(net59)
## [1] 150776065 2416731 785249 3621 4325 5165 4028
## [8] 3619 450 23 1019 294 320 137
## [15] 385 144
As you can see, it returns 16 different numbers. It uses the M-A-N classification structure: M stands for the number of Mutual ties, A stands for the number of Asymmetric ties, and N stands for the number of Null ties.
Mutual means that ego and alter (say A and B) have a mutual relation with each other (A likes B and B likes A). A means that ego and alter have an asymmetric relation with one another, i.e. A likes B but B doesn’t like A… the relation is not reciprocated. Finally, N means that A and B do not have any relation.
The image below visualizes the different types of triads possible in a directed graph.
?triad.census also describes these possible types, telling you the order that the triad types are presented in the triad.census() output.
003 for example means there are 0 mutual relation, 0 asymmetric relations and 3 null relations. This triad contains no relations. 201 would mean there are two mutual relations, zero asymmetric relations, and 1 null relation. Of three dyads in the triad, two dyads are in a relation, and one dyad is not.
For now let’s look at the triad census of our random graph.
triad.census(random_graph)
## [1] 149843876 4112275 7659 9519 9363 18647 67
## [8] 57 86 25 0 0 0 1
## [15] 0 0
Most triads are null. This is because edges only have a 5% chance of being drawn. It follows that 95% of the total possible edges are missing.
There are not many cases of transitive triads, or even in which all three dyads have a relationship. Thus, the majority of triads are concentrated in the left side of the triad distribution
Let’s compare it our Add Health network.
triad.census(net59)
## [1] 150776065 2416731 785249 3621 4325 5165 4028
## [8] 3619 450 23 1019 294 320 137
## [15] 385 144
Which triads are common in our network, but not in the random graph? What might this tell us?
11.5 Random graphs galore!
One problem is that we are basing our analysis on a single random graph. Because edges are drawn randomly, there is a lot of variation in the structure of random graphs, especially when the number of nodes in the graph is small (less than one thousand).
What we really want is a distribution of random graphs and their triad censuses, against which our own could be compared. So let’s generate one hundred random graphs, and create a distribution of random graph triad censuses and see where our graph lies on that distribution
<- vector("list", 100) # this creates a list with 100 spaces to store things. We will store each result here.
trial
for ( i in 1:length(trial) ){
<- erdos.renyi.game(n = net59_n, p.or.m = net59_density, directed = TRUE)
random_graph <- triad.census(random_graph) # We assign to the ith space the result. So for the first iteration, it will assign the result to the first space in the list
trial[[i]]
}
<- do.call("rbind", trial) # We can use the do.call and "rbind" functions together to combine all of the results into a matrix, where each row is one of our trials
trial_df
colnames(trial_df) <- c("003", "012", "102", "021D", "021U", "021C", "111D", "111U", "030T", "030C", "201", "120D", "120U", "120C", "210", "300") # It is worth naming the columns too.
<- rbind(trial_df, as.numeric(triad.census(net59))) # add in the observed results trial_df_w_observed
Now we have this 100 row dataset of simulation results with the observed results tacked on at the end (of course, we could have done 1,000 or 10,000 iterations!) Let’s produce, for each column, some simple statistics, like a mean and a confidence interval.
# First, standardize all of the columns by dividing each of their values by the largest value in that column, so that each will be on a similar scale (0 to 1), we can visualize them meaningfully
<- as.data.frame(trial_df_w_observed)
trial_df_w_observed
1:ncol(trial_df_w_observed)] <- sapply(trial_df_w_observed[,1:length(trial_df_w_observed)], function(x) x/max(x))
trial_df_w_observed[,
# Then split the observed from the simulation results
<- as.data.frame(trial_df_w_observed[1:100,])
trial_df <- as.numeric(trial_df_w_observed[101,])
observed
# Summarize the simulation results and add the observed data set back in for comparison
<- data.frame(TriadType = colnames(trial_df),
summarized_stats Means = sapply(trial_df, mean),
LowerCI = sapply(trial_df, function(x) quantile(x, 0.05)),
UpperCI = sapply(trial_df, function(x) quantile(x, 0.95)),
Observed = observed)
summarized_stats
## TriadType Means LowerCI UpperCI Observed
## 003 003 0.9949328855 0.994278750 0.9955706706 1.0000000
## 012 012 0.9608529038 0.938119928 0.9853509927 0.5885101
## 102 102 0.0113484895 0.006088451 0.0170609577 1.0000000
## 021D 021D 0.9154071799 0.869776554 0.9634861802 0.3834586
## 021U 021U 0.9270289079 0.880053533 0.9740256959 0.4630621
## 021C 021C 0.9239931522 0.880662850 0.9680933019 0.2763214
## 111D 111D 0.0196573982 0.010166336 0.0298659384 1.0000000
## 111U 111U 0.0212683062 0.008565902 0.0326471401 1.0000000
## 030T 030T 0.1702000000 0.144333333 0.2090000000 1.0000000
## 030C 030C 0.6840540541 0.458108108 0.9472972973 0.6216216
## 201 201 0.0001079490 0.000000000 0.0009813543 1.0000000
## 120D 120D 0.0005782313 0.000000000 0.0034013605 1.0000000
## 120U 120U 0.0004062500 0.000000000 0.0031250000 1.0000000
## 120C 120C 0.0021897810 0.000000000 0.0072992701 1.0000000
## 210 210 0.0000000000 0.000000000 0.0000000000 1.0000000
## 300 300 0.0000000000 0.000000000 0.0000000000 1.0000000
Now that we have this dataset, we can use ggplot to plot it.
library(ggplot2)
ggplot(summarized_stats) +
geom_point(aes(x=TriadType, y=Means, colour=TriadType)) +
geom_errorbar(aes(x = TriadType, ymin=LowerCI, ymax=UpperCI, colour=TriadType), width=.1) +
geom_point(aes(x=TriadType, y=Observed, colour="Observed")) +
coord_flip()
Beautiful! What do you see? Which triads are more or less common in our graph than in a random graph? What might this tell us about the macro-structure?
11.6 Producing a tau statistic
Given a weighting scheme, which values some triads and not others, we can evaluate whether our network fits a macrolevel model of our choice. For example, below is a weighting scheme for the ranked-clustering weighting scheme, drawn from Daniel A. McFarland, et al. “Network Ecology and Adolescent Social Structure”. We can compare this to that produced by a random graph to see if our tau count is more or less than we should expect by chance.
<- c(0,0,0,1,1,-1,0,0,1,0,0,1,1,0,0,0)
weighting_scheme
sum(triad.census(net59) * weighting_scheme)
## [1] 3845
11.7 Banning triads
What if we want to ban some triads, but allow others? How might we go about building a simulation that does that? Below is an example. The basic idea is to sample nodes, draw an edge randomly with some probability (i.e. to control density), and then choose to keep or delete that edge, depending on if the triangles that it produces are allowed or not.
# A basic function which prevents the formation of specified triads in a random graph simulation
= function(n = 100, porm = .05, banned = c(2), sim_max = 1000000, probrecip = .5){
banning_triads_game
require(igraph) # Ensures igraph is loaded
if(any(c(1) %in% banned)){
stop("Can't ban 003s") # Stops the simulation if the user tried to bad 003 or 012 triads
}
= round(n*(n-1)*porm, 2) # calculates the desired number of edges according to the N and Porm parameters
num_edges
= make_empty_graph(n = n, directed = TRUE) # initializes an empty network
net
= 0
edge_count = 0
sim_count
while(edge_count < num_edges){ # Begins a loop, which ends once the requisite number of edges is reached.
# This part samples two nodes, checks whether the two sampled nodes are the same node, and whether an edge is already present in the network between these nodes
= TRUE
uniq = TRUE
edge_present while(uniq == TRUE | edge_present == TRUE){
= sample(1:n, 2, replace = T)
edge_id = edge_id[1] == edge_id[2]
uniq = sample(c(FALSE, TRUE), 1, prob = c(1-probrecip, probrecip))
reciprocated = are.connected(net, edge_id[1], edge_id[2])
edge_present_1 if (reciprocated){
= are.connected(net, edge_id[2], edge_id[1])
edge_present_2 = edge_present_1|edge_present_2
edge_present else {
} = edge_present_1
edge_present
}
}
# Calculates the traid census for the network before adding an edge
= triad.census(net)
before = net + edge(edge_id) # Adds in the edge
net_new if(reciprocated){
= edge_id[2:1]
edge_id_rev = net_new + edge(edge_id_rev) # Adds in the edge
net_new
}= triad.census(net_new) # Calculates the triad census again
after = after - before # Checks to see how much the triad census changed
triad_diff
if(all(triad_diff[banned] == 0)){
= net_new # If the banned triads still aren't observed, then the new network is accepted.
net
}
= ecount(net) # number of edges updated
edge_count = sim_count + 1 # Simulation count updated
sim_count if(sim_count > sim_max){
print("Warning: Failed to converge, banned triads may be incompatible") # exits simulation if simulation max count is exceeded
return(net)
}
}return(net) # Returns the simulated network
}
Cool - let’s try it. Here, we ban three triad types.
= banning_triads_game(banned = c(4,5,7))
no_cycles triad_census(no_cycles)
## [1] 137539 2682 20010 0 0 9 0 123 0 0
## [11] 1291 0 2 3 9 32
plot(no_cycles, vertex.size = 2, vertex.label = NA, vertex.color = "tomato", edge.arrow.size = .2)