16 Positional analysis in networks
This tutorial covers positional analysis in networks. Positional analysis seeks to identify actors or groups of actors who occupy similar positions or roles in a network. Nodes are therefore not grouped because they share more connections with each other than with other nodes in the network, as in modularity-based community or group detection, but rather because they share similar, or even precisely the same, patterns of relations. These positions might be thought of as roles - in so far as people who occupy similar positions often also perform similar functions in the graph - though this need not be the case.
However, compared to many of the other strategies we have so far seen for grouping and comparing nodes, positional analysis is particularly fraught with difficulty. It relies on one being able to identify which nodes are structurally equivalent - that is, which nodes have identical or similar sets of relations across all other actors in a graph. This would be easy if we just wanted to know whether two nodes shared the exact same set of relations: we could simply calculate how different their relations are over the other nodes in the network and identify those pairs of nodes who have precisely the same relations. But this exact form of equivalence is, in the real world, quite rare. In practice, we are interested in a more generalized notion of structural equivalence, one in which nodes are seen as sharing a position or role not when they have relations with the exact same people, but rather when they share patterns of relations that are generally similar.
There are a number of strategies for identifying roles or positions in a network. Here, we cover four such strategies: 1) CONCOR, which evaluates approximate structural equivalence to identify positions, 2) comparing local neighborhoods for isomorphism, 3) stochastic block modeling, which uses a more relaxed notion of stochastic equivalence to identify positions, and 4) feature-based strategies for learning roles in a network.
Block modeling of this sort is not implemented in igraph, so for much of this tutorial, we will have to rely on our own code.
16.1 Toy data
Let’s make a toy network, where structurally equivalent nodes are easy to identify by eye, against which we can compare the different methods.
library(igraph)

el <- data.frame(Ego = c("Q", "Q", "Q", "A", "B", "C", "C", "C", "D", "D", "D"),
                 Alter = c("P", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J"))
net <- graph.edgelist(as.matrix(el), directed = T)
What does the network look like? Let’s plot it, and color nodes by their equivalence class.
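The vector true_class used below holds each node’s hand-labeled equivalence class; it is not defined in this excerpt. Here is one plausible coding, reconstructed from the structure of the toy network (the specific labeling is our assumption):

# hand-labeled equivalence classes, in vertex order (Q, P, A, B, C, D, E-J);
# this particular coding is an assumption based on the tree's structure
true_class <- c(1, 2, 3, 3, 4, 4, 5, 5, 5, 5, 5, 5)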
layout_save <- layout_with_fr(net)
plot(net, vertex.color = true_class,
     vertex.label.color = "white",
     layout = layout_save)
16.2 Structural equivalence
Let’s begin with an exact measure of structural equivalence - two nodes will be deemed structurally equivalent if they have precisely the same set of relations across all other nodes in the network. To evaluate this, we could simply identify which nodes have exactly the same set of neighbors or, working with the adjacency matrix, take the absolute difference of two nodes’ row values.
We extract the adjacency matrix.
adj_mat <- as.matrix(as_adj(net))
Here is an example for nodes Q and A.
<- adj_mat["Q", !colnames(adj_mat) %in% c("Q", "A")]
a_row <- adj_mat["A", !colnames(adj_mat) %in% c("Q", "A")]
b_row
# equivalent?
sum(abs(a_row-b_row))
## [1] 3
The sum of their absolute differences is greater than 0, so we would say they are not precisely equivalent. Let’s extrapolate to every pair of nodes in the network with a function.
perfect_equivalence <- function(mat){
  # make an empty version of the matrix where we will store equivalence values
  matrix_vals <- mat
  matrix_vals[] <- 0
  # loop over the actors in the network, comparing their values pair-wise
  for(i in 1:nrow(mat)){
    for(j in 1:nrow(mat)){
      a_row <- mat[i, c(-i, -j)]
      b_row <- mat[j, c(-i, -j)]
      abs_diff <- sum(abs(a_row - b_row)) # take sum of absolute differences
      matrix_vals[i, j] <- abs_diff
    }
  }
  # return results
  return(matrix_vals)
}
Now we apply the function to our network/matrix. What do you see?
structurally_equivalent <- perfect_equivalence(adj_mat)
structurally_equivalent
## Q P A B C D E F G H I J
## Q 0 2 3 3 6 6 3 3 3 3 3 3
## P 2 0 1 1 3 3 0 0 0 0 0 0
## A 3 1 0 2 3 4 1 1 1 1 1 1
## B 3 1 2 0 4 3 1 1 1 1 1 1
## C 6 3 3 4 0 6 2 2 2 3 3 3
## D 6 3 4 3 6 0 3 3 3 2 2 2
## E 3 0 1 1 2 3 0 0 0 0 0 0
## F 3 0 1 1 2 3 0 0 0 0 0 0
## G 3 0 1 1 2 3 0 0 0 0 0 0
## H 3 0 1 1 3 2 0 0 0 0 0 0
## I 3 0 1 1 3 2 0 0 0 0 0 0
## J 3 0 1 1 3 2 0 0 0 0 0 0
Let’s convert this distance matrix to a similarity matrix.
structurally_equivalent_sim <- 1 - (structurally_equivalent / max(structurally_equivalent))
We can now cluster this similarity matrix to identify sets of similar actors using k-means, for example. If we set the number of clusters equal to 6, it will pull out the set of actors who have precisely the same relations. If we set it equal to 5 or 4, it will show us actors who are proximately similar.
group_ids_6 <- kmeans(structurally_equivalent_sim, centers = 6)
group_ids_5 <- kmeans(structurally_equivalent_sim, centers = 5)
How close do we get?
plot(net,
     vertex.color = group_ids_5$cluster,
     layout = layout_save)

plot(net,
     vertex.color = group_ids_6$cluster,
     layout = layout_save)
Close, but there is a problem. This strategy treats C and D as occupying different positions. Why is that? Well, it has to do with our strict definition of structural equivalence, which relies on comparing the precise set of actors that each node is connected to. Our method thereby confuses similarity with closeness: it turns out that nodes we deem structurally equivalent under this definition can never be more than two steps away from one another.
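To see this concretely, compare C and D: they play identical roles in the tree, but they share no neighbors, so our measure scores them as maximally different. A quick check, using the objects defined above:

# C and D occupy the same position, yet their row difference is the
# largest value in the matrix (6) because they share no neighbors
structurally_equivalent["C", "D"]

# they are also far apart: four steps, ignoring edge direction
distances(net, v = "C", to = "D", mode = "all")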
16.3 Block Modeling with CONCOR
Before we deal with our definitional problem, let’s work through the CONCOR algorithm, introduced by Breiger, Boorman, and Arabie and made use of by White, Boorman, and Breiger in their seminal paper, Social Structure from Multiple Networks. I. Blockmodels of Roles and Positions. This method relies on a similar definition of equivalence to the one outlined above - it identifies structurally equivalent nodes by correlating their relation sets - but because it uses correlation, and implements the stacking of matrices, it allows one to produce a block model of multiple relations simultaneously.
CONCOR measures structural equivalence using correlation (that is, we measure the similarity between people in terms of the pattern of their relations). The cor function measures correlation.
net_mat <- as.matrix(get.adjacency(net))
net_cor <- cor(net_mat)
## Warning in cor(net_mat): the standard deviation is zero
net_cor[] <- ifelse(is.na(net_cor[]), 0, net_cor[])
The key insight of CONCOR is that, by repeatedly running correlation on the results of this initial correlation, the data will eventually converge to only -1s and 1s. Let’s try 200 iterations and see how it goes.
net_concor <- net_cor
for(i in 1:200){
  net_concor <- cor(net_concor)
}
How does it look? It appears to have converged!
range(net_concor)
## [1] -1 1
Now what? First, we identify people who have 1s vs. -1s and group them together. These are our initial blocks, for a 2 block solution.
group <- net_concor[, 1] > 0
We can split the original data into each of the respective groups.
split_results <- list(net_mat[, names(group[group])],
                      net_mat[, names(group[!group])])
Now, if we want, we can run the same procedure again on each of the groups.
cor_many_times <- function(x, times = 1000){
  for(i in 1:times){
    x <- cor(x)
    if(sd(x, na.rm = T) == 0){
      return(x)
    } else {
      x[] <- ifelse(is.na(x[]), 0, x[])
    }
  }
  return(x)
}
split_results_corred <- lapply(split_results, cor_many_times)
## Warning in cor(x): the standard deviation is zero
groups_2 <- lapply(split_results_corred, function(x) x[, 1] > 0)

split_results_again <- lapply(groups_2,
                              function(x) list(net_mat[, names(x[x])],
                                               net_mat[, names(x[!x])]))
split_results_again <- unlist(split_results_again, recursive = F)

final_blocks <- lapply(split_results_again, colnames)
final_blocks
## [[1]]
## [1] "Q" "C" "D" "H" "I" "J"
##
## [[2]]
## [1] "P" "A" "B"
##
## [[3]]
## [1] "E" "F" "G"
##
## [[4]]
## NULL
We can also plot the result using the blockmodel() function from the sna package (part of statnet).
clusters <- lapply(1:length(final_blocks), function(x) rep(x, length(final_blocks[[x]])))
clusters <- unlist(clusters)
names(clusters) <- unlist(final_blocks)
clusters <- clusters[colnames(net_mat)]

all_output <- sna::blockmodel(net_mat, clusters,
                              glabels = "Feudal Network",
                              plabels = colnames(net_mat))
plot(all_output)
Here is the plot of the graph, where nodes are colored by position.
plot(net, vertex.color = clusters)
Not very good!
16.4 Isomorphic local graphs
CONCOR allows us to apply our structural equivalence routine across many relations simultaneously, but it doesn’t quite solve our definitional problem: it still searches for actors with (approximately) identical sets of relations. One way forward is to relax the condition that nodes be tied to precisely the same set of nodes, and to instead define nodes as equivalent as long as their local neighborhoods (set to include nodes n steps away) are isomorphic. The underlying algorithm, bliss, essentially permutes the matrices of the two networks it is comparing to see if, under any of the different permutations, the two matrices are equivalent. Since the matrices are being permuted, we are ignoring node labels (i.e. node ids) and focusing instead on the structure of their relations.
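As a quick toy illustration of this label-blind comparison (our own example, separate from the tutorial’s data): two directed paths with different vertex names are isomorphic, because some permutation of one graph’s adjacency matrix reproduces the other’s.

g1 <- graph.formula(X -+ Y, Y -+ Z) # a two-step directed path on X, Y, Z
g2 <- graph.formula(M -+ N, N -+ O) # the same structure with different labels
isomorphic(g1, g2)                  # TRUE: only the pattern of ties matters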
First, we select neighborhood size. For now, we set it to 2.
steps = 2
Next, we extract each node’s 2-step neighborhood using igraph’s handy make_ego_graph() function.
local_graphs <- make_ego_graph(net, order = steps)
Then, we loop through the different local neighborhoods and evaluate whether each pair is isomorphic, saving the result in a matrix. We convert the matrix (which is logical) to a numeric matrix (so that TRUE = 1 and FALSE = 0), and set the diagonal to 0.
iso_mat <- matrix(FALSE, nrow = length(local_graphs), ncol = length(local_graphs))
for(i in 1:length(local_graphs)){
  for(j in 1:length(local_graphs)){
    iso_mat[i, j] <- isomorphic(local_graphs[[i]], local_graphs[[j]])
  }
}

iso_mat[] <- as.numeric(iso_mat[])
diag(iso_mat) <- 0
We then cluster the resulting matrix in order to identify nodes who share isomorphic neighborhoods, using a network clustering strategy.
clusters_iso <- cluster_louvain(graph.adjacency(iso_mat, mode = "undirected"))
Let’s plot the result.
plot(net,
     vertex.color = membership(clusters_iso),
     layout = layout_save)
Wow, it worked! By relaxing the assumption that structurally equivalent nodes are tied to the same precise set of nodes, we get better results. So why not just use this strategy all the time?
There are a couple of reasons. First, evaluating whether two neighborhoods are isomorphic is easy and quick when neighborhoods are small, but it becomes computationally costly (prohibitively so) as they grow. Second, this strategy is highly sensitive to missing data - if even one edge is missing or randomly rewired, the two neighborhoods won’t be viewed as isomorphic.
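Here is a quick sketch of that fragility, using the objects defined above (which edge we delete is arbitrary):

g_full    <- make_ego_graph(net, order = 2, nodes = "C")[[1]]
g_damaged <- delete_edges(g_full, 1) # drop a single edge, as if it were unobserved
isomorphic(g_full, g_damaged)        # FALSE: one missing edge breaks the equivalence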
So let’s relax our assumptions even further.
16.5 Stochastic Block Models (SBMs)
Now that we can identify blocks using CONCOR and relatively strict measures of structural equivalence, let’s try to generalize our measure of equivalence even further. One such generalization holds that actors who share a role or position, rather than being tied to the same set of nodes, have the same probability of being attached to all other alters in the network. Under this operationalization, equivalence is not absolute, but probabilistic (and hence, stochastic).
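To make the idea concrete, here is a minimal sketch (our own illustration, not part of the tutorial) of the generative model an SBM assumes: every node belongs to a block, and the probability of a tie depends only on the blocks of the two endpoints.

set.seed(1)
blocks <- rep(1:2, each = 10)                   # two blocks of ten nodes each
probs  <- matrix(c(0.05, 0.4, 0.4, 0.05), 2, 2) # tie probability for each block pair
p_tie  <- probs[blocks, blocks]                 # expand to a node-by-node matrix
sim_adj <- matrix(rbinom(length(p_tie), 1, p_tie), 20, 20)
diag(sim_adj) <- 0                              # no self-ties

Fitting an SBM reverses this process: given only the observed adjacency matrix, it tries to recover the block memberships and the block-to-block tie probabilities.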
Stochastic block models are quite difficult to program, but thankfully, there is already an R package which can fit them to data, blockmodels. Let’s install it.
install.packages("blockmodels")
And load it into R.
library(blockmodels)
Here we run an SBM on our toy network.
<- BM_bernoulli("SBM",
sbm_out
net_mat, verbosity = 3,
plotting = "",
exploration_factor = 5) # run a bernoulli block model on the feudal matrix.
# bernoulli is for when your edge weights are binary
# poisson is for when your edge weights are counts
# gaussian (i.e. normal distribution) is for when your edge weights are continuous variables
# estimate the result
$estimate() sbm_out
## -> Estimation for 1 groups
## -> Computation of eigen decomposition used for initalizations
## -> Pass 1
## -> With ascending number of groups
## -> For 2 groups
## -> For 3 groups
## -> For 4 groups
## -> For 5 groups
## -> With descending number of groups
## -> For 4 groups
## -> For 3 groups
## -> For 2 groups
## -> Pass 2
## -> With ascending number of groups
## -> For 2 groups
## -> For 3 groups
## -> For 4 groups
## -> For 5 groups
## -> With descending number of groups
## -> For 4 groups
## -> For 3 groups
## -> For 2 groups
## -> Pass 3
## -> With ascending number of groups
## -> For 2 groups
## -> For 3 groups
## -> For 4 groups
## -> For 5 groups
## -> With descending number of groups
## -> For 4 groups
## -> For 3 groups
## -> For 2 groups
Now we can extract the role assignments (which are probabilistic).
best_fit_assignments <- sbm_out$memberships[[which.max(sbm_out$ICL)]] # extract the best fit according to ICL (Integrated Completed Likelihood), a criterion for model selection; higher is better
head(best_fit_assignments$Z) # probabilities of belonging to each group
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.008130081 0.008130081 0.008130081 0.008130081 0.967479675
## [2,] 0.008180839 0.008180839 0.948836318 0.026621164 0.008180839
## [3,] 0.008130081 0.008130081 0.967479675 0.008130081 0.008130081
## [4,] 0.008130081 0.008130081 0.967479675 0.008130081 0.008130081
## [5,] 0.967479675 0.008130081 0.008130081 0.008130081 0.008130081
## [6,] 0.967479675 0.008130081 0.008130081 0.008130081 0.008130081
class_assignments <- apply(best_fit_assignments$Z, 1, which.max) # identify which column has the highest value for each node (row)
And we can plot the results, just like before, using the blockmodel() function from sna.
sbm_output <- sna::blockmodel(net_mat, class_assignments,
                              glabels = "Feudal",
                              plabels = colnames(net_mat))
plot(sbm_output)

plot(net, vertex.color = class_assignments)
Pretty bad too… it seems to just be picking up degree (or something close to it).
16.6 Feature-based strategies
Finally, an idea currently popular in computer science is to compute a set of features for each node and then classify nodes into kinds of roles and positions depending on that feature set. For example, we could measure nodes’ centralities, the motifs in their local neighborhoods (i.e. triangles/MAN), their average distance from other nodes, etc., and then cluster nodes according to those features. In a sense, this assumes the least about the data, and it is also the most feasible approach for large graphs (since such features are usually computationally trivial to measure). Here is a quick example:
# this function limits the network to a given node's neighborhood (defined as n steps away)
# and then runs the triad census on that neighborhood
local_man <- function(net, vertex_name, steps = 2){
  n_steps <- ego(net, order = steps, vertex_name, mode = "out")[[1]]
  subnet <- igraph::delete.vertices(net, !V(net) %in% n_steps)
  local_triad_count <- igraph::triad.census(subnet)
  return(local_triad_count)
}
onestep_triad_counts <- lapply(V(net)$name, FUN = function(x) local_man(net, x, steps = 1))
twostep_triad_counts <- lapply(V(net)$name, FUN = function(x) local_man(net, x, steps = 2))
threestep_triad_counts <- lapply(V(net)$name, FUN = function(x) local_man(net, x, steps = 3))

onestep_triad_counts <- do.call("rbind", onestep_triad_counts)
twostep_triad_counts <- do.call("rbind", twostep_triad_counts)
threestep_triad_counts <- do.call("rbind", threestep_triad_counts)
# different centrality measures
indegrees <- igraph::degree(net, mode = "in")
outdegrees <- igraph::degree(net, mode = "out")

# average distance from other nodes
average_distances <- rowMeans(distances(net))

# put them all together
graph_features <- cbind(onestep_triad_counts, twostep_triad_counts, threestep_triad_counts,
                        indegrees, outdegrees, average_distances)

# run kmeans on the resulting dataset
cluster_ids <- kmeans(graph_features, 5)$cluster

# plot the results
plot(net, vertex.color = cluster_ids)
Pretty good! Of course, we knew the number of roles (k) beforehand. There are algorithms for choosing k when it is not known. Keep in mind, too, that this strategy can be very dependent on the set of features you include.
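For instance, one common heuristic (our addition, not part of the original example) is to scan a range of k and look for an “elbow” in the total within-cluster sum of squares; because the toy network has only a handful of distinct feature rows, we keep the scan small:

# total within-cluster sum of squares for k = 2 through 5
wss <- sapply(2:5, function(k) kmeans(graph_features, centers = k, nstart = 10)$tot.withinss)
plot(2:5, wss, type = "b", xlab = "number of clusters (k)", ylab = "total within-cluster SS")

If the curve flattens sharply at some value of k, that value is a reasonable choice.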