Chapter 90 Social Network Analysis

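This chapter introduces social network analysis through the tidygraph package. Social network analysis draws on a fairly broad body of knowledge, but tidygraph organizes network structure in a clean, tidy way, which lowers the learning curve and makes it a good starting point for beginners.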

90.1 Graph theory basics

A network graph has two main components: nodes and edges.

  • nodes: the entities in the network (also called vertices)

  • edges: the connections (links) between nodes

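Other concepts are also involved, for example:

  • adjacency matrix: a square matrix whose entry (i, j) records whether, or how strongly, node i is connected to node j

  • edge list: a table with one row per edge, giving its source and target nodes (and optionally a weight)

  • node list: a table with one row per node, giving its id and attributes

  • Weighted network graph: a graph whose edges carry a numeric weight

  • Directed and undirected network graph: in a directed graph each edge points from one node to another; in an undirected graph edges have no direction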

Directed graph

Undirected graph
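
To make the relation between an edge list and an adjacency matrix concrete, here is a minimal base R sketch. The three-node edge list is invented purely for illustration and is not part of the chapter's data.

```r
# Hypothetical toy edge list: three nodes, directed, weighted
edges <- data.frame(
  from   = c(1, 1, 2),
  to     = c(2, 3, 3),
  weight = c(9, 4, 3)
)
n <- 3  # number of nodes

# Directed adjacency matrix: entry [i, j] holds the weight of edge i -> j
adj_directed <- matrix(0, nrow = n, ncol = n)
adj_directed[cbind(edges$from, edges$to)] <- edges$weight

# Undirected version: mirror every edge so the matrix becomes symmetric
adj_undirected <- adj_directed + t(adj_directed)

adj_directed
adj_undirected
```

In the directed matrix only the upper triangle is filled here, because every toy edge goes from a lower to a higher node id; the undirected matrix is symmetric by construction.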

90.2 Network analysis

We start with the tidygraph package.

90.2.1 tidygraph: A tidy API for graph manipulation

90.2.2 Tidy Network Analysis

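  • In the tidygraph framework, network data is decomposed into two tidy data frames:
    • node data
    • edge data
  • tidygraph provides a way to switch between the node data frame and the edge data frame, and both can be manipulated with dplyr syntax.
  • tidygraph provides common network algorithms, for example for computing the importance (centrality) of nodes in the network topology.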
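
The switch between the two data frames can be sketched as follows. The graph, its column names (label, weight, label_lower) and its values are hypothetical and only serve to show activate() combined with dplyr verbs.

```r
library(tidygraph)
library(dplyr)

# Hypothetical toy data: three people and two weighted calls
nodes <- tibble(id = 1:3, label = c("A", "B", "C"))
edges <- tibble(from = c(1L, 2L), to = c(2L, 3L), weight = c(5, 1))

g <- tbl_graph(nodes = nodes, edges = edges, directed = TRUE)

# Work on the node table: add a column with dplyr::mutate()
g <- g %>%
  activate(nodes) %>%
  mutate(label_lower = tolower(label))

# Switch to the edge table and keep only the heavier edges
g_heavy <- g %>%
  activate(edges) %>%
  filter(weight > 2)

g_heavy
```

Filtering edges leaves the node table untouched, so g_heavy still has three nodes but only one edge.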

90.2.3 Create network objects

创建网络对象主要有两个函数:

  • tbl_graph(). Creates a network object from nodes and edges data
  • as_tbl_graph(). Converts network data and objects to a tbl_graph network.
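
As a rough illustration of the second function, as_tbl_graph() can convert a plain edge data frame with character node names. The two rows below are chosen for illustration only, and the node table is derived from the names that appear in from and to.

```r
library(tidygraph)

# A tiny edge list with character node names (values for illustration only)
edge_df <- data.frame(
  from   = c("France", "Belgium"),
  to     = c("Germany", "France"),
  weight = c(9, 4)
)

# Nodes are inferred from the distinct names found in `from` and `to`
g <- as_tbl_graph(edge_df, directed = TRUE)
g
```

Use tbl_graph() instead when a separate node table with its own attributes already exists.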

Example: phone calls between the presidents of EU countries, and the number of calls between them.

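library(dplyr)     # %>%, tibble(), and the data manipulation verbs used below
library(tidygraph) # tbl_graph(), as_tbl_graph(), activate(), centrality_*(), group_*()
library(ggraph)    # ggraph(), geom_edge_link(), geom_node_point(), theme_graph()
library("navdata") # devtools::install_github("kassambara/navdata")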
data("phone.call2")
node_list <- phone.call2$nodes
node_list
## # A tibble: 16 × 2
##       id label         
##    <int> <chr>         
##  1     1 France        
##  2     2 Belgium       
##  3     3 Germany       
##  4     4 Danemark      
##  5     5 Croatia       
##  6     6 Slovenia      
##  7     7 Hungary       
##  8     8 Spain         
##  9     9 Italy         
## 10    10 Netherlands   
## 11    11 UK            
## 12    12 Austria       
## 13    13 Poland        
## 14    14 Switzerland   
## 15    15 Czech republic
## 16    16 Slovania
edge_list <- phone.call2$edges
edge_list
## # A tibble: 18 × 3
##     from    to weight
##    <int> <int>  <dbl>
##  1     1     3    9  
##  2     2     1    4  
##  3     1     8    3  
##  4     1     9    4  
##  5     1    10    2  
##  6     1    11    3  
##  7     3    12    2  
##  8     3    13    2  
##  9     2     3    3  
## 10     3    14    2  
## 11     3    15    2  
## 12     3    10    2  
## 13     4     3    2  
## 14     5     3    2  
## 15     5    16    2  
## 16     5     7    2  
## 17     6     3    2  
## 18     7    16    2.5

90.2.4 Use tbl_graph

  • Create a tbl_graph network object using the phone call data:
phone.net <- tbl_graph(nodes = node_list, edges = edge_list, directed = TRUE)
  • Visualize the network graph
ggraph(phone.net, layout = "graphopt") +
  geom_edge_link(width = 1, colour = "lightgray") +
  geom_node_point(size = 4, colour = "red") +
  geom_node_text(aes(label = label), repel = TRUE) +
  theme_graph()

90.2.5 Use as_tbl_graph

The mtcars data set: a built-in R data set that records 11 attributes of 32 car models.

1. Compute the correlation matrix between cars and reshape it into an edge list:

library(corrr)
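res.cor <- datasets::mtcars[, c(1, 3:6)] %>% # (1) keep mpg, disp, hp, drat, wt
  t() %>% # transpose so that each car becomes a column
  corrr::correlate() %>% # (2) correlation matrix between cars
  corrr::shave(upper = TRUE) %>% # (3) set the upper triangle to NA
  corrr::stretch(na.rm = TRUE) %>% # (4) reshape to long format: x, y, r
  dplyr::filter(r >= 0.998) # (5) keep only very highly correlated pairs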
res.cor

2. Create the correlation network graph:

set.seed(1)
cor.graph <- as_tbl_graph(res.cor, directed = FALSE)
ggraph(cor.graph) +
  geom_edge_link() +
  geom_node_point() +
  geom_node_text(
    aes(label = name),
    size = 3, repel = TRUE
  ) +
  theme_graph()

90.2.7 Extract the current active data

cor.graph %>%
  activate(edges) %>%
  arrange(desc(r))

Note that, to extract the current active data as a tibble, you can use the function as_tibble(cor.graph).
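
A minimal sketch of this extraction on a hypothetical three-node graph (the column r mimics the correlation column used above, but the values are made up):

```r
library(tidygraph)
library(dplyr)

g <- tbl_graph(
  nodes = tibble(name = c("a", "b", "c")),
  edges = tibble(from = c(1L, 1L), to = c(2L, 3L), r = c(0.7, 0.9)),
  directed = FALSE
)

# Nodes are active by default, so this returns the node table
node_tbl <- as_tibble(g)

# Activate the edges first to get the (sorted) edge table instead
edge_tbl <- g %>%
  activate(edges) %>%
  arrange(desc(r)) %>%
  as_tibble()

node_tbl
edge_tbl
```

The result is an ordinary tibble, so it can be passed to any function that does not know about graph objects.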

90.3 Network graph manipulation

90.3.1 Car groups info (Number of cylinders)

# Car groups info
cars.group <- tibble(
  name = rownames(datasets::mtcars),
  cyl = as.factor(datasets::mtcars$cyl)
)
cars.group

90.3.2 Modify the nodes data:

# Modify the nodes data
cor.graph <- cor.graph %>%
  activate(nodes) %>%
  left_join(cars.group, by = "name") %>%
  rename(label = name)
cor.graph

90.3.3 Modify the edge data.

# Modify the edge data.
cor.graph <- cor.graph %>%
  activate(edges) %>%
  rename(weight = r)
cor.graph

90.3.4 Display the final modified graphs object:

cor.graph

90.3.5 Visualize the correlation network

set.seed(1)
ggraph(cor.graph) +
  geom_edge_link(aes(width = weight), alpha = 0.2) +
  scale_edge_width(range = c(0.2, 1)) +
  geom_node_point(aes(color = cyl), size = 2) +
  geom_node_text(aes(label = label), size = 3, repel = TRUE) +
  theme_graph()

90.4 Network analysis

90.4.1 Centrality

Centrality is an important concept when analyzing a network graph: it quantifies how important or influential a node (or an edge) is within the network.

The tidygraph package contains more than 10 centrality measures, prefixed with the term centrality_ :

# centrality_alpha()
# centrality_power()
# centrality_authority()
# centrality_betweenness()
# centrality_closeness()
# centrality_hub()
# centrality_degree()
# centrality_pagerank()
# centrality_eigen()
# centrality_subgraph()
# centrality_edge_betweenness()
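
Before turning to the phone call data, two of these measures can be tried on a hypothetical star-shaped network, where the expected ranking is easy to verify by hand. The node labels are invented for this sketch.

```r
library(tidygraph)
library(dplyr)

# Hypothetical star-shaped network: "hub" is linked to every other node
g <- tbl_graph(
  nodes = tibble(label = c("hub", "a", "b", "c")),
  edges = tibble(from = c(1L, 1L, 1L), to = c(2L, 3L, 4L)),
  directed = FALSE
)

deg <- g %>%
  activate(nodes) %>%
  mutate(
    degree      = centrality_degree(),      # number of incident edges
    betweenness = centrality_betweenness()  # shortest paths passing through the node
  ) %>%
  as_tibble()

deg
```

The hub has degree 3 and lies on the shortest path between every pair of leaves, while each leaf has degree 1 and betweenness 0.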

Example:

  • use the phone call network graph (calls between the presidents of EU countries and their counts)
  • compute node centrality

set.seed(123)
phone.net %>%
  activate(nodes) %>%
  mutate(centrality = centrality_authority())
set.seed(123)
phone.net %>%
  activate(nodes) %>%
  mutate(centrality = centrality_authority()) %>%
  ggraph(layout = "graphopt") +
  geom_edge_link(width = 1, colour = "lightgray") +
  geom_node_point(aes(size = centrality, colour = centrality)) +
  geom_node_text(aes(label = label), repel = TRUE) +
  scale_color_gradient(low = "yellow", high = "red") +
  theme_graph()

90.4.2 Clustering

  • Clustering is a common operation in network analysis and it consists of grouping nodes based on the graph topology.

  • Many clustering algorithms are available in the tidygraph package, prefixed with the term group_. These include:

    • Infomap community finding. It groups nodes by minimizing the expected description length of a random walker trajectory. R function: group_infomap()
    • Community structure detection based on edge betweenness. It groups densely connected nodes. R function: group_edge_betweenness()
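
The edge betweenness method can be sketched on a hypothetical graph made of two triangles joined by a single bridge edge, a case where the two expected communities are obvious. The node labels are invented for illustration.

```r
library(tidygraph)
library(dplyr)

# Two triangles (a1-a2-a3 and b1-b2-b3) connected by one bridge edge a3-b1
g <- tbl_graph(
  nodes = tibble(label = c("a1", "a2", "a3", "b1", "b2", "b3")),
  edges = tibble(
    from = c(1L, 1L, 2L, 4L, 4L, 5L, 3L),
    to   = c(2L, 3L, 3L, 5L, 6L, 6L, 4L)
  ),
  directed = FALSE
)

communities <- g %>%
  activate(nodes) %>%
  mutate(community = as.factor(group_edge_betweenness())) %>%
  as_tibble()

communities
```

The bridge carries the highest edge betweenness, so it is removed first and each triangle ends up in its own community.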

Example:

  • use the correlation network graph (built from mtcars, which records 11 attributes of 32 car models)
  • detect clusters or communities

set.seed(123)
cluster_mtcars <- cor.graph %>%
  activate(nodes) %>%
  mutate(community = as.factor(group_infomap()))
cluster_mtcars
cluster_mtcars %>%
  ggraph(layout = "graphopt") +
  geom_edge_link(width = 1, colour = "lightgray") +
  geom_node_point(aes(colour = community), size = 4) +
  geom_node_text(aes(label = label), repel = TRUE) +
  theme_graph()

90.4.3 More Algorithms

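90.5 Summary

tidygraph cleverly represents a complex network structure with two data frames: the node data frame holds the attributes of the nodes, and the edge data frame holds the attributes of the connections. Modifying one adjusts the other accordingly; for example, deleting a node from the node data frame automatically removes all of that node's edges from the edge data frame.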
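
This coupling between the two data frames can be checked on a small hypothetical graph (labels and edges are invented for this sketch):

```r
library(tidygraph)
library(dplyr)

g <- tbl_graph(
  nodes = tibble(label = c("A", "B", "C")),
  edges = tibble(from = c(1L, 1L, 2L), to = c(2L, 3L, 3L)),
  directed = TRUE
)

# Drop node "A" from the node table
g_small <- g %>%
  activate(nodes) %>%
  filter(label != "A")

# Both edges that touched "A" are gone; the B -> C edge is re-indexed
remaining_edges <- g_small %>%
  activate(edges) %>%
  as_tibble()

remaining_edges
```

Only the B -> C edge survives, and its from and to indices now refer to the positions of B and C in the reduced node table.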

90.6 Network Visualization

This section introduces ggraph, the companion package to tidygraph; both were written by the same author.

90.6.1 ggraph: A grammar of graphics for relational data

ggraph follows the grammar of ggplot2: a plot is assembled layer by layer from a layout, edge layers, and node layers.

cluster_mtcars %>%
  # Layout
  ggraph(layout = "graphopt") +
  # Edges
  geom_edge_link(
    width = 1,
    colour = "lightgray"
  ) +
  # Nodes
  geom_node_point(
    aes(colour = community),
    size = 4
  ) +
  geom_node_text(
    aes(label = label),
    repel = TRUE
  ) +
  theme_graph()