Indexes | Formula | Note | Description |
---|---|---|---|
Connectivity/ Degree (centrality) | $k_i=\sum_{j\neq i}a_{ij}$ | πππ is the connection strength between nodes i and j. when πππ=1, ki is the unweighted degree | It is also called node degree. It is the most commonly used concept for describing the topological property of a node in a network. |
Betweenness centrality | $B_i=\sum_{j,k}\frac{\sigma(i,j,k)}{\sigma(j,k)}$ | π(π,π,π) is the number of shortest paths between nodes j and k that pass through node i. π(π,π) is the total number of shortest paths between j and k. | It is used to describe the ratio of paths that pass through the ith node. High Betweenness node can serve as a broker similar to stress centrality. |
Closeness centrality | $Ci=1/\sum_{i\neq j}d_{ij}$ | The closeness centrality of a vertex is defined as the inverse of the sum of distances to all the other vertices in the graph. dij is the shortest distances from node i to j. | Closeness centrality measures how many steps is required to access every other vertex from a given vertex. |
Eigenvector centrality | $EC_i=\frac{1}{\lambda}\sum_{j\in M(i)}EC_j$ | M(π) is the set of nodes that are connected to the ith node and Ξ» is a constant eigenvalue. | It is used to describe the degree of a central node that it is connected to other central nodes. |
Clustering coefficient | $CCo_i=\frac{2l_i}{k_i'(k_i'-1)}$ | li is the number of links between neighbors of node i and k i β is the number of neighbors of node i. | It describe how well a node is connected with its neighbors. If it is fully connected to its neighbors, the clustering coefficient is 1. A value close to 0 means that there are hardly any connections with its neighbors. It was used to describe hierarchical properties of networks. |
Eccentricity | $E_i=\max_{j\neq i}(d_{ij})$ | dij is the shortest distance from node i to node j | The eccentricity of a vertex is its shortest path distance from the farthest other node in the graph. |
Page.rank | $PR_i=\sum_{j\in B_i}\frac{PR_j}{l_j}$ | i is the node whose pr value needs to be calculated, and Bi is the set of all nodes pointing to node i. PRj is the pr value of node j and lj is the number of links between neighbors of node j. | Calculates the Google PageRank for the specified vertices. PageRank, or webpage ranking, also known as webpage level, is an indicator to measure the importance of webpages. |
Kleinberg's hub and authority centrality | $HC=\lambda_{AA^T}$ $AC=\lambda_{A^TA}$ | The hub scores of the vertices are defined as the principal eigenvector of AAT, the authority scores of the vertices are defined as the principal eigenvector of ATA. where A is the adjacency matrix of the graph. | A node is an authority if it is linked to by hubs; it is a hub if it links to authorities. |
5 Topology
5.1 Topological indexes
There are lots of topological indexes for network analysis, we collected many indexes often used for biological research.
The first part is topological indexes for individual nodes.
The second part is the overall network topological indexes.
Indexes | Formula | Note | Description |
---|---|---|---|
Average connectivity/ degree | $\overline{k}=\frac{\sum_{i=1}^{n}k_i}{n}$ | k i is degree of node i and n is the number of nodes. | Higher avgK means a more complex network. |
Average path length/ Average geodesic distance | $L=\frac{\sum_{i \ne j}d_{ij}}{n(n-1)}$ | dij is the shortest path between node i and j. | A smaller GD means all the nodes in the network are closer. |
global efficiency/ Geodesic efficiency | $E_g=\frac{\sum_{i \ne j}1/d_{ij}}{n(n-1)}$ | all parameters shown above. | It is the opposite of GD. A higher E means that the nodes are closer. |
Centralization of degree | $CD=\sum_{i=1}^{n}(\max(k)-k_i)$ | max(k) is the maximal value of all connectivity values and k i represents the connectivity of ith node. Finally this value is normalized by the theoretical maximum centralization score. | It is close to 1 for a network with star topology and in contrast close to 0 for a network where each node has the same connectivity. |
Centralization of betweenness | $CB=\sum_{i=1}^{n}(\max(B)-B_i)$ | max(B) is the maximal value of all betweenness values and B i represents the betweenness of ith node. Finally this value is normalized by the theoretical maximum centralization score. | It is close to 0 for a network where each node has the same betweenness, and the bigger the more difference among all betweenness values. |
Centralization of closeness | $CC=\sum_{i=1}^{n}(\max(C)-C_i)$ | max(C) is the maximal value of all closeness values and Ci represents the closeness of ith node. Finally this value is normalized by the theoretical maximum centralization score. | It is close to 0 for a network where each node has the same closeness, and the bigger the more difference among all closeness values. |
Centralization of eigenvector centrality | $CE=\sum_{i=1}^{n}(\max(EC)-EC_i)$ | max(EC) is the maximal value of all eigenvector centrality values and EC i represents the eigenvector centrality of ith node. Finally this value is normalized by the theoretical maximum centralization score. | It is close to 0 for a network where each node has the same eigenvector centrality, and the bigger the more difference among all eigenvector centrality values. |
Density | $D=\frac{2l}{n(n-1)}$ | l is the sum of total links. | The density of a graph is the ratio of the number of edges and the number of possible edges. It is closely related to the average connectivity. |
Average clustering coefficient | $\overline{CCo}=\frac{\sum_{i=1}^{n}CCo_i}{n}$ | πΆπΆoi is the clustering coefficient of node i. | It is used to measure the extent of module structure present in a network. |
Transitivity | $Tr=\frac{\sum_{i=1}^{n}2l_i}{\sum_{i=1}^{n}(k'_i)(k'_i-1)}$ | li is the number of links between neighbors of node i and k i β is the number of neighbors of node i. | Sometimes it is also called the entire clustering coefficient. It has been shown to be a key structural property in social networks. |
Natural connectivity | $NC=\ln{\left(\frac{1}{N}\sum_{i=1}^{N}e^{\lambda_i}\right)}$ | Where N is nodes number of the network, represents the eigenvalue of the network adjacency matrix. |
Topological indexes for edges are less, so we do not list them here.
We can get all these indexes easily by using net_par()
. you can choose different mode (n,v,e) to get indexes of network, vertexes or edges especially or just use βallβ to get a list contains all indexes.
make_graph("Walther") %>% net_par(mode = "all")
make_graph("Walther") %>% net_par(mode = "n")
## $n_index
## name Node_number Edge_number Edge_density Negative_percentage
## 1 Walther 25 31 0.1033333 NA
## Average_path_length Global_efficiency Average_degree Average_weighted_degree
## 1 3.586667 0.3679762 2.48 2.48
## Diameter Clustering_coefficent Centralized_betweenness Natural_connectivity
## 1 8 0 0.1436084 1.029526
##
## $v_index
## NULL
##
## $e_index
## NULL
Or you can use c_net_index()
to calculate topological indexes and store in the network itself, which is very useful if you need to do following visualization.
# get a metanet
go <- erdos.renyi.game(30, 0.25)
go <- c_net_update(go)
par(mfrow = c(1, 2))
plot(go, vertex_size_range = c(5, 20), legend = F, main = "Same size")
# calculate topological indexes and store
go <- c_net_index(go)
head(get_v(go))
## name label size v_group shape v_class color Degree Clustering_coefficent
## 1 1 1 1 v_group1 circle v_class1 #a6bce3 8 0.2857143
## 2 2 2 1 v_group1 circle v_class1 #a6bce3 6 0.3333333
## 3 3 3 1 v_group1 circle v_class1 #a6bce3 5 0.2000000
## 4 4 4 1 v_group1 circle v_class1 #a6bce3 6 0.2000000
## 5 5 5 1 v_group1 circle v_class1 #a6bce3 10 0.2222222
## 6 6 6 1 v_group1 circle v_class1 #a6bce3 11 0.3272727
## Betweenness Eccentricity Closeness Hub_score
## 1 9.521895 3 0.01923077 0.6594111
## 2 5.529340 3 0.01818182 0.5028451
## 3 3.954798 3 0.01724138 0.3894429
## 4 5.061595 3 0.01818182 0.4479318
## 5 22.247067 2 0.02083333 0.7924572
## 6 21.167527 3 0.02083333 0.9409252
# use vertex.size to indicate degree
go <- c_net_set(go, vertex_size = "Degree")
plot(go, vertex_size_range = c(5, 20), legend = F, main = "Size map to degree")
5.2 Random network
You can use rand_net()
to generate a random graphs according to the Erdos-Renyi model with same node number and edge number of your network, then compare the random network with your network.
rand_net(co_net) -> random_net
Or use rand_net_par()
to generate lots of random networks and summary their topological indexes, then use compare_rand()
to do the comparison.
rand_net_par(co_net, reps = 30) -> randp
net_par(co_net) -> pars
compare_rand(pars, randp, index = c("Average_path_length", "Clustering_coefficent"))
5.3 Complex network
Most of the networks we construct in omics research belong to complex networks, which generally has the following characteristics, scale-free, small-world, modularity and hierarchy, as shown in the table below.
Terminology | Explanation |
---|---|
Scale-free | Scale-free It is a most notable characteristic in complex systems. It was used to desibe the finding that most nodes in a network have few neighbors while few nodes have large amount of neighbors. In most cases, the connectivity distribution asymptotically follows a power law. It can be expressed in , where $P(k) \sim k^{-y}$ ,P(k) is the number of nodes with k degrees, k is connectivity/degrees andΞ³is a constant. |
Small-world | Small-world It is a terminology in network analyses to depict the average distance between nodes in a network is short, usually logarithmically with the total number of nodes. It means the network nodes are always closely related with each other. |
Modularity | The modularity of a graph with respect to some division (or vertex types) measures how good the division is, or how separated are the different vertex types from each other. It defined as $Q=\frac{1}{2m} \sum_{i,j} (A_{ij}-\gamma\frac{k_i k_j}{2m})\delta(c_i,c_j)$ ,here mm is the number of edges, $A_{ij}$ is the element of the A adjacency matrix in row i and column j, $k_i$ is the degree of i, $k_j$ is the degree of j, $c_i$ is the type (or component) of i, $c_j$ that of j, the sum goes over all i and j pairs of vertices, and $\delta(x,y)$ is 1 if x=y and 0 otherwise. The resolution parameter $\gamma$ allows weighting the random null model, which might be useful when finding partitions with a high modularity.The original definition of modularity is retrieved when setting $\gamma$ to 1 @newmanModularityCommunityStructure2006. |
Hierarchy | Hierarchy It was used to depict the networks which could be arranged into a hierarchy of groups representing in a tree structure. Several studies demonstrated that metabolic networks are usually accompanied by a hierarchical modularity. |
fit_power()
is used to prove the scale-free property of a network, it will fit a power-law distribution to the degree distribution of the network.
data("c_net", package = "MetaNet")
fit_power(co_net)
smallworldness()
can calculate the smallworld index \(\sigma\).
\[ \sigma=\frac\gamma\lambda=\frac{C/C_{random}}{L/L_{random}} \]
C and L are the clustering coefficient and average path length of the empirical network, and \(C_{random}\) and \(L_{random}\) are the clustering coefficient and average path length of the random network, respectively.
If \(\sigma\) > 1, the empirical network usually has the characteristics of a small-world network.
smallworldness(co_net)
## 43.09368
5.4 Modules
A community is a subgraph containing nodes which are more densely linked to each other than to the rest of the graph or equivalently, a graph has a community structure if the number of links into any subgraph is higher than the number of links between those subgraphs.
Use module_net()
to generate a n-modules network and do some modules analysis test.
test_module_net <- module_net(module_number = 3, n_node_in_modu = 30)
plot(test_module_net, mark_module = T)
Module detection
There are many algorithms to detect modules in a network, some of the most common algorithms include:
short random walks
leading eigenvector of the community matrix
simulated annealing approach
greedy modularity optimization
β¦
You can get network modules by module_detect()
with various algorithms.
But we sometimes just focus on several modules instead of all, so we can use filter_n_module()
to get modules which have more than n nodes, or keep some other modules by ids.
par(mfrow = c(2, 2), mai = rep(1, 4))
# module detection
module_detect(co_net, method = "cluster_fast_greedy") -> co_net_modu
get_v(co_net_modu)[, c("name", "module")] %>% head()
## name module
## 1 s__un_f__Thermomonosporaceae 10
## 2 s__Pelomonas_puraquae 9
## 3 s__Rhizobacter_bergeniae 1
## 4 s__Flavobacterium_terrae 3
## 5 s__un_g__Rhizobacter 14
## 6 s__un_o__Burkholderiales 9
plot(co_net_modu,
plot_module = T, mark_module = T,
legend_position = c(-1.8, 1.6, 1.1, 1.3), edge_legend = F
)
table(V(co_net_modu)$module)
##
## 1 10 11 12 13 14 15 16 17 18 19 2 20 21 22 23 24 25 26 27 28 29 3 4 5 6
## 36 18 35 16 17 12 21 15 6 4 4 24 2 3 2 2 2 2 3 2 3 2 27 23 35 23
## 7 8 9
## 33 18 61
# keep some modules
co_net_modu2 <- filter_n_module(co_net_modu, n_node_in_module = 30, keep_id = 10)
plot(co_net_modu2, plot_module = T, mark_module = T, legend_position = c(-1.8, 1.3, 1.1, 1.3))
# change group layout
g_layout_circlepack(co_net_modu, group = "module") -> coors
plot(co_net_modu2, coors = coors, plot_module = T, mark_module = T, edge_legend = F)
# extract some modules, delete =T will delete other modules.
co_net_modu3 <- filter_n_module(co_net_modu, n_node_in_module = 30, keep_id = 10, delete = T)
plot(co_net_modu3, coors, plot_module = T)
Look at the components of the network, some too small sub_graphs will effect the modules, if you do not care about these small components, you can just filter out them.
table(V(co_net_modu)$components)
##
## 1 10 11 12 13 2 3 4 5 6 7 8 9
## 418 2 2 2 2 6 4 2 2 3 2 3 3
co_net_modu4 <- c_net_filter(co_net_modu, components == 1)
# re-do a module detection
co_net_modu4 <- module_detect(co_net_modu4)
g_layout_circlepack(co_net_modu4, group = "module") -> coors
plot(co_net_modu4, coors, plot_module = T)
Use plot_module_tree()
can display the dendrogram relationship of nodes, and combine_n_module()
can change the module numbers to a specific number (can not be too big or too small if there are some small small sub_graphs)
p1 <- plot_module_tree(co_net_modu4, label.size = 0.6)
# combine 17 modules to 5.
co_net_modu5 <- combine_n_module(co_net_modu4, 5)
p2 <- plot_module_tree(co_net_modu5, label.size = 0.6)
p1 + p2
We could also use this network module indicates some cluster which have similar expression/abundance. But we should filter the positive edges firstly as the module detection only consider topology structure rather than the edge type. After filtering the positive edges and module detection will find some modules like WGCNA gene modules, and we can also get the βeigengeneβ using module_eigen()
and have a general look at each module expression by module_expression()
.
data("otutab", package = "pcutils")
totu <- t(otutab)
# filter positive edges
c_net_filter(co_net, e_type == "positive", mode = "e") -> co_net_pos
co_net_pos_modu <- module_detect(co_net_pos, n_node_in_module = 15, delete = T)
g_layout_circlepack(co_net_pos_modu, group = "module") -> coors1
plot(co_net_pos_modu, coors1, plot_module = T)
# map the original abundance table
module_eigen(co_net_pos_modu, totu) -> co_net_pos_modu
# plot the expression pattern
p1 <- module_expression(co_net_pos_modu, totu,
r_threshold = 0.6,
facet_param = list(ncol = 4), plot_eigen = T
) +
theme(axis.text.x = element_text(size = 5, angle = 90, vjust = 0.5))
# correlate to metadata
env <- metadata[, 3:8]
p2 <- cor_plot(get_module_eigen(co_net_pos_modu), env) + coord_flip()
p1 / p2 + patchwork::plot_layout(heights = c(2, 1.4))
# summary some variable according to modules.
p3 <- summary_module(co_net_pos_modu, var = "Phylum") +
scale_fill_pc()
p4 <- summary_module(co_net_pos_modu, var = "node_eigen_cor") +
scale_color_pc(palette = "col2")
p3 + p4
Use links_stat()
to summary the edges and find most edges are from a module to the same module (means module detection is OK).
links_stat(co_net_modu2, group = "module", legend_number = T)
Keystone
After we determine these modules of network, the topological role of each node can be calculated according to Zi-Pi (1).
Within-module connectivity (Zi):
\[ Z_i= \frac{\kappa_i-\overline{\kappa_{si}}}{\sigma_{\kappa_{si}}} \]
Where \(ΞΊ_i\) is the number of links of node i to other nodes in its module si, \(\overline{\kappa_{si}}\) is the average of k over all the nodes in si, and \(\sigma_{\kappa_{si}}\) is the standard deviation of ΞΊ in si.
Among-module connectivity (Pi):
\[ P_i=1-\sum_{s=1}^{N_m}{\left( {\frac{\kappa_{is}}{k_i}} \right)^2} \]
where \(\kappa_{is}\) is the number of links of node i to nodes in module s, and \(k_i\) is the total degree of node i.
And researchers often divide module roles into four categories:
Zi | Zi>2.5 | |
---|---|---|
Pi | peripherals | module hubs |
Pi>0.62 | connectors | network hubs |
And some articles define these nodes in red categories as keystone nodes of a network (2).
Use zp_analyse()
to get module roles and store in the vertex attributes, then we can use zp_plot()
to visualize. We can see the module hubs are center of a module while connector are often mediate the connection of different modules.
zp_analyse(co_net_modu4) -> co_net_modu4
get_v(co_net_modu4)[, c(1, 16:21)] %>% head()
## name components module original_module Ki Zi
## 1 s__un_f__Thermomonosporaceae 1 6 6 3 0.4358899
## 2 s__Pelomonas_puraquae 1 11 11 15 1.9019177
## 3 s__Rhizobacter_bergeniae 1 1 1 4 1.0951304
## 4 s__Flavobacterium_terrae 1 3 3 4 1.8027756
## 5 s__un_g__Rhizobacter 1 14 14 1 -1.0488088
## 6 s__un_o__Burkholderiales 1 11 11 17 2.3326783
## Pi
## 1 0.3750000
## 2 0.3490305
## 3 0.5714286
## 4 0.0000000
## 5 0.0000000
## 6 0.1939058
# color map to roles
co_net_modu6 <- c_net_set(co_net_modu4, vertex_class = "roles")
plot(co_net_modu6, coors, mark_module = T, labels_num = 0, group_legend_title = "Roles")