library(network)
adj_df <- read.csv("network_data/IS_BBC_61_IS_BBC-FRIENDS.csv", row.names=1)
adj_mat <- as.matrix(adj_df)
statnet_net <- network(adj_mat, directed=FALSE)
Statnet vs igraph
There are two main packages used in social network analysis in R: Statnet and igraph. Statnet is actually a suite of packages with a focus on network modeling using Exponential Random Graph Models. In contrast, igraph is a single package with a broad range of network analysis tools but little focus on modeling graphs. In addition, igraph is also available in Python, Mathematica and C.
There are many other packages out there that deal with specific network problems but in my experience most of them integrate into the statnet or igraph framework (or both).
The purpose of this page is not to demonstrate everything you can do with igraph and statnet or to show which one is better. Instead, I want to show you basic operations in both so it is easier to translate code from one package to the other if necessary.
Loading a Network
Both packages have function(s) to convert dataframes or matrices into their own network object. The biggest difference here is that igraph has an array of functions to deal with a variety of common network types, while statnet uses only one function (network()).
In the below examples we use two datasets. The adjacency matrix is the Islamic State Group network available via UCINET. The edgelist is campaign donors in the 2016 Nevada State Senate election that is part of my own data on campaign donors available on the Harvard Dataverse.
If you want to follow along, the Nevada edgelist data is here and the metadata is here.
The network() function can be used to convert either an adjacency matrix or an edgelist. You can set the type of data using the matrix.type argument, but it will also guess. In general, if you provide a matrix it will assume it is an adjacency matrix. If you provide a data.frame it will assume it is an edgelist.
In order to convert our dataframe into a matrix we need to use as.matrix(), but this can create a problem if some of our columns are not numbers. Often the first column in your adjacency csv is the vertex names. To deal with this, when we read in the data using read.csv() we are going to set row.names=1, which makes the first column in your csv into rownames instead of treating it as a regular column.
Reading in an edgelist is comparatively easy. Assuming that you have a csv with your first column as where the edge starts and your second column as where the edge ends, just put that dataframe directly into the network() call.
library(network)
edge_df <- read.csv("network_data/NV-2013-2016-Edges.csv")
statnet_net <- network(edge_df, directed=FALSE)
There are some additional arguments to worry about (there are more as well but this should cover most cases):
- directed is whether the network is directed (default is TRUE).
- loops is whether edges can point towards themselves (default is FALSE).
- multiple is whether multiple edges between the same pair of vertices are allowed (default is FALSE).
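As a minimal sketch of the two input types, the toy example below builds the same three-vertex triangle once from an adjacency matrix and once from an edgelist. The data and the object names (net_from_adj, net_from_edges) are made up for illustration and are not part of the datasets used on this page.

```r
library(network)

# Toy symmetric adjacency matrix for three vertices (made-up data)
adj_mat <- matrix(c(0, 1, 1,
                    1, 0, 1,
                    1, 1, 0),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Stating matrix.type explicitly avoids relying on the guess
net_from_adj <- network(adj_mat, matrix.type = "adjacency", directed = FALSE)

# The same triangle as an edgelist data frame (first two columns are the edge ends)
edge_df <- data.frame(from = c("a", "a", "b"),
                      to   = c("b", "c", "c"),
                      stringsAsFactors = FALSE)
net_from_edges <- network(edge_df, directed = FALSE, loops = FALSE)

# Both objects should describe the same network
network.size(net_from_adj)
network.edgecount(net_from_adj)
network.edgecount(net_from_edges)
```

Forgetting directed=FALSE here would make the edgelist version a directed network, since the default is TRUE.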
igraph has a set of functions that can be used to convert different data types into a network. For an adjacency matrix we use graph_from_adjacency_matrix() and for an edgelist we use graph_from_data_frame(), but there is also read_graph(), which can be used to read data directly from a file in a wide range of formats (pajek, graphml, gml, etc).
In order to convert our dataframe into a matrix we need to use as.matrix(), but this can create a problem if some of our columns are not numbers. Often the first column in your adjacency csv is the vertex names. To deal with this, when we read in the data using read.csv() we are going to set row.names=1, which makes the first column in your csv into rownames instead of treating it as a regular column.
library(igraph)
adj_df <- read.csv("network_data/IS_BBC_61_IS_BBC-FRIENDS.csv", row.names=1)
adj_mat <- as.matrix(adj_df)
igraph_net <- graph_from_adjacency_matrix(adj_mat, mode="undirected")
Reading in an edgelist is comparatively easy. Assuming that you have a csv with your first column as where the edge starts and your second column as where the edge ends, just put that dataframe directly into the graph_from_data_frame() call.
library(igraph)
edge_df <- read.csv("network_data/NV-2013-2016-Edges.csv")
igraph_net <- graph_from_data_frame(edge_df, directed=FALSE)
The additional arguments are a bit different across the two functions.
- In graph_from_adjacency_matrix() you use mode to indicate the type of matrix, and there are a variety of options to deal with non-symmetric matrices that you want to treat as undirected (the default is directed).
- In graph_from_data_frame() the only additional argument is whether it is directed or not (the default is TRUE).
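To make the mode options a little more concrete, here is a small sketch on a made-up, non-symmetric matrix. It compares mode="directed" with mode="max", which is one of the options for treating a non-symmetric matrix as undirected; the data and object names are illustrative only.

```r
library(igraph)

# Toy non-symmetric matrix: a -> b, b -> c, c -> a (made-up data)
adj_mat <- matrix(c(0, 1, 0,
                    0, 0, 1,
                    1, 0, 0),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Default behaviour: keep the direction of every tie
g_directed <- graph_from_adjacency_matrix(adj_mat, mode = "directed")

# Treat the matrix as undirected: a tie exists if either cell is non-zero
g_max <- graph_from_adjacency_matrix(adj_mat, mode = "max")

# The edgelist route, with the directed argument set explicitly
edge_df <- data.frame(from = c("a", "b", "c"),
                      to   = c("b", "c", "a"),
                      stringsAsFactors = FALSE)
g_edges <- graph_from_data_frame(edge_df, directed = FALSE)

is_directed(g_directed)
is_directed(g_max)
ecount(g_max)
ecount(g_edges)
```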
Loading Data with Vertex Attributes
Both igraph and statnet have ways to add vertex attributes on to your network during the creation of the network. In this example I am only using the edgelist data. If you are using adjacency data the same steps can be used for the statnet package but for igraph I think you have to first get it into an edgelist format.
Your metadata or vertex attributes will be in an additional dataframe with the first column as the vertex identifier. This needs to be the same as the vertex identifier you have in your edgelist or adjacency matrix.
library(network)
edge_df <- read.csv("network_data/NV-2013-2016-Edges.csv")
meta_df <- read.csv("network_data/NV-2013-2016-Meta.csv")
statnet_net <- network(edge_df, vertices=meta_df,
                       directed=FALSE)
Your metadata or vertex attributes will be in an additional dataframe with the first column as the vertex identifier. This needs to be the same as the vertex identifier you have in your edgelist or adjacency matrix.
library(igraph)
edge_df <- read.csv("network_data/NV-2013-2016-Edges.csv")
meta_df <- read.csv("network_data/NV-2013-2016-Meta.csv")
igraph_net <- graph_from_data_frame(edge_df, vertices=meta_df,
                                    directed=FALSE)
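The requirement that identifiers match can be checked on a toy example. The sketch below uses made-up edge and metadata frames (the column names id and group are assumptions, not columns from the Nevada data) and shows that igraph refuses to build the graph when a vertex in the edgelist is missing from the metadata.

```r
library(igraph)

# Made-up edgelist and vertex metadata; the first metadata column is the identifier
edge_df <- data.frame(from = c("a", "a", "b"),
                      to   = c("b", "c", "c"),
                      stringsAsFactors = FALSE)
meta_df <- data.frame(id    = c("a", "b", "c"),
                      group = c("x", "y", "x"),
                      stringsAsFactors = FALSE)

g <- graph_from_data_frame(edge_df, vertices = meta_df, directed = FALSE)

# The remaining metadata columns become vertex attributes
vertex_attr_names(g)
V(g)$group

# Dropping vertex "c" from the metadata makes the call fail,
# because "c" still appears in the edgelist
bad_meta <- meta_df[meta_df$id != "c", ]
result <- tryCatch(
  graph_from_data_frame(edge_df, vertices = bad_meta, directed = FALSE),
  error = function(e) "error"
)
result
```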
Accessing and Modifying Attributes
Both igraph and statnet allow you to modify vertex and edge attributes (and graph attributes). The main functions are very similar across the two, but each package also has its own shortcuts that do not carry over to the other.
The biggest difference between the two is that statnet modifies the network object in place while igraph does not.
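The following sketch illustrates that difference on a made-up triangle. Calls are prefixed with the package name so that neither package needs to be attached; the attribute name color and the object names are hypothetical.

```r
# Toy triangle used for both packages (made-up data)
edge_df <- data.frame(from = c("a", "a", "b"),
                      to   = c("b", "c", "c"),
                      stringsAsFactors = FALSE)

# statnet (network package): the call changes `net` itself, no assignment needed
net <- network::network(edge_df, directed = FALSE)
network::set.vertex.attribute(net, "color", rep("red", 3))
statnet_has_color <- "color" %in% network::list.vertex.attributes(net)

# igraph: the call returns a modified copy and leaves `g` as it was
g <- igraph::graph_from_data_frame(edge_df, directed = FALSE)
g_copy <- igraph::set_vertex_attr(g, "color", value = "red")
igraph_original_has_color <- "color" %in% igraph::vertex_attr_names(g)
igraph_copy_has_color     <- "color" %in% igraph::vertex_attr_names(g_copy)

statnet_has_color
igraph_original_has_color
igraph_copy_has_color
```

In practice this means an igraph call usually needs to be assigned back to the same name (g <- set_vertex_attr(g, ...)), while the statnet call stands on its own.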
To access a vertex attribute you use get.vertex.attribute() with the network object and the name of the attribute you want to access.
You can set a vertex attribute using set.vertex.attribute() along with the network object, the name of the attribute you want to modify, the value(s) and which vertex (or vertices) you want to modify. When you call this it modifies the network in place and nothing is returned.
Each vertex has a numeric ID from 1 to the number of vertices in the network.
library(network)
total_donations <- get.vertex.attribute(statnet_net, "Total")
## Access the first 10 donation amounts
total_donations[1:10]
[1] 87000.0 112000.0 80500.0 81000.0 142431.1 109000.0 133500.0 43000.0
[9] 182500.0 30500.0
## Change the first vertex to have a "Total" of 100
set.vertex.attribute(statnet_net, "Total", value=100,
                     v=1)
modified_donations <- get.vertex.attribute(statnet_net, "Total")
modified_donations[1:10]
[1] 100.0 112000.0 80500.0 81000.0 142431.1 109000.0 133500.0 43000.0
[9] 182500.0 30500.0
## Convert the total amount for all vertices to be in $1,000s
set.vertex.attribute(statnet_net, "Total", total_donations/1000)
thousand_donations <- get.vertex.attribute(statnet_net, "Total")
## Access the first 10 donation amounts
thousand_donations[1:10]
[1] 87.0000 112.0000 80.5000 81.0000 142.4311 109.0000 133.5000 43.0000
[9] 182.5000 30.5000
You can also access and modify vertex attributes using %v% as an operator instead. This simplifies the above code, though it isn’t possible to modify an individual vertex.
## Access the attribute PerRep
percent_rep <- statnet_net %v% "PerRep"
percent_rep[1:10]
[1] 37.35632 54.01786 47.82609 61.72840 44.53457 35.32110 31.08614
[8] 0.00000 46.57534 100.00000
## Convert this to a proportion and store it
statnet_net %v% "Proportion_Rep" <- percent_rep/100
Accessing and modifying edge attributes works in a similar way, but with get.edge.attribute(), set.edge.attribute() and %e%.
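A short sketch of the edge versions on a made-up triangle is given below. The attribute names weight and kind are hypothetical and only serve to show the function form and the operator form side by side.

```r
library(network)

# Toy triangle (made-up data)
edge_df <- data.frame(from = c("a", "a", "b"),
                      to   = c("b", "c", "c"),
                      stringsAsFactors = FALSE)
net <- network(edge_df, directed = FALSE)

# Function form: modifies `net` in place, one value per edge
set.edge.attribute(net, "weight", c(1, 2, 3))
edge_weights <- get.edge.attribute(net, "weight")

# Operator form: same idea with %e%
net %e% "kind" <- c("x", "y", "x")
edge_kinds <- net %e% "kind"

edge_weights
edge_kinds
```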
To access a vertex attribute you use vertex_attr() with the network object and the name of the attribute you want to access.
You can set a vertex attribute using set_vertex_attr() along with the network object, the name of the attribute you want to modify, the value(s) and which vertex (or vertices) you want to modify. When you call this the new network object is returned, so you need to assign it back to keep the change.
Each vertex has a numeric ID from 1 to the number of vertices in the network.
library(igraph)
total_donations <- vertex_attr(igraph_net, "Total")
## Access the first 10 donation amounts
total_donations[1:10]
[1] 87000.0 112000.0 80500.0 81000.0 142431.1 109000.0 133500.0 43000.0
[9] 182500.0 30500.0
## Change the first vertex to have a "Total" of 100
igraph_net <- set_vertex_attr(igraph_net, "Total", value=100,
                              index=1)
modified_donations <- vertex_attr(igraph_net, "Total")
modified_donations[1:10]
[1] 100.0 112000.0 80500.0 81000.0 142431.1 109000.0 133500.0 43000.0
[9] 182500.0 30500.0
## Convert the total amount for all vertices to be in $1,000s
igraph_net <- set_vertex_attr(igraph_net, "Total", value=total_donations/1000)
thousand_donations <- vertex_attr(igraph_net, "Total")
## Access the first 10 donation amounts
thousand_donations[1:10]
[1] 87.0000 112.0000 80.5000 81.0000 142.4311 109.0000 133.5000 43.0000
[9] 182.5000 30.5000
In addition to this there are two other ways to modify vertex attributes. The first is to access them using V()$, the second is to directly assign values using vertex_attr() <-.
## Access the attribute PerRep
percent_rep <- V(igraph_net)$PerRep
percent_rep[1:10]
[1] 37.35632 54.01786 47.82609 61.72840 44.53457 35.32110 31.08614
[8] 0.00000 46.57534 100.00000
## Convert this to a proportion and store it
V(igraph_net)$Proportion_Rep <- percent_rep/100
## The above and below are equivalent
vertex_attr(igraph_net, "Proportion_Rep") <- percent_rep/100
One benefit of this feature is that it is able to access and modify attributes based on other attributes. For example, imagine I want to create a new variable that indicates whether groups gave only to Republicans or only to Democrats, or both:
V(igraph_net)$Type <- "Both"
vertex_attr(igraph_net, "Type", V(igraph_net)[PerDem == 100]) <- "Democrat Only"
vertex_attr(igraph_net, "Type", V(igraph_net)[PerRep == 100]) <- "Republican Only"
## Create a table to see the counts
table(V(igraph_net)$Type)
Both Democrat Only Republican Only
304 168 127
Accessing and modifying edge attributes works in a similar way, but with edge_attr(), set_edge_attr() and E()$.
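For igraph, the edge counterparts of the vertex functions are edge_attr(), set_edge_attr() and E()$. The sketch below applies them to a made-up triangle; the attribute names weight and kind are hypothetical.

```r
library(igraph)

# Toy triangle (made-up data)
edge_df <- data.frame(from = c("a", "a", "b"),
                      to   = c("b", "c", "c"),
                      stringsAsFactors = FALSE)
g <- graph_from_data_frame(edge_df, directed = FALSE)

# Function form: returns a new graph, so assign it back
g <- set_edge_attr(g, "weight", value = c(1, 2, 3))
edge_weights <- edge_attr(g, "weight")

# Shortcut form with E()$
E(g)$kind <- c("x", "y", "x")

# Edges can be selected by attribute, as with V() for vertices
heavy_edges <- E(g)[weight > 1]

edge_weights
E(g)$kind
length(heavy_edges)
```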
Extracting Data from Networks
One common occurrence in network analysis is extracting nodal/vertex data out from your network. For example, you might want to use regression analysis to look at whether centrality of a node is related to other nodal characteristics. To do this you’ll load the data into a network, calculate the centrality scores, and then want to append this to your data. The easiest way (in my opinion) is to add these centrality statistics as a vertex attribute then convert the network into a vertex based dataframe.
For statnet this is accomplished using as.data.frame() and indicating whether you want to convert to a vertex or edge dataframe using the unit= argument. In the below example I calculate the degree and betweenness centrality of each node, then convert it to a dataframe and show the first 5 rows.
library(network)
library(sna)
statnet_net %v% "degree" <- degree(statnet_net, gmode="graph")
statnet_net %v% "between" <- betweenness(statnet_net, gmode="graph")
df <- as.data.frame(statnet_net, unit="vertices")
df[1:5,]
vertex.names ContributorName CatCodeIndustry
1 1887 NEWMONT MINING Mining
2 2906 WYNN RESORTS Gambling & Casinos
3 541 CENTURYLINK Telecom Services & Equipment
4 958 FARMERS INSURANCE GROUP Insurance
5 396 BOYD GAMING Gambling & Casinos
CatCodeGroup CatCodeBusiness
1 Energy & Natural Resources Metal mining & processing
2 General Business Casinos, racetracks & gambling
3 Communications & Electronics Telecommunications
4 Finance, Insurance & Real Estate Insurance agencies, brokers & agents
5 General Business Casinos, racetracks & gambling
PerDem PerRep DemCol RepCol Total Proportion_Rep degree
1 62.64368 37.35632 #00003DAF #00003DAF 87.0000 0.3735632 82
2 45.98214 54.01786 #1A0000AF #1A0000AF 112.0000 0.5401786 225
3 52.17391 47.82609 #00000AAF #00000AAF 80.5000 0.4782609 163
4 38.27160 61.72840 #3D0000AF #3D0000AF 81.0000 0.6172840 169
5 55.46543 44.53457 #00001AAF #00001AAF 142.4311 0.4453457 86
between
1 198.2822
2 3518.4604
3 393.7207
4 472.0289
5 536.6969
For igraph this is accomplished using as_data_frame() and indicating whether you want to convert to a vertex or edge dataframe using the what= argument. In the below example I calculate the degree and betweenness centrality of each node, then convert it to a dataframe and show the first 5 rows.
library(igraph)
V(igraph_net)$degree <- degree(igraph_net)
V(igraph_net)$between <- betweenness(igraph_net)
df <- as_data_frame(igraph_net, what="vertices")
df[1:5,]
name ContributorName CatCodeIndustry
1887 1887 NEWMONT MINING Mining
2906 2906 WYNN RESORTS Gambling & Casinos
541 541 CENTURYLINK Telecom Services & Equipment
958 958 FARMERS INSURANCE GROUP Insurance
396 396 BOYD GAMING Gambling & Casinos
CatCodeGroup CatCodeBusiness
1887 Energy & Natural Resources Metal mining & processing
2906 General Business Casinos, racetracks & gambling
541 Communications & Electronics Telecommunications
958 Finance, Insurance & Real Estate Insurance agencies, brokers & agents
396 General Business Casinos, racetracks & gambling
PerDem PerRep DemCol RepCol Total Proportion_Rep Type degree
1887 62.64368 37.35632 #00003DAF #00003DAF 87.0000 0.3735632 Both 82
2906 45.98214 54.01786 #1A0000AF #1A0000AF 112.0000 0.5401786 Both 225
541 52.17391 47.82609 #00000AAF #00000AAF 80.5000 0.4782609 Both 163
958 38.27160 61.72840 #3D0000AF #3D0000AF 81.0000 0.6172840 Both 169
396 55.46543 44.53457 #00001AAF #00001AAF 142.4311 0.4453457 Both 86
between
1887 198.2822
2906 3518.4604
541 393.7207
958 472.0289
396 536.6969
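The same workflow can be tried end to end on a tiny made-up path graph (a to b to c), where the centrality values are easy to check by hand. This is only a sketch; the object names vertex_df and edge_out are hypothetical, and it also shows the what="edges" variant.

```r
library(igraph)

# Toy path graph a - b - c (made-up data)
edge_df <- data.frame(from = c("a", "b"),
                      to   = c("b", "c"),
                      stringsAsFactors = FALSE)
g <- graph_from_data_frame(edge_df, directed = FALSE)

# Store centrality scores as vertex attributes
V(g)$degree  <- degree(g)
V(g)$between <- betweenness(g)

# One row per vertex, ready to merge with other data or pass to a model
vertex_df <- as_data_frame(g, what = "vertices")

# One row per edge, with from/to columns
edge_out <- as_data_frame(g, what = "edges")

vertex_df
edge_out
```

Here b sits between a and c, so it is the only vertex with a non-zero betweenness score.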
Using Both Packages
You need to be careful when you load both igraph and the statnet suite of packages. They have several functions with the exact same names. The example below shows what happens when you do this:
library(igraph)
library(sna)
degree(statnet_net)[1:10] #will work
[1] 164 450 326 338 172 312 142 258 108 224
degree(igraph_net)[1:10] #won't work
Error in FUN(X[[i]], ...): as.edgelist.sna input must be an adjacency matrix/array, edgelist matrix, network, or sparse matrix, or list thereof.
A solution to this is to prepend function calls with the package they come from:
sna::degree(statnet_net)[1:10] #will work
[1] 164 450 326 338 172 312 142 258 108 224
igraph::degree(igraph_net)[1:10] #will work
1887 2906 541 958 396 9686991 3216 9688247
82 225 163 169 86 156 71 129
8044 24820735
54 112
You can also unload packages using detach("package:igraph", unload=TRUE) or detach("package:sna", unload=TRUE).
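Another option is to not attach the packages at all and rely on package prefixes throughout. The sketch below does this on a made-up triangle. It also illustrates a likely reason the sna values in the output above are twice the igraph values: sna::degree() defaults to gmode="digraph", which counts ties in both directions, so gmode="graph" is needed for an undirected count.

```r
# Toy triangle used for both packages (made-up data)
edge_df <- data.frame(from = c("a", "a", "b"),
                      to   = c("b", "c", "c"),
                      stringsAsFactors = FALSE)

net <- network::network(edge_df, directed = FALSE)
g   <- igraph::graph_from_data_frame(edge_df, directed = FALSE)

# Package prefixes make it unambiguous which degree() is called
sna_default <- sna::degree(net)                   # default gmode = "digraph"
sna_graph   <- sna::degree(net, gmode = "graph")  # undirected count
igraph_deg  <- igraph::degree(g)

sna_default
sna_graph
igraph_deg
```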