4.4 Managing SNA data

On this page, I introduce recommended ways of structuring and handling SNA data. In particular, I follow the principles of tidy data (Wickham 2014), which may differ from the representations you encounter in textbooks. The principles of tidy data are simple (p. 4), and I illustrate each of them below with examples.

  1. Each variable forms a column.
  2. Each observation forms a row.
  3. Each type of observational unit forms a table.

4.4.1 Basic representations

Note that SNA data typically include attribute data about nodes and relational data about edges. So the most straightforward way to represent a network is to have two separate tables.

For example, consider a student group with four students (nodes). Table 1 contains attribute data about each student. This table is tidy because each variable forms a column, each observation (i.e., student) forms a row, and the table contains a single type of observational unit (students).

Table 1. A table of nodes.
name gender age
A F 15
B M 14
C M 13
D F 14

Table 2 describes book lending activities (ties) among students. For example, Row 1 means Student A lent one book to B, while Row 4 shows B lent 2 books to C. It is also a tidy dataset.

Table 2. A table of weighted ties.
source target weight
A B 1
A C 0
A D 0
B C 2
B D 1
C D 0

Note that weight is not always required for networks. In a study that only cares about the existence of a tie, Column 3 will contain only 0 and 1. Alternatively, rows with a value of 0 in Column 3 can simply be removed from the table, in which case Column 3 is no longer needed.
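
As a minimal sketch of these two options, assuming Table 2 is stored in Python as a list of (source, target, weight) rows (the language and the variable names are my own choices for illustration):

```python
# Table 2 as (source, target, weight) rows.
weighted_ties = [
    ("A", "B", 1), ("A", "C", 0), ("A", "D", 0),
    ("B", "C", 2), ("B", "D", 1), ("C", "D", 0),
]

# Option 1: keep every row, but recode weight as 0/1 (tie absent/present).
binary_ties = [(s, t, 1 if w > 0 else 0) for s, t, w in weighted_ties]

# Option 2: keep only rows where a tie exists and drop the weight column.
existing_ties = [(s, t) for s, t, w in weighted_ties if w > 0]
```

Both results stay tidy: each row is still one pair of students, and each column is still one variable.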

Table 3 could be the original record from which Table 2 is constructed. In Table 3, each row represents one book lending action, with its date recorded in Column 3. This gives you a sense of how researchers may need to transform data from the original observations (Table 3) into a specific format (Table 2), even though most SNA software can handle both formats.

Table 3. A table of raw data of ties.
source target date
A B 2017-02-03
A C 2017-02-04
A D 2017-02-05
B C 2017-02-06
B D 2017-02-07
C D 2017-02-08
B C 2017-02-09
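
The transformation from raw actions to weighted ties amounts to counting rows per (source, target) pair. A minimal Python sketch, assuming the raw records are stored as (source, target, date) tuples (the variable names are hypothetical):

```python
from collections import Counter

# Table 3 as (source, target, date) rows, one row per lending action.
raw_ties = [
    ("A", "B", "2017-02-03"), ("A", "C", "2017-02-04"),
    ("A", "D", "2017-02-05"), ("B", "C", "2017-02-06"),
    ("B", "D", "2017-02-07"), ("C", "D", "2017-02-08"),
    ("B", "C", "2017-02-09"),
]

# Count how many actions were recorded for each (source, target) pair.
counts = Counter((s, t) for s, t, _date in raw_ties)

# Turn the counts back into tidy (source, target, weight) rows.
weighted_ties = [(s, t, n) for (s, t), n in sorted(counts.items())]
```

With the rows exactly as printed in Table 3, B to C receives a weight of 2 and every other listed pair a weight of 1. A pair with no recorded action does not appear in the counts at all, so a weight of 0 has to be added explicitly if you want such rows in the table.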

Additionally, in situations where you do not care about node attributes (as in Table 1), you can use relational data alone to construct a network. In this case, you will only have a table of relational data, which already contains the most basic information about nodes: their identifiers. Take Table 2 as an example: all unique node identifiers in the source and target columns are extracted to create a list of nodes, with no further information about their attributes.
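
Extracting the node list from an edge table can be sketched in a few lines of Python, again assuming (source, target, weight) tuples with names I picked for illustration:

```python
# Table 2 as (source, target, weight) rows.
ties = [
    ("A", "B", 1), ("A", "C", 0), ("A", "D", 0),
    ("B", "C", 2), ("B", "D", 1), ("C", "D", 0),
]

# Collect every identifier that appears as a source or a target,
# then sort the unique values to get a stable node list.
nodes = sorted({name for s, t, _w in ties for name in (s, t)})
```

Keep in mind that a node which never appears in any row of the edge table cannot be recovered this way, which is one reason to keep a separate table of nodes.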

4.4.2 Two-mode data

Imagine the research project is actually more complicated: We are also interested in the relationship between book-lending behaviors and student affiliations with sports teams. In this case, you may have two additional tables below.

Table 4. Sports teams in the school.
sports_teams practice_day
baseball Tue
basketball Mon
volleyball Fri

Table 5. Student affiliation with sports teams.
student team
A basketball
A volleyball
B baseball
C basketball
D baseball
D volleyball

As mentioned above for node attributes, you could ignore Table 4 if Table 5 already contains all the information you need about sports teams. But if there is a football team that no student in Table 5 belongs to, you will need a table of teams that includes it, such as Table 6.

Table 6. Sports teams in the school (version 2).
sports_teams practice_day
baseball Tue
basketball Mon
volleyball Fri
football Wed
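
One way to check whether the affiliation table covers every team is to compare it against the table of teams. The Python sketch below assumes the team names from Table 6 and the rows of Table 5 are held in plain lists (hypothetical variable names):

```python
# Team names from Table 6.
teams = ["baseball", "basketball", "volleyball", "football"]

# Table 5 as (student, team) rows.
affiliations = [
    ("A", "basketball"), ("A", "volleyball"), ("B", "baseball"),
    ("C", "basketball"), ("D", "baseball"), ("D", "volleyball"),
]

# Teams that have at least one student in the affiliation table.
teams_with_members = {team for _student, team in affiliations}

# Teams that would be lost if the network were built from Table 5 alone.
teams_without_members = [t for t in teams if t not in teams_with_members]
```

Here the only team without members is football, so it would be missing from a network built from Table 5 alone.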

Using Table 5, you could construct a two-mode network (also called an affiliation network), with students and sports teams as the two types of nodes. In contrast, the network in Table 2 has only one mode: students.
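
A minimal way to hold such a two-mode network in Python is a pair of lookup tables, one per mode. This is only a sketch with names I chose for illustration; SNA software has its own data structures for this:

```python
from collections import defaultdict

# Table 5 as (student, team) rows.
affiliations = [
    ("A", "basketball"), ("A", "volleyball"), ("B", "baseball"),
    ("C", "basketball"), ("D", "baseball"), ("D", "volleyball"),
]

members = defaultdict(set)   # team -> students on that team
teams_of = defaultdict(set)  # student -> teams the student belongs to

# Each row links one node of the first mode to one node of the second mode.
for student, team in affiliations:
    members[team].add(student)
    teams_of[student].add(team)
```

Ties only ever run between a student and a team, never between two students or two teams, which is what makes the network two-mode.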

Finally, if your research project is concerned with friendship in general, covering both book lending and sports affiliation, you could even merge the two types of relational data (with solid justification). For example, from Table 5 we can tell that A and C are both on the basketball team. We can then adjust the weight between A and C in Table 2 accordingly. This is another type of transformation you may need to do in SNA research. Knowing basic data transformation techniques, either in spreadsheet software or in R, will be helpful for work in this class.
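
The sketch below shows one possible merging rule in Python. The rule itself is my assumption for illustration, not a recommendation: it counts the teams each pair of students shares and adds that count to the lending weight from Table 2. It also treats the merged tie as undirected, even though lending has a direction.

```python
from itertools import combinations

# Lending weights from Table 2, keyed by (source, target).
lending = {
    ("A", "B"): 1, ("A", "C"): 0, ("A", "D"): 0,
    ("B", "C"): 2, ("B", "D"): 1, ("C", "D"): 0,
}

# Team memberships from Table 5.
teams_of = {
    "A": {"basketball", "volleyball"},
    "B": {"baseball"},
    "C": {"basketball"},
    "D": {"baseball", "volleyball"},
}

# Number of teams shared by each pair of students.
shared_teams = {
    (a, b): len(teams_of[a] & teams_of[b])
    for a, b in combinations(sorted(teams_of), 2)
}

# Assumed rule: combined weight = books lent + teams shared.
combined = {pair: lending.get(pair, 0) + shared_teams[pair] for pair in shared_teams}
```

Under this rule, A and C move from a weight of 0 to 1 because of basketball. Whether adding the two counts is meaningful depends on your research question, so any such rule needs to be justified.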

To summarize, this page provides a basic overview of how SNA data could be structured. You may encounter other ways of representing SNA data, such as a relationship matrix whose rows and columns represent the same set of actors (see the Harry Potter support networks for an example). Such representations can all be derived from the tidy datasets discussed above. In data collection, we will strive to keep as much raw information as possible (such as timestamps), to enable analyses that only come to mind afterwards.
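
To illustrate how a relationship matrix can be derived from a tidy edge table, here is a minimal Python sketch using Tables 1 and 2. The names are hypothetical, and I assume rows stand for the lending student and columns for the receiving student:

```python
# Node identifiers from Table 1 and ties from Table 2.
nodes = ["A", "B", "C", "D"]
ties = [
    ("A", "B", 1), ("A", "C", 0), ("A", "D", 0),
    ("B", "C", 2), ("B", "D", 1), ("C", "D", 0),
]

# Map each node to its row/column position in the matrix.
index = {name: i for i, name in enumerate(nodes)}

# Start from an all-zero square matrix, then fill in one cell per tie.
matrix = [[0] * len(nodes) for _ in nodes]
for source, target, weight in ties:
    matrix[index[source]][index[target]] = weight
```

The matrix carries the same weights as the edge table, but it has no natural place for extra columns such as a date, which is one reason to keep the tidy tables as the source.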

References

Wickham, Hadley. 2014. “Tidy Data.” Journal of Statistical Software 59 (10): 1–23. https://doi.org/10.18637/jss.v059.i10.