4.1 Understanding SNA Data

Broadly speaking, data collection in SNA research is concerned with two types of data: (1) relational data that describe ties, and (2) attribute data that describe nodes. For example, if I want to study friendship in a high school class, depending on my research questions I may choose to collect attribute data on each student (such as gender, race, GPA) and relational data on every possible pair of students (such as whether Student A texts Student B, or how many times A texts B). Quite simple, right?
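To make the two types of data concrete, here is a minimal Python sketch of how the high school example might be stored. The student names, attribute values, and texting counts are hypothetical. Storing attributes as one record per node and relational data as one value per ordered pair is just one common convention, not the only one.

```python
# Hypothetical toy data for illustration only: three students in one class.

# Attribute data: one record per node (student).
attributes = {
    "A": {"gender": "F", "gpa": 3.6},
    "B": {"gender": "M", "gpa": 3.1},
    "C": {"gender": "F", "gpa": 3.9},
}

# Relational data: one value per ordered pair (sender, receiver).
# Valued version: how many times the sender texted the receiver.
text_counts = {
    ("A", "B"): 12,
    ("B", "A"): 7,
    ("A", "C"): 0,
    ("C", "A"): 2,
    ("B", "C"): 0,
    ("C", "B"): 0,
}

# Binary version of the same tie: does the sender text the receiver at all?
texts_at_all = {pair: count > 0 for pair, count in text_counts.items()}

# Sanity check: every ordered pair of distinct students appears exactly once.
students = sorted(attributes)
all_pairs = [(i, j) for i in students for j in students if i != j]
assert sorted(text_counts) == sorted(all_pairs)

print(texts_at_all[("A", "B")], texts_at_all[("A", "C")])  # True False
```

Note that the same relationship can be recorded as valued (counts) or binary (yes/no) data. Which one to collect or derive is itself a decision tied to the research question.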

However, real-world SNA projects in education require the researcher to make a number of critical decisions. To list just a few:

  • how to gain access to the research “field” (e.g., a student fraternity, an intimate parent group)
  • from whom data are collected (and who is excluded from data collection)
  • which instruments are used for data collection
  • how data are structured and stored
  • how to transform data into different “shapes” to address specific research questions
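As one illustration of structuring and reshaping relational data, the sketch below stores survey nominations as an edge list, converts them into an adjacency matrix, and then symmetrizes the directed ties. The data and the helper names `to_adjacency` and `symmetrize` are hypothetical. The "weak" rule (either person names the other) and the "strong" rule (both name each other) are two common choices for symmetrizing, and which one fits depends on the research question.

```python
# Hypothetical directed nominations from a friendship survey: (nominator, nominee).
edges = [("A", "B"), ("B", "A"), ("A", "C"), ("D", "C")]
nodes = ["A", "B", "C", "D"]


def to_adjacency(edge_list, node_list):
    """Convert an edge list into a square 0/1 adjacency matrix (row = nominator)."""
    pos = {name: k for k, name in enumerate(node_list)}
    n = len(node_list)
    matrix = [[0] * n for _ in range(n)]
    for sender, receiver in edge_list:
        matrix[pos[sender]][pos[receiver]] = 1
    return matrix


def symmetrize(matrix, rule="weak"):
    """Make a directed 0/1 matrix undirected.

    'weak'   -> tie if either person names the other (element-wise max)
    'strong' -> tie only if both name each other (element-wise min)
    """
    if rule not in ("weak", "strong"):
        raise ValueError("rule must be 'weak' or 'strong'")
    combine = max if rule == "weak" else min
    n = len(matrix)
    return [[combine(matrix[i][j], matrix[j][i]) for j in range(n)] for i in range(n)]


adj = to_adjacency(edges, nodes)
weak = symmetrize(adj, "weak")      # ties: A-B, A-C, C-D
strong = symmetrize(adj, "strong")  # ties: A-B only (the one reciprocated nomination)
```

The same four nominations yield three undirected ties under one rule and only one under the other, so reshaping data is never a neutral step.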

In this week’s reading (Carolan 2014, Ch. 4), you will find detailed suggestions from the author. Below I briefly comment on a few key points before introducing detailed techniques in later sections.

4.1.1 Sources of SNA data

SNA data may be obtained in a variety of ways: from historical archives, questionnaires, ethnographic studies, system logs of online platforms, Medicaid claims, and so on. For example, researchers could use public records to analyze co-sponsorship of legislation in the U.S. Senate. With a classic Chinese novel, Dream of the Red Chamber, researchers could use SNA to estimate relationships between characters based on their co-occurrence (Zhao and Weko 2016). These two cases show that in SNA research some relational data are naturally occurring or readily available (e.g., co-sponsorship of legislation), while others need to be derived (e.g., relationships between novel characters based on the co-occurrence of their names in the same sentence).
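Deriving ties from co-occurrence can be sketched in a few lines of Python. This is a naive counting approach under stated assumptions: the sentences are hypothetical (not quotations from the novel), names are matched as plain substrings, and the helper `cooccurrence_ties` is illustrative. It is not the network inference method proposed by Zhao and Weko (2016).

```python
from collections import Counter
from itertools import combinations

# Hypothetical character list and sentences (illustrative, not text from the novel).
characters = ["Baoyu", "Daiyu", "Baochai", "Xifeng"]
sentences = [
    "Baoyu visited Daiyu in the garden.",
    "Baochai and Baoyu read poems together.",
    "Xifeng scolded the servants.",
    "Daiyu wept while Baoyu and Baochai talked.",
]


def cooccurrence_ties(sentences, names):
    """Count, for each unordered pair of names, the sentences mentioning both."""
    weights = Counter()
    for sentence in sentences:
        # Plain substring matching (an assumption); a real study would need
        # tokenization and handling of nicknames, titles, and pronouns.
        present = sorted(name for name in names if name in sentence)
        for pair in combinations(present, 2):
            weights[pair] += 1
    return weights


ties = cooccurrence_ties(sentences, characters)
# Baoyu-Daiyu: 2, Baochai-Baoyu: 2, Baochai-Daiyu: 1; Xifeng stays an isolate.
```

Every choice here (sentence vs. chapter as the co-occurrence window, how names are matched, whether one co-occurrence counts as a tie) is part of the operationalization that needs justification.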

No matter how relational data are gathered or derived, I want to emphasize the importance of soundly justifying how they are collected or created. For example, I reviewed a manuscript that analyzed “co-location networks” based on students’ simultaneous access to Wi-Fi hotspots on a university campus. As a critical reviewer, I would pay special attention to any strong claims about “social” connections among students, because accessing the same Wi-Fi hotspot does not imply any social interaction. However, if a study looks at pairs of students who simultaneously access 10 hotspots on campus every day, it becomes a totally different story, as such intense co-location could be an indicator of (potential) social ties. Therefore, in SNA studies we need to constantly reflect on the contextual definition(s) of ties and on how those definitions are operationalized in data collection.
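The difference between a loose and a strict operationalization of co-location can be sketched as follows. The log records, the definition of "simultaneous" (same hotspot, same day, same hour), and the helpers `shared_hotspots` and `colocation_ties` are all hypothetical. The threshold of 10 hotspots from the example above is scaled down to 2 to fit the toy data.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical Wi-Fi log: each record is (student, hotspot, day, hour).
# Assumption: "simultaneous" access means same hotspot, same day, same hour.
log = [
    ("s1", "library", 1, 9), ("s2", "library", 1, 9),
    ("s1", "cafe", 1, 12), ("s2", "cafe", 1, 12),
    ("s1", "gym", 2, 17), ("s2", "gym", 2, 17),
    ("s1", "lab", 2, 10), ("s2", "lab", 2, 10),
    ("s3", "library", 1, 9), ("s3", "dorm", 2, 22),
]


def shared_hotspots(records):
    """Map (student_a, student_b, day) -> set of hotspots both accessed simultaneously."""
    slots = defaultdict(set)
    for student, hotspot, day, hour in records:
        slots[(hotspot, day, hour)].add(student)
    shared = defaultdict(set)
    for (hotspot, day, _hour), present in slots.items():
        for a, b in combinations(sorted(present), 2):
            shared[(a, b, day)].add(hotspot)
    return shared


def colocation_ties(shared, days, min_hotspots):
    """Keep a pair only if it co-located at >= min_hotspots distinct hotspots on every day."""
    pairs = {(a, b) for (a, b, _day) in shared}
    return {
        pair
        for pair in pairs
        if all(len(shared.get((*pair, day), set())) >= min_hotspots for day in days)
    }


shared = shared_hotspots(log)
days = sorted({day for _student, _hotspot, day, _hour in log})
loose = {(a, b) for (a, b, _day) in shared}             # any simultaneous access at all
strict = colocation_ties(shared, days, min_hotspots=2)  # 10 in the text; 2 for toy data
```

With the loose definition all three students are linked; with the stricter one only s1 and s2 remain connected. The same log thus supports quite different networks depending on how a tie is defined.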

4.1.2 Totality and sampling

In some cases, we are able to collect data on a whole social network. For example, in the year-long NASA simulation of a Mars mission in an isolated dome, researchers would have a good chance of studying the social network of all six scientists in its totality.4 In other cases, when it is impossible to study a whole network (e.g., terrorist networks), researchers need to apply specific sampling techniques.

As you will read this week, sampling in SNA research differs from the sampling commonly discussed in an introductory research methods course, because SNA research is concerned with both nodes and ties. Simply put, a representative sample of nodes does not automatically yield a meaningful sample of ties, so SNA researchers need to be especially aware of how sampling affects relational data. For example, a study may systematically sample every 5th student in a school based on student IDs (sampling applied to nodes) and also ask each sampled student to name up to 3 friends in the school (sampling applied to ties). Ego networks imply another interesting sampling mechanism: they focus on a focal ego, the nodes to which the ego is directly connected, and the ties among these nodes. This can be intuitively understood as sampling based on distance from the ego.
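The three sampling mechanisms described above can be sketched on a toy network. The network (a ring of 20 students plus one extra tie), the nominations, and the helper functions `induced_ties`, `fixed_choice`, and `ego_network` are hypothetical illustrations, not a prescribed procedure.

```python
from itertools import combinations

# Hypothetical undirected friendship network among 20 students (IDs 1-20):
# a ring where each student is friends with the next ID, plus one extra tie.
students = list(range(1, 21))
ties = {frozenset((i, i % 20 + 1)) for i in students}  # 1-2, 2-3, ..., 20-1
ties.add(frozenset((2, 20)))


def induced_ties(tie_set, nodes):
    """Ties whose two endpoints are both in the given node set."""
    keep = set(nodes)
    return {t for t in tie_set if t <= keep}


# (1) Node sampling: systematic sample of every 5th student by ID.
sample = students[4::5]                    # [5, 10, 15, 20]
sampled_ties = induced_ties(ties, sample)  # ties observable among sampled students


# (2) Tie sampling: a fixed-choice question recording at most `limit` friends.
def fixed_choice(nominations, limit=3):
    return {ego: names[:limit] for ego, names in nominations.items()}


survey = fixed_choice({7: [6, 8, 3, 12, 19]})  # hypothetical nominations by student 7


# (3) Ego network: the ego, every node at distance 1, and all ties among them.
def ego_network(tie_set, ego):
    alters = {n for t in tie_set if ego in t for n in t if n != ego}
    members = alters | {ego}
    return members, induced_ties(tie_set, members)


members, ego_ties = ego_network(ties, 1)  # members {1, 2, 20}; 3 ties
```

In this toy network, the systematic sample of 4 students recovers none of the 21 ties, and the fixed-choice question silently drops two of student 7's five nominations.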

How we sample nodes and/or ties depends on how we specify network boundaries. Defining network boundaries is critical for any SNA research and requires systematic consideration of research questions, theoretical perspectives, data availability, and so on. To quote Scott (2012):

… the determination of network boundaries is not simply a matter of identifying the apparently natural or obvious boundaries of the situation under investigation. Although ‘natural’ boundaries may, indeed, exist, the determination of boundaries in a research project is the outcome of a theoretically informed decision about what is significant in the situation under investigation… Researchers are involved in a process of conceptual elaboration and model building, not a simple process of collecting pre-formed data (pp. 44-45)

This important point echoes my Week 1 video on the importance of learning to make decisions in a research methodology class like this one. Defining the boundaries of networks is certainly the most central decision in an SNA project. When we start to inspect these decisions, it may look like we are opening “a can of worms,” as many of them may seem artificial or, at best, slippery. In this class, I hope we can instead see decision points as “a bag of diamonds,” each worth examining from different angles.5


Zhao, Yunpeng, and Charles Weko. 2016. “Network Inference from Grouped Data.” September 11.

Scott, John. 2012. Social Network Analysis. SAGE. https://doi.org/10.5040/9781849668187.

  1. Note that the description of models introduced here may not fit the philosophical worldview you feel comfortable with or subscribe to. Refer back to Section @ref{threelevels} for an earlier discussion we had about aligning methodology and philosophical viewpoints.

  2. The sna package provides a function named ego.extract for the same purpose.