2 Background

2.1 An ink blot

In the summer of 2009, mobile phones in Rwanda were buzzing. Apart from regular calls, around 1,000 Rwandans received a call from Joshua Blumenstock and his team. They were researching wealth and poverty by surveying a random sample from a database of 1.5 million customers of Rwanda’s leading mobile phone provider. The survey involved questions about the participants’ demographic, social, and economic characteristics.

However, what set this study apart was the next step. Along with the survey data, the researchers had complete call records for all 1.5 million people. They combined the survey data with the call records to train a machine learning model that predicted a person’s wealth based on their call records. Using this model, they estimated the wealth of all 1.5 million customers. Furthermore, they estimated the residence locations of these customers using the geographic data from the call records. This allowed them to create detailed maps showing the distribution of wealth in Rwanda, down to Rwanda’s smallest administrative units, the 2,148 cells.
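The steps in this paragraph (survey a random sample, fit a model on it, predict for everyone, then aggregate by region) can be illustrated with a minimal sketch. It is not the researchers' actual model or data. The customers are synthetic, the single feature `calls_per_day` is hypothetical, and a plain least-squares line stands in for the machine learning model. The helper names (`fit_line`, `make_customers`, `estimate_wealth_map`) are made up for this illustration.

```python
import random
from collections import defaultdict
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x (a stand-in model)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def make_customers(n, n_regions, rng):
    """Build synthetic customers; none of these numbers come from the study."""
    customers = []
    for _ in range(n):
        region = rng.randrange(n_regions)
        # Hypothetical call-record feature that tends to be higher in some regions.
        calls_per_day = rng.uniform(0, 10) + region
        # "True" wealth is hidden in practice; only the survey reveals it.
        wealth = 2.0 + 0.5 * calls_per_day + rng.gauss(0, 1)
        customers.append(
            {"region": region, "calls_per_day": calls_per_day, "wealth": wealth}
        )
    return customers


def estimate_wealth_map(customers, survey_size, rng):
    """Survey a random sample, fit a model, predict for everyone, average by region."""
    surveyed = rng.sample(customers, survey_size)
    intercept, slope = fit_line(
        [c["calls_per_day"] for c in surveyed],
        [c["wealth"] for c in surveyed],
    )
    predictions = defaultdict(list)
    for c in customers:
        predictions[c["region"]].append(intercept + slope * c["calls_per_day"])
    return {region: mean(values) for region, values in sorted(predictions.items())}


rng = random.Random(0)  # fixed seed so the sketch is repeatable
customers = make_customers(5000, 5, rng)
wealth_map = estimate_wealth_map(customers, 200, rng)
for region, estimate in wealth_map.items():
    print(f"region {region}: estimated mean wealth {estimate:.2f}")
```

The design point the sketch tries to capture is that wealth is observed only for the surveyed sample, while the call-record feature is available for every customer, so the fitted model can fill in estimates for the people who were never surveyed.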

While it was challenging to validate the accuracy of these estimates at such a granular level, when aggregated to Rwanda’s 30 districts, the results closely matched the estimates from the Demographic and Health Survey, a benchmark survey in developing nations. Notably, Blumenstock’s method was approximately 10 times faster and 50 times cheaper than traditional surveys.

This study’s implications vary based on one’s perspective:

  • Social scientists see a novel tool for testing economic development theories.
  • Data scientists view it as an intriguing machine learning challenge.
  • Business professionals recognize a method to derive value from existing big data.
  • Privacy advocates see a reminder that we live in an era of mass surveillance.
  • Policy makers envision how technology can foster a better world.

In essence, this study serves as a glimpse into the future of social research.

2.2 Digital Age

The digital age is transforming the landscape of social research. This era is characterized by the shift from analog to digital, leading to the creation of vast amounts of digital data. This transition hasn’t occurred overnight and is still ongoing, but its impact is undeniable. Everyday activities that were once analog, such as taking photographs on film, reading printed newspapers, or paying with cash, increasingly have digital counterparts. This shift means more data about individuals is being captured and stored digitally.

The volume of digital information, often referred to as “big data,” is growing rapidly. Alongside this, there’s a surge in computing power. These trends are expected to persist, with information storage becoming predominantly digital.

For social research, a key feature of the digital age is the ubiquity of computers. What started as room-sized machines limited to governments and large corporations has evolved into personal computers, laptops, smartphones, and now the “Internet of Things” (IoT), in which computers are embedded in everyday objects like cars, watches, and thermostats. These devices don’t just compute; they sense, store, and transmit information.

The implications for researchers are profound. Online environments, for instance, allow for precise data collection and the ability to run randomized controlled experiments. If you’ve shopped online, your behavior has likely been tracked, and you’ve possibly been part of an experiment. This level of measurement and experimentation is expanding beyond the online realm. Physical stores are beginning to monitor customer behavior in detail and integrate experimentation into their operations. The IoT ensures that more real-world behaviors are captured digitally, making the entire world a potential research field.

The digital age also introduces innovative communication methods, enabling researchers to conduct unique surveys and collaborate on a massive scale. While some might argue that these capabilities aren’t entirely new, the sheer scale and integration of these technologies make the current era distinct. Consider the difference between a single photograph of a horse and a movie that captures 24 images per second. A movie is, in one sense, just a series of photographs, yet its impact and utility are vastly different from those of any single image.

In essence, the digital age requires a fusion of traditional social research methods with modern data science techniques. For instance, the research by Joshua Blumenstock combined conventional survey research with data science. To harness the full potential of the digital age, researchers must integrate insights from both social science and data science.

2.3 Research Design

Research design serves as the bridge between questions and answers in social research. The digital age has ushered in new opportunities for research, and this book aims to cater to two distinct audiences:

  • Social Scientists: Those who have experience in studying social behavior but might be less acquainted with the digital age’s potential.
  • Data Scientists: Individuals, often from fields like computer science, statistics, and engineering, who are adept with digital tools but might be newer to studying social behavior.

The goal is to merge the expertise of both groups to produce richer and more insightful research. Instead of focusing solely on abstract social theories or advanced machine learning techniques, the emphasis is on research design. Research design is the core that links questions to answers, ensuring the production of compelling research.

The book highlights four primary approaches to social research:

  • Observing behavior
  • Asking questions
  • Running experiments
  • Collaborating with others

While these methods might be familiar, the digital age offers novel ways to collect and analyze data. This requires a modernization of these classic approaches, adapting them to the opportunities and challenges of the digital era.

2.4 Themes

This work revolves around two central themes:

Mixing Readymades and Custommades: This theme draws an analogy between the art styles of Marcel Duchamp, known for repurposing ordinary objects as art (readymades), and Michelangelo, who meticulously crafted his masterpieces like the statue of David (custommades). In the context of social research in the digital age, some researchers cleverly repurpose existing big data (readymades) while others generate specific data to answer their questions (custommades). The book emphasizes the value of blending these two approaches. For instance, Joshua Blumenstock and his team combined repurposed call records (readymade) with survey data they collected themselves (custommade). This fusion often leads to the most groundbreaking research.

Ethics: With the digital age granting researchers vast capabilities, ethical considerations become paramount. The digital age allows researchers to observe and experiment with millions of people, often without their knowledge or consent. This increased power comes with the responsibility to ensure that research does not harm participants. For instance, even “anonymized” call records can potentially be used to identify individuals and infer sensitive information about them. The book stresses the importance of responsibly balancing the risks and opportunities presented by digital-age research.