8.5 Lab: Classifying tweets & accounts

In this lab we’ll use Twitter data to illustrate the use of different Google APIs:

  • Use a tweet with an image as an example (Alice Weidel)
    • Translate
    • Sentiment
    • We can classify different things: the text, the image, etc.

8.5.1 Software

  • googleLanguageR: Interact with the Google Natural Language API
    • See the vignette for an overview
    • Google Natural Language API
      • Entity analysis (i.e., finds named entities, types, salience, mentions + properties, metadata)
      • Syntax (i.e., syntax analysis, e.g., identify nouns)
      • Sentiment (i.e., provides sentiment scores)
      • Content Classification (i.e., content classification into categories)
    • Google Cloud Speech-to-Text API
    • Google Cloud Text-to-Speech API
    • Google Cloud Translation API
  • googleCloudVisionR: Interact with the Google Vision API

8.5.3 Twitter: Authenticate & load data

In order to get access to the Twitter API you need to create an app and generate the corresponding API keys on the Twitter developer platform. See the slides and the lab on Twitter, starting with X-Twitter’s APIs (we didn’t go through them). Here we’ll merely download a few tweets to explore the Google ML APIs.

We’ll work with tweets by Alice Weidel (AfD) and tweets by Martin Schulz (SPD). The tweets themselves are text data that we can analyze using the Google Natural Language API.

If you can’t authenticate with Twitter, download the file data_tweets.RData from the material folder and load it into R with the command below:
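The loading command is not reproduced on this page. A minimal sketch, assuming the file has been copied into the current working directory (the names of the objects stored inside the file are not stated here, so the sketch simply prints them):

```r
# Assumption: data_tweets.RData was copied into the working directory.
path_tweets <- "data_tweets.RData"

if (file.exists(path_tweets)) {
  # load() restores the saved objects and returns their names
  loaded_objects <- load(path_tweets)
  print(loaded_objects)
} else {
  message("File not found, check the path: ", path_tweets)
}
```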

8.5.4 Google: Authenticate

The following Markdown file can be used if you followed all the steps described in the Google Docs document.

Fill in the quotation marks with the path to the directory where the created JSON file is located and read in the JSON file (gl_auth).
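A minimal sketch of this step, assuming a service account key file has been created; the path below is a hypothetical placeholder and has to be replaced with your own:

```r
# Hypothetical path: replace with the location of your own JSON key file.
json_key <- "path/to/your-key-file.json"

if (requireNamespace("googleLanguageR", quietly = TRUE) && file.exists(json_key)) {
  # gl_auth() reads the key file and authenticates the current R session
  googleLanguageR::gl_auth(json_key)
} else {
  message("Package or key file missing; skipping authentication.")
}
```

The guard only prevents an error when the package or the key file is missing; in a working setup the single gl_auth() call is all that is needed.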

8.5.5 Translation API

We can use the Cloud Translation API to translate the tweets from German to English. Other languages can of course be chosen as well: check the language codes under the following link and set the corresponding code (e.g., “en” for English) in the target argument: https://developers.google.com/admin-sdk/directory/v1/languages.
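A minimal sketch of the translation step using gl_translate() from googleLanguageR. The wrapper name translate_tweets and the column names in the usage comment are assumptions for illustration; the call itself needs prior authentication and is billed by Google, so it is not executed here:

```r
# Translate a character vector into a target language (default: English).
# Requires prior authentication via gl_auth().
translate_tweets <- function(text, target = "en") {
  res <- googleLanguageR::gl_translate(text, target = target)
  # gl_translate() returns a table; the translations are in `translatedText`
  res$translatedText
}

# Usage (not run, needs credentials); `data_tweets$text` is an assumed column:
# data_tweets$text_en <- translate_tweets(data_tweets$text, target = "en")
```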

8.5.7 NLP API: Syntax

The Google NLP API also allows for analyzing syntax. This is helpful because we may sometimes want to isolate certain parts of a sentence. Below we extract the nouns and subsequently plot them in a wordcloud. The function returns various other pieces of information as well.
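A minimal sketch of the noun extraction. The wrapper get_tokens is a hypothetical helper around gl_nlp() and is not run here because it needs credentials; the toy token table is invented and only mimics the assumed `content` and `tag` columns of the returned tokens:

```r
# Hypothetical wrapper around the API call (needs gl_auth() first).
get_tokens <- function(text, language = "de") {
  res <- googleLanguageR::gl_nlp(text, nlp_type = "analyzeSyntax",
                                 language = language)
  res$tokens[[1]]  # assumption: one token table per input string
}

# Invented toy tokens standing in for the API result
tokens <- data.frame(
  content = c("Die", "Regierung", "plant", "neue", "Steuern", "Regierung"),
  tag     = c("DET", "NOUN", "VERB", "ADJ", "NOUN", "NOUN"),
  stringsAsFactors = FALSE
)

# Keep only the nouns and count how often each occurs
nouns     <- tokens$content[tokens$tag == "NOUN"]
noun_freq <- sort(table(nouns), decreasing = TRUE)
print(noun_freq)

# Plot the counts as a wordcloud if the wordcloud package is installed
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(words = names(noun_freq),
                       freq = as.numeric(noun_freq), min.freq = 1)
}
```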

8.5.8 Analyzing images

In addition, tweets may contain images that we can analyze using the Google Vision API. A tweet may or may not come with an image (and the corresponding link).

How many out of 1000 tweets (of the two politicians) contain images?

Who uses more images?
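Both questions can be answered by checking whether the image link is missing. A minimal sketch on invented toy data; the column names `screen_name` and `media_url` and the account labels are assumptions, so adapt them to the actual tweet dataset:

```r
# Invented toy data: one row per tweet, NA means the tweet has no image
tweets <- data.frame(
  screen_name = c("weidel", "weidel", "schulz", "schulz", "schulz"),
  media_url   = c("http://example.org/a.jpg", NA, NA,
                  "http://example.org/b.jpg", "http://example.org/c.jpg"),
  stringsAsFactors = FALSE
)

has_image <- !is.na(tweets$media_url)

n_with_image <- sum(has_image)                             # tweets with an image
n_by_account <- tapply(has_image, tweets$screen_name, sum)   # count per account
share_by_account <- tapply(has_image, tweets$screen_name, mean)  # share per account

print(n_with_image)
print(n_by_account)
print(share_by_account)
```

Comparing shares rather than counts is the safer choice when the two accounts contribute different numbers of tweets.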

Below we load those images and store them in a directory.
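A minimal sketch of the download step using base R. The helper names and the target directory `images` are assumptions; the download function is only defined here and has to be called with the actual image URLs:

```r
# Build local file names from the last part of each URL
image_dest_paths <- function(urls, dir = "images") {
  file.path(dir, basename(urls))
}

# Download every image into `dir`; failed downloads are skipped, not fatal
download_images <- function(urls, dir = "images") {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  dest <- image_dest_paths(urls, dir)
  for (i in seq_along(urls)) {
    # mode = "wb" keeps binary files intact on Windows
    try(download.file(urls[i], destfile = dest[i], mode = "wb", quiet = TRUE))
  }
  invisible(dest)
}

# Usage (not run, needs network access); `media_url` is an assumed column:
# download_images(na.omit(data_tweets$media_url))
```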

Next we try to recognize text in the images (we do so directly from the image URLs, without downloading the images):
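A minimal sketch of the text recognition call using gcv_get_image_annotations() from googleCloudVisionR. The wrapper name detect_image_text is hypothetical, and the function is only defined here because the call needs credentials; see the package documentation for how the key file is supplied to googleCloudVisionR:

```r
# Hypothetical wrapper: run text detection on a vector of image URLs.
# Not executed here because it needs Google Cloud credentials.
detect_image_text <- function(urls) {
  googleCloudVisionR::gcv_get_image_annotations(
    imagePaths = urls,             # URLs or local file paths
    feature    = "TEXT_DETECTION"
  )
}

# Usage (not run); `media_url` is an assumed column:
# detected <- detect_image_text(na.omit(data_tweets$media_url))
```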

Then we merge the list of words for every image into sentences.
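A minimal sketch of this step in base R, assuming the detection result has one row per recognized word with the columns `image_path` and `description` (the toy rows below are invented):

```r
# Invented toy result: one row per word recognized in an image
detected <- data.frame(
  image_path  = c("img1.jpg", "img1.jpg", "img1.jpg", "img2.jpg", "img2.jpg"),
  description = c("Heute", "Abend", "live", "Neue", "Termine"),
  stringsAsFactors = FALSE
)

# Paste the words of each image together, separated by spaces
text_per_image <- aggregate(description ~ image_path, data = detected,
                            FUN = paste, collapse = " ")
names(text_per_image)[2] <- "image_text"
print(text_per_image)
```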

And we add it to the original tweet dataset:
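A minimal sketch of the join using merge(), on invented toy data with assumed column names. Setting `all.x = TRUE` keeps tweets without an image; their image text is simply missing:

```r
# Invented toy data with assumed column names
tweets <- data.frame(
  status_id = c("1", "2", "3"),
  media_url = c("img1.jpg", NA, "img2.jpg"),
  stringsAsFactors = FALSE
)
text_per_image <- data.frame(
  image_path = c("img1.jpg", "img2.jpg"),
  image_text = c("Heute Abend live", "Neue Termine"),
  stringsAsFactors = FALSE
)

# Left join: every tweet is kept, image text is NA where there is no image
tweets_text <- merge(tweets, text_per_image,
                     by.x = "media_url", by.y = "image_path", all.x = TRUE)
print(tweets_text)
```

Note that merge() sorts the result by the key column, so the row order can differ from the original tweet data.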

We also try to recognize objects in those images:
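A minimal sketch of the call, here using the label detection feature of gcv_get_image_annotations(), which returns general labels describing an image. The wrapper name detect_image_labels and the default of five labels per image are assumptions, and the function is only defined here because the call needs credentials:

```r
# Hypothetical wrapper: request up to `max_results` labels per image.
# Not executed here because it needs Google Cloud credentials.
detect_image_labels <- function(urls, max_results = 5) {
  googleCloudVisionR::gcv_get_image_annotations(
    imagePaths    = urls,
    feature       = "LABEL_DETECTION",
    maxNumResults = max_results
  )
}

# Usage (not run); `media_url` is an assumed column:
# labels <- detect_image_labels(na.omit(data_tweets$media_url))
```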

Then we turn the list of objects into a string variable:

Then we join the scraped data with the original tweet data:
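A minimal sketch of the last two steps on invented toy data with assumed column names. Here the join is done with match() instead of merge(), which keeps the tweets in their original order:

```r
# Invented toy data: one row per detected object, one row per tweet
labels <- data.frame(
  image_path  = c("img1.jpg", "img1.jpg", "img2.jpg"),
  description = c("Person", "Suit", "Crowd"),
  stringsAsFactors = FALSE
)
tweets <- data.frame(
  status_id = c("1", "2", "3"),
  media_url = c("img1.jpg", NA, "img2.jpg"),
  stringsAsFactors = FALSE
)

# One comma-separated string of objects per image
objects_per_image <- tapply(labels$description, labels$image_path, toString)

# Look up each tweet's image; tweets without an image get NA
idx <- match(tweets$media_url, names(objects_per_image))
tweets$image_objects <- as.vector(objects_per_image[idx])
print(tweets)
```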
