Big data and Social Science

11.5 Exercise 9: SQL database

Use a new R or Rmarkdown script and call it ‘working-with-sql-databases.R’ (or .Rmd). Store all the code that you use now in this file.
Start to code and set up a SQL database in your working directory called ‘tweets-sentiment-db.sqlite’.
Download the file ‘tweets-sentiment140.csv’ from the ‘www’ course folder.
Read about this dataset here: https://www.kaggle.com/kazanova/sentiment140
Load the dataset from this file and add it to your database under the name “table_tweets”.
- Tip: data <- readr::read_csv("tweets-sentiment140.csv") (adapt path in case you have another working directory!)
Check whether the dataset has really become part of your database: List the tables in your database and list the fields of the corresponding table.
Disconnect from your database and reconnect to the database again.
Now run different queries.
1. How many rows are there in the table “table_tweets”?
2. How many rows are there if you group by the first column/the sentiment score (= target column)?
3. Pull out a random sample of 100 rows and calculate the mean sentiment score. Compare that to the mean sentiment score of a random sample of 5000 rows.
4. Aggregate/group by username, calculate the average sentiment per username and order the table as to see which are the 20 users with the worst average score across their tweets.