Exercise 9: SQL database
- Use a new R or Rmarkdown script and call it ‘working-with-sql-databases.R’ (or .Rmd). Store all the code that you use now in this file.
- Start to code and set up a SQL database in your working directory called ‘tweets-sentiment-db.sqlite’.
- Download the file ‘tweets-sentiment140.csv’ from the ‘www’ course folder.
- Read about this dataset here: https://www.kaggle.com/kazanova/sentiment140
- Load the dataset from this file and add it to your database under the name “table_tweets”.
data <- readr::read_csv("tweets-sentiment140.csv") (adapt path in case you have another working directory!)
- Check whether the dataset has really become part of your database: List the tables in your database and list the fields of the corresponding table.
- Disconnect from your database and reconnect to the database again.
- Now run different queries.
- How many rows are there in the table “table_tweets”?
- How many rows are there if you group by the first column/the sentiment score (= target column)?
- Pull out a random sample of 100 rows and calculate the mean sentiment score. Compare that to the mean sentiment score of a random sample of 5000 rows.
- Aggregate/group by username, calculate the average sentiment per username and order the table as to see which are the 20 users with the worst average score across their tweets.