6.23 Lab 13: Sentiment analysis

First we’ll need some data that contains text of any sort. We’ll work with the large set of tweets that you know from before. You can find a description here.

## [1] "table_tweets"
## [1] "target" "ids"    "date"   "flag"   "user"   "text"

Next we load the packages needed for the analyses below.
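The exact package list is not shown here, so the following is only a plausible setup, assuming the tidyverse-style workflow from Text Mining with R. Which packages you really need depends on which steps you run.

```r
# Assumed package set for this lab (install.packages() any that are missing)
library(dplyr)      # data manipulation: count(), joins, mutate()
library(tidyr)      # reshaping: pivot_wider()
library(ggplot2)    # bar plots and histograms
library(tidytext)   # unnest_tokens(), stop_words, get_sentiments()
library(wordcloud)  # wordcloud(), comparison.cloud()
```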

We then convert the text into the tidy text format (one token per row), as described in the book Text Mining with R by Julia Silge and David Robinson.
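A minimal sketch of this conversion, assuming `tidytext::unnest_tokens()` is used as in the book. The three tweets are made up for illustration, and the `line` column and the object name `tidy_tweets` are our own choices, not names taken from the lab.

```r
library(dplyr)
library(tidytext)

# Toy stand-in for table_tweets (made-up rows; only `user` and `text` are used)
table_tweets <- tibble(
  user = c("anna", "ben", "anna"),
  text = c("I love this sunny day!",
           "So sad, I miss my home",
           "Good morning Twitter")
)

# Number the tweets, then split each text into one lowercase word per row
tidy_tweets <- table_tweets %>%
  mutate(line = row_number()) %>%
  unnest_tokens(output = word, input = text)

tidy_tweets
```

By default `unnest_tokens()` lowercases the words, strips punctuation, and drops the original `text` column.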



Below we produce a table that contains the most frequent words in the tweets we analyze (with stopwords).

Q: What are stopwords?

Table 6.1: Most frequent words in tweets without stopwords
word n
day 532
quot 472
http 451
love 411
lol 367
im 364
time 363
amp 271
night 265
2 259
home 251
haha 220
morning 220
miss 215
tomorrow 214
twitter 214
hope 205
3 204
sad 182
feel 181
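The difference between counting with and without stopwords can be sketched as follows. We assume the `stop_words` table that ships with tidytext is used as the stopword list; the toy tweets are made up.

```r
library(dplyr)
library(tidytext)

# Made-up tweets, already converted to one word per row
tidy_tweets <- tibble(
  text = c("I love the night",
           "The day is over",
           "I miss the night and the day")
) %>%
  unnest_tokens(word, text)

# With stopwords: function words such as "the" dominate the ranking
all_counts <- tidy_tweets %>%
  count(word, sort = TRUE)

# Without stopwords: drop every word that appears in tidytext's stop_words
word_counts <- tidy_tweets %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE)

all_counts
word_counts
```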



We can also visualize these counts in a bar plot of the most frequent words.

Figure 6.1: Most frequent words in tweets w/o stopwords
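One way such a bar plot could be drawn with ggplot2 is sketched below. The five counts are copied from the top of Table 6.1; the plotting choices (horizontal bars, axis label) are assumptions rather than the settings behind Figure 6.1.

```r
library(dplyr)
library(ggplot2)

# Top five rows of Table 6.1
word_counts <- tibble(
  word = c("day", "quot", "http", "love", "lol"),
  n    = c(532, 472, 451, 411, 367)
)

p <- word_counts %>%
  mutate(word = reorder(word, n)) %>%   # order bars by frequency
  ggplot(aes(x = word, y = n)) +
  geom_col() +
  coord_flip() +                        # horizontal bars are easier to read
  labs(x = NULL, y = "Frequency")

print(p)
```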



Below we produce a table that contains the most frequent words in the tweets we analyze without stopwords.



A word cloud is another way to visualize the most frequent words in documents, e.g. tweets.

Figure 6.2: Wordcloud (w/o stopwords)
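A word cloud of this kind could be drawn with the wordcloud package, as in the sketch below. The counts are again the top rows of Table 6.1; `max.words` and the seed are arbitrary choices for illustration.

```r
library(wordcloud)

# Top five rows of Table 6.1
word_counts <- data.frame(
  word = c("day", "quot", "http", "love", "lol"),
  n    = c(532, 472, 451, 411, 367)
)

set.seed(1)  # the layout is random, so fix the seed for reproducibility
wordcloud(
  words        = word_counts$word,
  freq         = word_counts$n,
  min.freq     = 1,
  max.words    = 100,
  random.order = FALSE   # place the most frequent words in the centre
)
```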



Next, let’s try to assess the sentiment of those tweets. Wikipedia describes sentiment analysis as follows:

"Opinion mining (sometimes known as sentiment analysis or emotion AI) refers to the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to voice of the customer materials such as reviews and survey responses, online and social media, and healthcare materials for applications that range from marketing to customer service to clinical medicine.

Generally speaking, sentiment analysis aims to determine the attitude of a speaker, writer, or other subject with respect to some topic or the overall contextual polarity or emotional reaction to a document, interaction, or event. The attitude may be a judgment or evaluation (see appraisal theory), affective state (that is to say, the emotional state of the author or speaker), or the intended emotional communication (that is to say, the emotional effect intended by the author or interlocutor)."

Dictionary-based methods like the one we use below find the total sentiment of a document (e.g. a tweet) by adding up the individual sentiment scores for each word in the text.
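To make the idea concrete, here is a small sketch with a hypothetical six-word dictionary (`toy_dictionary`, scores of +1 and -1 chosen by us) and three made-up tweets. Real lexicons are much larger, but the summing step is the same.

```r
library(dplyr)
library(tidytext)

# Hypothetical mini-dictionary: +1 for positive words, -1 for negative words
toy_dictionary <- tibble(
  word  = c("love", "good", "fun", "sad", "miss", "bad"),
  score = c(1, 1, 1, -1, -1, -1)
)

tweets <- tibble(
  id   = 1:3,
  text = c("Love this, so much fun",
           "Sad day, I miss you",
           "Good night but bad weather")
)

# Tokenize, keep only dictionary words, and sum the scores per tweet
tweet_scores <- tweets %>%
  unnest_tokens(word, text) %>%
  inner_join(toy_dictionary, by = "word") %>%
  group_by(id) %>%
  summarise(sentiment = sum(score))

tweet_scores
```

Note that a tweet without any dictionary word drops out of the `inner_join()`, and that a tweet mixing one positive and one negative word nets out to zero.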

Below we produce a table of the frequency of words reflecting joy that appear in the tweets. Beforehand we have to define which words reflect joy; several sentiment dictionaries (lexicons) provide such word lists.

Table 6.2: Frequency of happy words
word n
good 532
love 411
hope 205
fun 164
happy 164
feeling 100
finally 71
pretty 69
excited 68
birthday 65
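A table like this can be produced by joining the tokenized tweets with a list of joy words. In Text Mining with R the list comes from the NRC lexicon via `get_sentiments("nrc")` filtered to `sentiment == "joy"`, which requires a one-time download through the textdata package. To stay self-contained, the sketch below uses a short hand-written stand-in (`joy_words`) and made-up tweets instead.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for a joy dictionary (e.g. the "joy" rows of NRC)
joy_words <- tibble(word = c("good", "love", "hope", "fun", "happy"))

# Made-up tweets, tokenized to one word per row
tidy_tweets <- tibble(
  text = c("Hope you have fun",
           "I love love love this",
           "Good morning, happy Friday")
) %>%
  unnest_tokens(word, text)

# Keep only joy words and count how often each one occurs
joy_counts <- tidy_tweets %>%
  inner_join(joy_words, by = "word") %>%
  count(word, sort = TRUE)

joy_counts
```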



Now we’ll add sentiment scores to the original table that contains our tweets.

Table 6.3: Sentiment of individual statements/lines
line index negative positive sentiment
2 0 0 1 1
3 0 1 0 -1
4 0 0 2 2
6 0 0 1 1
7 0 1 0 -1
9 0 0 1 1
11 0 1 1 0
12 0 0 2 2
14 0 0 1 1
15 0 0 1 1
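The `negative`, `positive`, and `sentiment` columns above are consistent with counting lexicon matches per tweet and taking `positive - negative`. A sketch of that computation follows, assuming the Bing lexicon bundled with tidytext (`get_sentiments("bing")`); the tweets are made up and the `index` column is omitted.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Made-up tweets with a running line number
table_tweets <- tibble(
  text = c("Such a good day, love it",
           "I miss you, so sad",
           "Great fun but a bad night")
) %>%
  mutate(line = row_number())

tweet_sentiment <- table_tweets %>%
  unnest_tokens(word, text) %>%
  inner_join(get_sentiments("bing"), by = "word") %>%  # label matching words
  count(line, sentiment) %>%                           # matches per tweet
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(sentiment = positive - negative)              # net score per tweet

tweet_sentiment
```

To attach the scores to the original table, a `left_join(table_tweets, tweet_sentiment, by = "line")` would keep tweets without any lexicon match as `NA`.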



Then we can produce a table with the most frequent positive and negative words.

Table 6.4: Most frequent positive or negative words
word sentiment n
good positive 532
like positive 454
love positive 411
work positive 374
well positive 300
miss negative 215
great positive 211
sad negative 182
bad negative 170
fun positive 164
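Counting words together with their polarity could look like the sketch below. It assumes the Bing lexicon from `get_sentiments("bing")`; the toy tweets are made up.

```r
library(dplyr)
library(tidytext)

# Made-up tweets, tokenized to one word per row
tidy_tweets <- tibble(
  text = c("good good love", "sad and bad", "good fun, sad")
) %>%
  unnest_tokens(word, text)

# Label each word as positive or negative, then count word-sentiment pairs
bing_word_counts <- tidy_tweets %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(word, sentiment, sort = TRUE)

bing_word_counts
```

Stopwords are not removed in this sketch; words that are missing from the lexicon simply drop out of the join.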



We can show the same counts in a bar plot.

Figure 6.3: Barplot: Most common negative and positive words
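A faceted bar plot of positive and negative words could be built as sketched here. The counts are taken from Table 6.4; the layout (one panel per sentiment, horizontal bars) follows the style used in Text Mining with R and is an assumption about how Figure 6.3 was made.

```r
library(dplyr)
library(ggplot2)

# Six rows taken from Table 6.4
bing_word_counts <- tibble(
  word      = c("good", "like", "love", "miss", "sad", "bad"),
  sentiment = c("positive", "positive", "positive",
                "negative", "negative", "negative"),
  n         = c(532, 454, 411, 215, 182, 170)
)

p <- bing_word_counts %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(x = word, y = n, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~sentiment, scales = "free_y") +  # separate word axis per panel
  coord_flip() +
  labs(x = NULL, y = "Frequency")

print(p)
```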



We can also draw a word cloud of the positive and negative words.

Figure 6.4: Wordcloud: Most common negative and positive words
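A word cloud that separates the two polarities can be drawn with `comparison.cloud()` from the wordcloud package, which expects a matrix with one row per word and one column per group. The sketch below builds that matrix in base R from counts in Table 6.4; the colours and the seed are arbitrary.

```r
library(wordcloud)

# Six rows taken from Table 6.4
bing_word_counts <- data.frame(
  word      = c("good", "like", "love", "miss", "sad", "bad"),
  sentiment = c("positive", "positive", "positive",
                "negative", "negative", "negative"),
  n         = c(532, 454, 411, 215, 182, 170)
)

# Word-by-sentiment matrix; combinations that do not occur become 0
term_matrix <- with(bing_word_counts, tapply(n, list(word, sentiment), sum))
term_matrix[is.na(term_matrix)] <- 0

set.seed(1)  # the layout is random
comparison.cloud(term_matrix,
                 colors    = c("firebrick", "steelblue"),  # negative, positive
                 max.words = 100)
```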



As a final plot, we draw a histogram of the sentiment scores across all tweets in our dataset.

[Histogram: Distribution of Sentiment Scores; x-axis ticks from −4 to 6, y-axis N with ticks from 0 to 2500]

Figure 6.5: Distribution of the sentiment scores (negative values indicate negative sentiment, positive values indicate positive sentiment)
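Such a histogram could be produced with ggplot2 as sketched below. For illustration we use only the ten scores listed in Table 6.3; the bin width of 1 is our choice because the scores are integers.

```r
library(ggplot2)

# The ten sentiment scores shown in Table 6.3
tweet_sentiment <- data.frame(
  sentiment = c(1, -1, 2, 1, -1, 1, 0, 2, 1, 1)
)

p <- ggplot(tweet_sentiment, aes(x = sentiment)) +
  geom_histogram(binwidth = 1) +   # one bar per integer score
  labs(x = "Sentiment score", y = "N")

print(p)
```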



Finally, let’s aggregate the data by user to obtain a ranking that lets us compare users with regard to their overall sentiment.
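A sketch of this aggregation, assuming each tweet already carries a `user` and a `sentiment` score. The users and scores are made up, and ranking by the mean score is only one possible choice (the sum or the median would also work).

```r
library(dplyr)

# Made-up per-tweet scores with the user who wrote each tweet
tweet_sentiment <- tibble(
  user      = c("anna", "ben", "anna", "cara", "ben"),
  sentiment = c(2, -1, 1, 0, -2)
)

# One row per user, ranked from most positive to most negative
user_ranking <- tweet_sentiment %>%
  group_by(user) %>%
  summarise(n_tweets = n(), mean_sentiment = mean(sentiment)) %>%
  arrange(desc(mean_sentiment))

user_ranking
```

Users with very few tweets will have noisy averages, so it may be sensible to filter on `n_tweets` before comparing them.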

  • Questions
    • To what kind of data could we apply sentiment analysis?
      • Actors? Groups? Text?
    • What phenomena could we study?
