Analyzing the Lyrics
Now, we must consider the situation. Just because we have the text as individual words, this doesn’t mean we should assign equal importance to every word. There are many words stop words as we will refer to them, that do not need to be included. These include words like “to, the, of, …” you get the idea. There are preset stop words in the tidytext
world, and we eliminate them by loading the stop_words
and eliminating them with the anti_join()
function as shown below. Further, we count and sort the lyrical content with count(word, sort = TRUE)
.
library(knitr)
data(stop_words)
tidy_ftp <- tidy_ftp %>%
anti_join(stop_words) %>%
count(word, sort = TRUE)
## Joining, by = "word"
kable(head(tidy_ftp))
word | n |
---|---|
fight | 18 |
power | 14 |
hear | 13 |
lemme | 13 |
powers | 5 |
people | 3 |
Visualizing the Lyrics
One question that we may ask is what are the top words used. From the previous table, we get a sense for the frequency of different words used. If we choose words that occur more than three times, this seems to be a good measure for top words here. We name the variable bars
as the result of a pipe where we filter
the words based on \(n > 3\) and reorder the column so that it is in decreasing order with the mutate
function. We then display the table as a kable
.
\(~\)
bars <- tidy_ftp %>%
filter(n > 3) %>%
mutate(word = reorder(word, n))
kable(head(bars))
word | n |
---|---|
fight | 18 |
power | 14 |
hear | 13 |
lemme | 13 |
powers | 5 |
\(~\)
Now we can plot bars
using ggplot
.
\(~\)
ggplot(bars, aes(word, n)) +
geom_col(stat = "identity", fill = "firebrick1",color = "firebrick1", alpha = 0.4) +
labs(title = "Top Words in Fight the Power", subtitle = "Public Enemy from Fear of A Black Planet", x = "Word", y = "Count") +
coord_flip() +
theme_minimal()
## Warning: Ignoring unknown parameters: stat