Chapter 4 Reading Datasets
Hello!
This tutorial is for those who want to learn how to read .csv files into R. The csv is probably the most common filetype you’ll use.
csv stands for “comma separated values”. It’s an uber-simple spreadsheet (Microsoft Word is to Notepad as Excel is to csv). A lot of the data you will use comes as a csv, or as a file you will want to transform into a csv.
To read a csv file into R, you will need (1) a csv file and (2) some knowledge about where your csv file and code are (they should be in the same folder). In this tutorial, we’ll work with some sample (“toy”) data: a sample of Facebook posts about “Barbenheimer” from July 2023, exported from CrowdTangle.
First, let’s figure out what folder we are in! As we discussed in an earlier tutorial, you can use the getwd() function to check what directory (computer folder) you are in and setwd() to change your directory.
## [1] "C:/Users/comm-lukito/Documents/J381M/2023_J381M_bookdown"
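The path printed above is the kind of output getwd() gives you. Here is a minimal sketch of checking and changing the working directory. Note that tempdir() is only a stand-in folder for this example, so swap in the path to your own project folder.

```r
# Save and print the current working directory (a character string)
current_dir <- getwd()
print(current_dir)

# Move into another folder. tempdir() is a placeholder for this sketch;
# in practice you would pass the path to your own project folder.
setwd(tempdir())
print(getwd())

# Go back to where we started
setwd(current_dir)
```

Saving the starting directory in an object first makes it easy to return to it afterwards.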
You want to make sure that the file you are trying to bring into R is in the same folder as the .rmd file. You can either move the csv file to the appropriate folder, or change your working directory to the folder where the .csv file is. In this example, the file (crowdtangle_barbenheimer_2023.csv) is in a folder called “data”.
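If you are not sure whether R can see the file, you can ask it directly. This is a small sketch that assumes the “data” folder sits inside your working directory; if your layout is different, adjust the path.

```r
# Assumed layout: a folder called "data" inside the working directory
csv_path <- file.path("data", "crowdtangle_barbenheimer_2023.csv")

# List whatever is in the "data" folder (empty result if the folder is missing)
list.files("data")

# TRUE if R can find the file from the current working directory
file_found <- file.exists(csv_path)
file_found
```

If file_found is FALSE, check getwd() again and make sure the folder and file names are spelled exactly as they appear on your computer.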
Now that you’ve made sure your .csv file is in the working directory, we can use the read.csv() function! We want to read the csv AND assign it to an object so that it is saved in our environment. We’ll call the object “social_media_file”.
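Here is a minimal, self-contained sketch of the read-and-assign pattern. So that it runs anywhere, it first writes a tiny made-up csv (toy.csv, a hypothetical file) into a temporary “data” folder; the last comment shows what the call would presumably look like for the tutorial’s file.

```r
# Build a throwaway "data" folder with a tiny csv so the example runs anywhere.
# toy.csv and its values are made up for illustration.
demo_dir <- file.path(tempdir(), "data")
dir.create(demo_dir, showWarnings = FALSE)
toy_path <- file.path(demo_dir, "toy.csv")
write.csv(
  data.frame(Page.Name = c("WKNO", "KUNC"), Likes = c(0, 0)),
  toy_path,
  row.names = FALSE
)

# Read the csv AND assign it to an object so it shows up in the environment
toy_file <- read.csv(toy_path)

# For the tutorial's data, assuming the working directory contains "data":
# social_media_file <- read.csv("data/crowdtangle_barbenheimer_2023.csv")
```

Without the assignment arrow (<-), R would print the data to the console and not save it anywhere.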
You should now see an object social_media_file in your global environment. The csv has 16,794 observations and 41 variables (the row numbers in the tail() output below run up to 16794).
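You can also check the number of observations and variables with code and not only in the environment pane. The sketch below uses a small made-up data frame (toy) as a stand-in; the same functions work on social_media_file.

```r
# Toy stand-in for social_media_file (hypothetical values)
toy <- data.frame(
  Page.Name = c("WKNO", "KUNC", "Tagg Magazine"),
  Likes = c(0, 0, 0)
)

toy_dim <- dim(toy)  # number of rows, then number of columns
toy_dim
nrow(toy)            # observations only
ncol(toy)            # variables only
```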
You can use head() and tail() to look at the top n rows and bottom n rows.
#head(social_media_file) #I "commented" these two lines so they would not run. To run them yourself, just remove the # at the front of the line
#tail(social_media_file)
tail(social_media_file, n = 3) #in this line, I am only looking at the last 3 rows
## ï..Page.Name User.Name Facebook.Id
## 16792 WKNO - 91.1 FM Memphis WKNOFM 1.000463e+14
## 16793 KUNC 91.5fm KUNC915 1.000641e+14
## 16794 Tagg Magazine taggmagazine 1.000635e+14
## Page.Category Page.Admin.Top.Country
## 16792 BROADCASTING_MEDIA_PRODUCTION US
## 16793 MEDIA_NEWS_COMPANY US
## 16794 TOPIC_PUBLISHER US
## Page.Description
## 16792 WKNO-FM is the Mid-South's source of NPR News, local information and classical music.\n\nWKNO-FM | WKNO
## 16793 KUNC brings news and storytelling to Colorado’s Front Range, mountain and Eastern Plains communities.
## 16794 For everyone lesbian, queer, and under the rainbow. Tagg...you're it!
## Page.Created Likes.at.Posting Followers.at.Posting
## 16792 2008-07-29 18:46:35 5845 6071
## 16793 2008-11-25 16:59:09 9762 10131
## 16794 2012-08-09 01:47:22 9410 9751
## Post.Created Post.Created.Date Post.Created.Time Type
## 16792 2023-07-21 12:45:00 CDT 2023-07-21 12:45:00 Link
## 16793 2023-07-19 19:27:03 CDT 2023-07-19 19:27:03 Link
## 16794 2023-07-18 11:05:04 CDT 2023-07-18 11:05:04 Link
## Total.Interactions Likes Comments Shares Love Wow Haha Sad Angry Care
## 16792 0 0 0 0 0 0 0 0 0 0
## 16793 0 0 0 0 0 0 0 0 0 0
## 16794 0 0 0 0 0 0 0 0 0 0
## Video.Share.Status Is.Video.Owner. Post.Views Total.Views
## 16792 - 0 0
## 16793 - 0 0
## 16794 - 0 0
## Total.Views.For.All.Crossposts Video.Length
## 16792 0 N/A
## 16793 0 N/A
## 16794 0 N/A
## URL
## 16792 https://www.facebook.com/100046315040252/posts/874742670746226
## 16793 https://www.facebook.com/100064101066456/posts/683186593828037
## 16794 https://www.facebook.com/100063465255924/posts/759796136145888
## Message
## 16792 After months of inescapable marketing, viral memes and crossover merch, two of the year's most anticipated movies hit theaters on Friday. Here's why so many people want to see both — and how to prep.
## 16793 It started out as a meme mashing together two highly anticipated films being released on the same day, July 21: Greta Gerwig's "Barbie" and Christopher Nolan's "Oppenheimer". But the "Barbenheimer" is also a double-feature juggernaut that many people plan to see, even if it seems like an odd pairing. That's music to the ears of some theaters, like The Lyric in Fort Collins, which expects presales of "Barbie" alone to set an all-time record for the indie theater. KUNC's Natalie Skowlund and Alex Hager look at the Barbenheimer phenomenon.
## 16794 Barbenheimer: The Great Bisexual Reckoning A good problem to have and why you should be excited about it! Read more.
## Link
## 16792 https://www.wknofm.org/2023-07-21/what-to-know-about-the-barbenheimer-double-feature-frenzy
## 16793 https://buff.ly/3Q1BQG0
## 16794 https://shar.es/afVNm1
## Final.Link
## 16792
## 16793 https://www.kunc.org/arts-life/2023-07-19/local-theaters-anticipate-post-covid-salvation-with-barbenheimer-dual-release
## 16794 https://taggmagazine.com/barbenheimer/
## Image.Text
## 16792
## 16793
## 16794
## Link.Text
## 16792 What to know about the 'Barbenheimer' double feature frenzy
## 16793 Local theaters anticipate post-COVID salvation with 'Barbenheimer' dual release
## 16794 Barbenheimer: The Great Bisexual Reckoning
## Description
## 16792 After months of inescapable marketing, viral memes and crossover merch, two of the year's most anticipated movies hit theaters on Friday. Here's why so many people want to see both — and how to prep.
## 16793 A dual release of both Greta Gerwig’s Barbie and Christopher Nolan’s Oppenheimer on Friday have local theaters in Fort Collins and beyond hoping to bring moviegoers back into their seats. Will ‘Barbenheimer,’ as the double feature has been termed, be the cure?
## 16794 Barbie/Oppenheimer is giving us bi panic.
## Sponsor.Id Sponsor.Name Sponsor.Category
## 16792 NA
## 16793 NA
## 16794 NA
## Total.Interactions..weighted..â....Likes.1x.Shares.1x.Comments.1x.Love.1x.Wow.1x.Haha.1x.Sad.1x.Angry.1x.Care.1x..
## 16792 0
## 16793 0
## 16794 0
## Overperforming.Score
## 16792 -28
## 16793 -36
## 16794 -38
With so many variables, this can be a little hard to read! But you can also use View() to look at the dataset. Either type View(social_media_file) into the console or click the object name in the Global Environment.
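One thing you may have noticed in the output above is that the first column is called ï..Page.Name. Stray characters like these usually come from a byte-order mark (BOM), an invisible marker that some programs write at the start of a file. The sketch below builds a tiny made-up csv with a BOM and reads it with fileEncoding = "UTF-8-BOM". Whether the tutorial’s file needs this argument is an assumption, so treat it as something to try if your column names look odd.

```r
# Write a tiny made-up csv that starts with a UTF-8 byte-order mark (BOM)
toy_path <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
csv_body <- charToRaw("Page Name,Likes\nKUNC,10\nWKNO,7\n")
writeBin(c(bom, csv_body), toy_path)

# Telling read.csv() about the BOM keeps it out of the first column name
toy <- read.csv(toy_path, fileEncoding = "UTF-8-BOM")
names(toy)
```

The space in “Page Name” becomes a dot because read.csv() turns column names into valid R names by default.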
4.1 Variables
If you want to look at a variable, you can use the $ operator. For example, social_media_file$Message refers to the text variable (also known as the data “vector” or “field”). Let’s look at the first 6 rows of this column.
## [1] "Check out our Emo spotify playlist! http://bit.ly/EmoNeverDies:=:https://open.spotify.com/playlist/3czha2tGePoOcVI4xwa3j3"
## [2] "¡QUIERE SER KEN! 🩷 Cillian Murphy (Oppenheimer) comentó que esta abierto a interpretar a Ken en una futura secuela de 'Barbie' 🩷"
## [3] "The true Barbenheimer experience"
## [4] "¡BARBENHEIMER ES REAL! En un cine de Estados Unidos durante una función de 'Oppenheimer' hubo un fallo en el proyector y en los últimos 20 minutos la mitad de la pantalla se volvió rosa 😆 El Barbie x Oppenheimer se volvió canon."
## [5] "Barbenheimer สมัยตอนอยู่ Gotham 😅"
## [6] "Barbenheimer 2023. 🤝🏼 🎨: @justralphy"
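Output like the above presumably comes from wrapping the variable in head(), which returns the first 6 values by default. Here is a minimal sketch using a made-up data frame (toy) so you can run it without the csv; with the real data you would use social_media_file$Message.

```r
# Toy stand-in for social_media_file (hypothetical values)
toy <- data.frame(
  Page.Name = paste("Page", 1:8),
  Message = paste("Post number", 1:8),
  stringsAsFactors = FALSE
)

# $ pulls out one column as a vector; head() keeps its first 6 values
first_messages <- head(toy$Message)
first_messages
```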
How would you look at the first 10 rows in the Message variable?