Chapter 6 Loading Data

For most analyses that you conduct in R, the first step is importing a data set. R can load data in many different ways, and it can read many different file formats.

The datasets that we will use for this section can be downloaded here. They come as a zip file, so save it somewhere you can easily find it and unzip it.

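We will use the read.csv() function to do this, as our data are stored in CSV (comma-separated values) files, which Excel can also open and save. One thing to be mindful of here is the path to the file: unless you give a full path, R looks for the file in your current working directory.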

If you have saved the file within your current working directory, you can simply write:

books <- read.csv("books.csv", header = TRUE)
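If the file is saved somewhere else, R will not find it from the file name alone. The sketch below is one way to handle this, assuming the datasets were unzipped into a folder called datasets in your home directory (a hypothetical location; data_dir and books_path are illustrative names, so substitute your own folder). It shows where R is currently looking with getwd(), builds a full path with file.path(), and checks that the file exists before reading it.

```r
# Show the folder R is currently using as its working directory
getwd()

# Hypothetical location of the unzipped datasets: change this to your own folder
data_dir <- file.path("~", "datasets")
books_path <- file.path(data_dir, "books.csv")

# Read the file only if it is actually there; otherwise report the path R tried
if (file.exists(books_path)) {
  books <- read.csv(books_path, header = TRUE)
} else {
  message("Could not find ", books_path)
}
```

Alternatively, setwd() changes the working directory for the rest of the session, after which the short form read.csv("books.csv") works.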

Note that the books dataset has now appeared in the Global Environment.

There are other ways to do this too. For example, you can use the more general read.table() function, and for SPSS files you can use read.spss() from the foreign package or read_sav() from the haven package. Note that to use the SPSS import functions, you will first need to install and load the relevant package (more on this later).
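As a sketch of these alternatives: read.csv() is essentially read.table() with header = TRUE and sep = "," already set, so the two calls below should produce the same data frame. To keep the example self-contained it writes a small temporary CSV first. The SPSS file name mydata.sav is hypothetical, and those lines only run if such a file exists; foreign is usually included with standard R installations, while haven must be installed with install.packages("haven").

```r
# Write a tiny CSV to a temporary folder so the example runs anywhere
csv_path <- file.path(tempdir(), "small.csv")
write.csv(data.frame(comic = c(-44L, 20L, 0L), statbook = c(16L, -14L, 6L)),
          csv_path, row.names = FALSE)

# Two equivalent ways to read the same file
via_read_csv   <- read.csv(csv_path, header = TRUE)
via_read_table <- read.table(csv_path, header = TRUE, sep = ",")

# SPSS .sav files (hypothetical file name): foreign or haven
spss_path <- "mydata.sav"
if (file.exists(spss_path)) {
  library(foreign)
  spss_foreign <- read.spss(spss_path, to.data.frame = TRUE)
  if (requireNamespace("haven", quietly = TRUE)) {
    spss_haven <- haven::read_sav(spss_path)
  }
}
```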

Now that we have read our data into R, let's have a look at it. We might first want to see the structure of the data frame, which we can do with the str() function.

str(books)
## 'data.frame':    500 obs. of  2 variables:
##  $ comic   : int  -44 20 0 -18 -19 13 16 14 -6 11 ...
##  $ statbook: int  16 -14 6 -13 7 -33 3 -7 -6 -3 ...

We can see that we have 2 variables, one called comic and one called statbook. Both are numeric (stored as integers, shown as int), and there are 500 observations of each. We can also extract specific pieces of this information with individual commands:

ncol(books) #number of columns
## [1] 2
nrow(books) #number of rows
## [1] 500
colnames(books) #column names
## [1] "comic"    "statbook"
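A few related functions report this information in a single call. The sketch below uses a small stand-in called books_demo (a hypothetical name), built from the first six rows printed by head(books) below, so it runs without the CSV file.

```r
# Stand-in for books: the first six rows printed by head(books)
books_demo <- data.frame(
  comic    = c(-44L, 20L, 0L, -18L, -19L, 13L),
  statbook = c(16L, -14L, 6L, -13L, 7L, -33L)
)

dim(books_demo)              # number of rows and columns in one call
sapply(books_demo, class)    # storage class of each column
summary(books_demo)          # min, quartiles, mean and max per column
```

For the full books data, dim(books) should report 500 rows and 2 columns, matching nrow() and ncol() above.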

If you want a quick glance at the data, you can use the head() or tail() functions. If you really want to see all of your data, you can use the print() function, or simply type the name of the data frame.

head(books) #first 6 rows
##   comic statbook
## 1   -44       16
## 2    20      -14
## 3     0        6
## 4   -18      -13
## 5   -19        7
## 6    13      -33
tail(books) #last 6 rows
##     comic statbook
## 495    11        3
## 496     9       13
## 497    10      -10
## 498    11      -22
## 499    10      -10
## 500    26       10
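Both head() and tail() show 6 rows by default, but their n argument lets you choose how many. A minimal sketch, again using a hypothetical stand-in built from the first six rows shown above:

```r
# Stand-in for books: the first six rows shown by head(books)
books_demo <- data.frame(
  comic    = c(-44L, 20L, 0L, -18L, -19L, 13L),
  statbook = c(16L, -14L, 6L, -13L, 7L, -33L)
)

head(books_demo, n = 3)   # first 3 rows instead of the default 6
tail(books_demo, n = 2)   # last 2 rows
```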

6.1 Practical Example

(This example is partially adapted from A. Field, “Discovering Statistics Using R”, Sage, chapter 10, p. 400.)

The example contains data on pain relief in patients and compares the effects of administering a sugar pill (placebo condition, dose code = 1), a low dose of a drug such as ibuprofen (dose code = 2), or a high dose of the same drug (dose code = 3).

We surveyed 15 participants, and there are two main variables: the condition (dose), coded 1 (placebo), 2 (low dose of ibuprofen) or 3 (high dose of ibuprofen); and the pain level (effect), measured on a scale from 1 to 10.

Now it's over to you: read in the ‘dose.csv’ file, check what type of data you have, and convert variables to factors with appropriate levels and labels where needed. You should also explore the data using the commands we used above.

# Read in data
exp <- read.csv("dose.csv", header = TRUE)
exp
##    ID dose effect
## 1   1    1      3
## 2   2    1      2
## 3   3    1      1
## 4   4    1      1
## 5   5    1      4
## 6   6    2      5
## 7   7    2      2
## 8   8    2      4
## 9   9    2      2
## 10 10    2      3
## 11 11    3      7
## 12 12    3      4
## 13 13    3      5
## 14 14    3      3
## 15 15    3      6
str(exp)
## 'data.frame':    15 obs. of  3 variables:
##  $ ID    : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ dose  : int  1 1 1 1 1 2 2 2 2 2 ...
##  $ effect: int  3 2 1 1 4 5 2 4 2 3 ...
# Convert the numeric dose codes into a labelled factor
exp$dose <- factor(exp$dose, levels = c(1, 2, 3), labels = c("Placebo", "Low_dose", "High_dose"))
is.factor(exp$dose)
## [1] TRUE
head(exp)
##   ID     dose effect
## 1  1  Placebo      3
## 2  2  Placebo      2
## 3  3  Placebo      1
## 4  4  Placebo      1
## 5  5  Placebo      4
## 6  6 Low_dose      5
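To take the exploration a step further, the sketch below rebuilds the 15 rows printed above so it runs without dose.csv (it uses the hypothetical name dose_data rather than exp), then checks the factor levels, counts participants per condition and computes the mean effect in each condition.

```r
# Rebuild the dose data printed above
dose_data <- data.frame(
  ID     = 1:15,
  dose   = rep(1:3, each = 5),
  effect = c(3, 2, 1, 1, 4, 5, 2, 4, 2, 3, 7, 4, 5, 3, 6)
)
dose_data$dose <- factor(dose_data$dose, levels = c(1, 2, 3),
                         labels = c("Placebo", "Low_dose", "High_dose"))

levels(dose_data$dose)                          # the three labelled conditions
table(dose_data$dose)                           # participants per condition
tapply(dose_data$effect, dose_data$dose, mean)  # mean effect per condition
summary(dose_data)
```

With these data there are 5 participants per condition, and the mean effect is 2.2 for Placebo, 3.2 for Low_dose and 5.0 for High_dose.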