Importing a Data File

To read in tabular data (e.g., a .txt file), use read.table().

# read.table(file, header = FALSE, sep = "", ...)
data <- read.table("path/datafile.txt", header = TRUE, sep = "\t")  # path is relative to the working directory; see getwd() and setwd()

Some important arguments are:

  • header : logical. Does the first row contain column labels?
  • sep : the field separator character, which can be
    • "" : whitespace (default), i.e. one or more spaces, tabs, or newlines
    • "\t" : tab-delimited
    • "," : comma-separated

* Run ?read.table to see the full list of arguments for the function.
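
To make the effect of header and sep concrete, here is a minimal sketch that writes a small tab-delimited file to a temporary location and reads it back twice. The file contents and the names tmp, demo, and demo_nohead are made up for illustration and are not part of the original example data.

```r
# Write a tiny tab-delimited file to a temporary path (illustrative values only)
tmp <- tempfile(fileext = ".txt")
writeLines(c("ID\tgroup\tscore1",
             "1\t1\t35",
             "2\t1\t23",
             "3\t2\t14"), tmp)

# header = TRUE: the first line supplies the column names
# sep = "\t":    fields are separated by tabs
demo <- read.table(tmp, header = TRUE, sep = "\t")

# header = FALSE: the first line is treated as data, so the columns get the
# default names V1, V2, V3 and are read as character
demo_nohead <- read.table(tmp, header = FALSE, sep = "\t")

str(demo)
str(demo_nohead)

unlink(tmp)  # remove the temporary file
```

Comparing the two str() results is a quick way to spot a wrong header setting: an extra row and character columns usually mean the header line was read as data.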

If the data is in .csv format, use read.csv().

# read.csv(file, header = TRUE, sep = ",", ...)
data <- read.csv("exampledata.csv", header = TRUE)
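
read.csv() is a wrapper around read.table() with defaults suited to comma-separated files (header = TRUE, sep = ","). The sketch below, which uses a made-up temporary file and the hypothetical names tmp_csv, a, b, and same, reads the same file both ways and compares the results.

```r
# Write a small comma-separated file to a temporary path (illustrative values only)
tmp_csv <- tempfile(fileext = ".csv")
writeLines(c("ID,group,score1",
             "1,1,35",
             "2,2,23"), tmp_csv)

a <- read.csv(tmp_csv)                               # header = TRUE, sep = "," by default
b <- read.table(tmp_csv, header = TRUE, sep = ",")   # the same settings spelled out

# all.equal() compares the two data frames; isTRUE() turns the result into a single logical
same <- isTRUE(all.equal(a, b))
same

unlink(tmp_csv)  # remove the temporary file
```

For files with quoted fields, ragged rows, or "#" characters the two calls can differ, because read.csv() also changes the quote, fill, and comment.char defaults.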

To import data in other formats (such as SPSS data files), use functions in the foreign package.

library(foreign)
data <- read.spss("path/datafile.sav", to.data.frame = TRUE)  # return a data frame instead of a list

After importing the data file, you can check whether you have correctly read in the data. Below are some example functions to use.

head(data)      # first 6 rows of the data
##   ID group score1 score2
## 1  1     1     35     45
## 2  2     1     23     14
## 3  3     1     14     26
## 4  4     1     17     25
## 5  5     1     23     27
## 6  6     1     35     47
data[1:10, ]    # extract the first 10 rows 
##    ID group score1 score2
## 1   1     1     35     45
## 2   2     1     23     14
## 3   3     1     14     26
## 4   4     1     17     25
## 5   5     1     23     27
## 6   6     1     35     47
## 7   7     1     27     37
## 8   8     1     33     50
## 9   9     1     32     15
## 10 10     1     31     37
tail(data)      # last 6 rows of the data
##    ID group score1 score2
## 15 15     2     39     37
## 16 16     2     45     41
## 17 17     2     31     25
## 18 18     2     40     17
## 19 19     2     25     15
## 20 20     2     32     27
str(data)       # structure of the data frame
## 'data.frame':    20 obs. of  4 variables:
##  $ ID    : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ group : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ score1: int  35 23 14 17 23 35 27 33 32 31 ...
##  $ score2: int  45 14 26 25 27 47 37 50 15 37 ...
summary(data)   # summary of data
##        ID            group         score1          score2  
##  Min.   : 1.00   Min.   :1.0   Min.   :14.00   Min.   :14  
##  1st Qu.: 5.75   1st Qu.:1.0   1st Qu.:26.50   1st Qu.:23  
##  Median :10.50   Median :1.5   Median :32.00   Median :27  
##  Mean   :10.50   Mean   :1.5   Mean   :31.50   Mean   :31  
##  3rd Qu.:15.25   3rd Qu.:2.0   3rd Qu.:35.25   3rd Qu.:42  
##  Max.   :20.00   Max.   :2.0   Max.   :51.00   Max.   :50
attributes(data)  # attributes of object
## $names
## [1] "ID"     "group"  "score1" "score2"
## 
## $class
## [1] "data.frame"
## 
## $row.names
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
names(data)  # variable names
## [1] "ID"     "group"  "score1" "score2"
dim(data)    # dimension of data (number of rows and columns)
## [1] 20  4
dimnames(data)   # row and column names
## [[1]]
##  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14" "15"
## [16] "16" "17" "18" "19" "20"
## 
## [[2]]
## [1] "ID"     "group"  "score1" "score2"
nrow(data)   # number of rows
## [1] 20
ncol(data)   # number of columns
## [1] 4
class(data)  # class of object
## [1] "data.frame"
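
The checks above are done by eye. As a possible complement, the same expectations can be written as assertions so that a bad import stops the script early. This is only a sketch: check_df stands in for the imported data and is rebuilt from the six rows printed by head(data) above, and the names check_df and expected_names are hypothetical.

```r
# Stand-in for the imported data: the six rows shown in the head(data) output
check_df <- data.frame(
  ID     = 1:6,
  group  = rep(1L, 6),
  score1 = c(35L, 23L, 14L, 17L, 23L, 35L),
  score2 = c(45L, 14L, 26L, 25L, 27L, 47L)
)

expected_names <- c("ID", "group", "score1", "score2")

# stopifnot() raises an error as soon as one condition is not TRUE
stopifnot(
  is.data.frame(check_df),
  identical(names(check_df), expected_names),  # columns present and in the expected order
  all(sapply(check_df, is.numeric)),           # no column was read as character
  !anyNA(check_df)                             # no missing values were introduced
)
```

Which conditions are worth asserting depends on the file; column names and types are usually the first things to go wrong when header or sep is set incorrectly.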