2 How to read data into R

Reading data into R seems straightforward, but “the devil is in the detail.” Indeed, we must pay attention to some fine points when we use R functions to read data.

Firstly, here is a summary table of the most useful R functions for data import.

Import file type or source    R function
.txt                          read.table()
.csv                          read.csv() and readr::read_csv()
.xls and .xlsx                readxl::read_excel() and openxlsx::read.xlsx()
.sav                          foreign::read.spss()
.RData or .rda                load()
.rds                          readRDS() and readr::read_rds()
Internet                      download.file()
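The .rda and .rds rows of the table behave differently, and a small round trip may make the difference concrete. The following is only a sketch using base R; the data frame scores and the temporary file paths are made up for illustration.

```r
# Hypothetical example data; any R object would do.
scores <- data.frame(id = 1:3, score = c(90.5, 82.0, 77.25))

# Temporary files, so the sketch does not touch your project folders.
rds_path <- tempfile(fileext = ".rds")
rda_path <- tempfile(fileext = ".rda")

saveRDS(scores, file = rds_path)  # stores a single object, without its name
save(scores, file = rda_path)     # stores the object together with its name

# readRDS() returns the object, so we choose the name when we assign it.
scores_copy <- readRDS(rds_path)

# load() recreates the object under its saved name and invisibly
# returns the names of the objects it restored.
rm(scores)
restored_names <- load(rda_path)
print(restored_names)
```

Because readRDS() never creates objects on its own, it cannot silently replace something in the workspace, which is one reason to prefer .rds when only a single object needs to be stored.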

Secondly, the fine points.

  • Before reading a .txt file, it is a good idea to open the file in Notepad++ and check its encoding in the Encoding menu. If necessary, we can convert the file to UTF-8 and then use read.table(..., fileEncoding = "UTF-8").

  • When we are dealing with large data, it is better to use readr::read_csv() rather than read.csv() because the former is much faster. It is good practice to first open the file (with Excel, for example) and spend some time understanding the types of the variables. If it is not too troublesome, we should specify those types explicitly with one letter per column (i = integer, D = date, d = double, c = character, l = logical), e.g. readr::read_csv(..., col_types = "iDdccciccllc").

  • “load() replaces all existing objects with the same names in the current environment (typically your workspace, .GlobalEnv) and hence potentially overwrites important data.” (R Help) So, if we use load(), we should put it at the beginning of the script, before we create any objects that it could overwrite.

  • A really useful small tip: type file.choose() into the R console, navigate to the file, and click Open; R then returns the absolute path of the file that you want to read.
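Two of the fine points above can be sketched with base R alone. The file contents, object names and temporary paths below are invented for illustration, and the first part assumes R is running in a locale that can represent the accented letters (for example a UTF-8 locale). The second part shows one possible safeguard, loading into a separate environment, as an alternative to relying only on the position of load() in the script.

```r
# --- Encoding: write a small UTF-8 file, then read it back ---------------
txt_path <- tempfile(fileext = ".txt")
lines <- c("city country",
           "Z\u00fcrich Switzerland",  # "\u00fc" is the letter u with umlaut
           "K\u00f6ln Germany")        # "\u00f6" is the letter o with umlaut
# The "\u" escapes yield UTF-8 strings; useBytes = TRUE writes them unchanged.
writeLines(lines, txt_path, useBytes = TRUE)

# Tell read.table() how the file is encoded.
cities <- read.table(txt_path, header = TRUE, fileEncoding = "UTF-8")
print(cities)

# --- load(): keep loaded objects away from the workspace ------------------
important <- "do not overwrite me"

rda_path <- tempfile(fileext = ".rda")
local({
  # An object with the same name, saved from a throwaway environment.
  important <- "value stored in the file"
  save(important, file = rda_path)
})

# Loading into a new environment leaves the existing `important` untouched.
sandbox <- new.env()
load(rda_path, envir = sandbox)
print(ls(sandbox))
```

After this runs, sandbox$important holds the value from the file, while important in the workspace is unchanged.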

Thirdly, examples:

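# Plain text file with a header row
my_data <- read.table(file = "./data/SevenSurgeons.txt", header = TRUE)

# .rda file: load() restores the saved objects under their original names
load(file = "./data/CDS_all.rda")

# .rds file: readRDS() returns a single object, which we name ourselves
file_infor <- readRDS(file = "./data/f_infor.rds")

# Local .csv file read with readr, skipping the first line
library(readr)
file_name <- "./data/export.csv"
my_data <- read_csv(file = file_name, locale = locale(), skip = 1)

# .csv file read directly from a URL
the_url <- "https://raw.githubusercontent.com/LarryZhang2016/Data/master/NZ_cities.csv"
NZ_cities <- read_csv(the_url, skip = 1)

# Excel workbook: build the file name, then read one named sheet
library(readxl)
month_and_year <- "July2016"
raw_data_file_name <- paste0("./RawData/", "raw_data_", month_and_year, ".xlsx")
raw_data <-
  read_excel(path = raw_data_file_name, sheet = "Very_raw_data", skip = 0, col_names = TRUE)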
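The examples above let the reading functions guess the column types. As a complement, here is a minimal sketch of specifying them with readr, first with the compact string form and then with an equivalent explicit specification. The file, its column names and its values are hypothetical, and the sketch assumes the readr package is installed.

```r
library(readr)

# A small hypothetical file, written to a temporary location.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,visit_date,weight,name,smoker",
             "1,2016-07-01,70.5,Ann,TRUE",
             "2,2016-07-15,82.0,Bob,FALSE"),
           csv_path)

# Compact form: one letter per column, in column order
# (i = integer, D = date, d = double, c = character, l = logical).
visits <- read_csv(csv_path, col_types = "iDdcl")

# Explicit form: longer, but each type is tied to a column name.
visits_explicit <- read_csv(
  csv_path,
  col_types = cols(
    id         = col_integer(),
    visit_date = col_date(),
    weight     = col_double(),
    name       = col_character(),
    smoker     = col_logical()
  )
)

# Show the first class of each column to confirm the types.
print(sapply(visits, function(x) class(x)[1]))
```

The compact string is quick to type, but it breaks silently if the column order changes; the explicit form is easier to review in a script that will be reused.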