15 Appendix 2: Reading data into R and saving output

Data Wrangling Recipes in R: Hilary Watt

15.1 Avoid the need to write your directory

The next three sections show you how to avoid typing in your directory name. The process is to read the data into R using menus, after first making sure the file is in the desired directory for this project. Then you copy the code from the “History” tab (the code is written there, even though you used menus). Next, you copy the directory element of that code and point R towards this specific directory (using the setwd() command). R then looks in this directory when reading any files, and saves R files to this directory, without you needing to write the directory name again.

15.2 Read in & view R data using RStudio menus

Firstly, choose or create a directory for your R files related to this handbook. Save the relevant files, including the anaemia.csv dataset (provided with this handbook), into this directory.

View the file within Excel. The first line of anaemia.csv contains sensible variable names, followed by several lines of data. This structure enables the data to be read into R. With other datasets, you may need to tidy the file within Excel to achieve this structure (variable names, then data) before reading the data into R.

Open the file using menus within RStudio; this avoids the need to write the directory name.

In the Environment tab (top right of the RStudio layout), click “Import Dataset” and select “From Text (base)…” for a .csv file:

(find the relevant directory and double click on the filename)

Set Heading to “Yes” => the first line of the .csv file becomes the variable names. Click on Import.

When you’ve successfully imported it using menus, click on the “History” tab (RStudio, top right). Find the “open dataset” command (read.csv) and the View command (the last 2 lines in the history, assuming the data frame was just opened).

Select the 2 code lines with the mouse and click “To Source” to move them into the open R script file. This records which dataset is used by your R script code and enables quick reopening of the relevant dataset.

Note: there are 2 options for reading in .csv files. The “From Text (readr)” option may be better for more complex datasets (for example, those with dates, since you can specify some data formats as you read the data in).

15.3 Code to read in dataset

Using read.csv() for loading a CSV file

anaemia <- read.csv("<file path>/anaemiaB.csv", header = TRUE)

In practice, replace <file path> with your own file path:

anaemia <- read.csv("C:/Users/hwatt/OneDrive – Imperial College London/GMPH/R handbook/anaemiaB.csv", header = TRUE)

(this is the command that appeared when using menus above).
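To see what header = TRUE does, the following minimal sketch reads a few lines of made-up data supplied as text rather than from a file. The values and the object names (csv_text, with_header, no_header) are illustrative assumptions, not part of the anaemia dataset.

```r
# Made-up data: one line of variable names, then two lines of data
csv_text <- "id,sex,hb\n1,female,11.2\n2,male,13.5"

# header = TRUE: the first line is used as the variable names
with_header <- read.csv(text = csv_text, header = TRUE)

# header = FALSE: the first line is treated as data
no_header <- read.csv(text = csv_text, header = FALSE)

names(with_header)  # variable names taken from the first line
names(no_header)    # default names supplied by R
nrow(with_header)   # number of data rows
nrow(no_header)     # one more row, since the names line is counted as data
```

With header = FALSE, R supplies its own column names and the line of variable names ends up as the first row of data, which is rarely what you want.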

Windows computers: to copy file name with file path, hold down shift key and right click on the file name. Select “copy as path” from the list that appears.
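A path copied this way uses backslashes, which R treats as escape characters inside ordinary quoted strings. One possible workaround, sketched here with a made-up path, is to paste it into a raw string (available from R 4.0.0 onwards) and then convert the backslashes to forward slashes. The names win_path and r_path are illustrative only.

```r
# Hypothetical path, as it might be pasted from Windows "Copy as path"
win_path <- r"(C:\Users\someone\Documents\R handbook\anaemiaB.csv)"

# Replace each backslash with a forward slash so that R can use the path
r_path <- gsub("\\", "/", win_path, fixed = TRUE)
r_path
```

The converted path can then be used inside read.csv(). Typing the path with forward slashes in the first place works equally well.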

MORE ADVANCED options for reading in data:

The written command version of “From Text (readr)” requires us to run library(readr) BEFORE using the relevant readr functions, such as read_csv().
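As a rough illustration of the readr approach, the sketch below writes a small made-up file to a temporary location and reads it back, stating the data format of each column. It assumes the readr package is installed (the code is skipped otherwise); the file contents and the names demo_file and demo are hypothetical.

```r
# Minimal sketch; only runs if the readr package has been installed
if (requireNamespace("readr", quietly = TRUE)) {
  library(readr)

  # Made-up example file, written to a temporary location
  demo_file <- tempfile(fileext = ".csv")
  writeLines(c("id,visit_date,hb",
               "1,2021-03-01,11.2",
               "2,2021-03-15,9.8"), demo_file)

  # col_types states the data format of each column as it is read in
  demo <- read_csv(demo_file,
                   col_types = cols(id = col_integer(),
                                    visit_date = col_date(format = "%Y-%m-%d"),
                                    hb = col_double()))

  # Check that the date column has been read in as a date
  print(class(demo$visit_date))
}
```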

Importing data into R (including from Excel): Introduction to Importing Data in R

Intermediate importing data (from databases, from web, from stats packages): Intermediate Importing Data in R

Chapter 11 of R for Data Science, Wickham & Grolemund

15.4 Point R to chosen working directory

getwd() shows current working directory. If you save or open files without specifying a directory, R looks here.

setwd(dir) can set the working directory (replace “dir” with your chosen working directory – my strategy is to open files using menus, then copy and paste directory element from open file command).

setwd("C:/Users/hwatt/OneDrive – Imperial College London/GMPH/R handbook")

This enables the use of shorter read and write commands, since they no longer need to point to the relevant directory:

anaemia <- read.csv("anaemiaB.csv", header = TRUE)
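The whole sequence can be tried out safely with a throwaway directory. This is only a sketch: the folder name r_handbook_demo, the file demo.csv and its contents are made up for illustration, and the temporary folder stands in for your own project directory.

```r
# Remember where R is currently pointing, so it can be restored afterwards
old_wd <- getwd()

# Hypothetical project folder, created inside R's temporary directory
demo_dir <- file.path(tempdir(), "r_handbook_demo")
dir.create(demo_dir, showWarnings = FALSE)

# Point R at the chosen directory
setwd(demo_dir)

# Files can now be written and read using the file name alone
write.csv(data.frame(id = 1:3, weight = c(60.5, 72.1, 55.0)),
          "demo.csv", row.names = FALSE)
demo <- read.csv("demo.csv", header = TRUE)
head(demo)

# Return to the original working directory
setwd(old_wd)
```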

15.5 Saving R code into R script files

Open an R script file to save your R code, to avoid losing all your work. Be certain to add in many comments (starting with #, so that R does not attempt to interpret them as code). Otherwise, your own code may soon become impenetrable to you.

Select “File” (top left of RStudio), then “New File”, then “R Script”, to open a new R script file.

Immediately name and save the R script file: by clicking on the save icon (at top of R script file; R script file is top left of RStudio). Save regularly as you code.

15.6 Write comments for your (programmer) benefit, within R code

R script files are collections of R code, interpreted by R, for action. R code can include code that amends datasets, produces tables and graphs, and analyses data. Even very experienced R users are often clueless as to what bare code is doing, including code that they wrote themselves. Hence, any sensible coder adds comments into R code to help navigate and understand the code. Don’t assume you have the memory of an elephant. Instead, add in comments such as “Checking for errors”, “looking at shapes of distribution”, “this appears to be a Normal distribution”. You might also want to document the reasons for any choices made (in data management and methods of analysis) in the R script.

# Comments follow-on from the “#” symbol: which might be at the beginning of a line or after a command.

# Checking for outliers

hist(anaemia$weight) #plots histogram

# table(anaemia$weight)   a code line starting with # is commented out, so is not run by R

RStudio helps you by displaying comments in a different colour.

15.7 Saving R datasets

It is essential to keep a tidy script file of your data cleaning and modification code, so that you can rerun it with any required amendments. If you have a tidy R script that creates a saved R dataset, you can confidently overwrite that dataset whenever you rerun the script. This avoids the potential confusion of having many different versions of your dataset.

Always keep a copy of your datasets as provided to you. Be certain to always write data with a different name to this original file.

Saving as R format means that the format of all variables will be retained. You can potentially keep only cleaned versions of variables that you will need for your analysis. With a tidy R script that creates this, it is easy to rerun keeping more variables (after cleaning them) if you later find you need more.

# save dataframe or similar object to R file

saveRDS(object, file = "my_data.rds")

saveRDS() saves a data frame (or similar object) as an RDS file. The file = argument specifies the name of the file to which the R object is saved (readRDS() takes the same argument when reading the object back in). Example: save the edited and cleaned version of the anaemia dataset.

# save anaemia dataset

saveRDS(anaemia, file = "data/anaemia_cleaned.rds")

If a file called anaemia_cleaned.rds already exists, this command will replace the old dataset with the new dataset, without warning.
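To read a saved dataset back in, readRDS() is the counterpart of saveRDS(). The sketch below uses a small made-up data frame and temporary files (the names demo, rds_file and csv_file are illustrative assumptions) to show that the RDS format retains variable types, whereas a round trip through a .csv file does not keep a date as a date.

```r
# Made-up data frame with a factor and a date variable
demo <- data.frame(id    = 1:3,
                   sex   = factor(c("female", "male", "female")),
                   visit = as.Date(c("2021-03-01", "2021-03-15", "2021-04-02")))

# Round trip through an RDS file
rds_file <- tempfile(fileext = ".rds")
saveRDS(demo, file = rds_file)
demo_rds <- readRDS(rds_file)

# Round trip through a CSV file, for comparison
csv_file <- tempfile(fileext = ".csv")
write.csv(demo, csv_file, row.names = FALSE)
demo_csv <- read.csv(csv_file)

class(demo_rds$visit)  # still a Date after the RDS round trip
class(demo_csv$visit)  # no longer a Date after the CSV round trip
```

If overwriting is a concern, file.exists("data/anaemia_cleaned.rds") can be checked before calling saveRDS().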

Datasets can also be exported from R into other formats such as comma-separated values (CSV), tab-delimited, SAS, or Stata.

?write.csv provides help on exporting a data frame to a CSV file.

15.8 Best not to save your workspace

Whilst R offers to save your workspace by default, this is best avoided. The workspace is your current R working environment and includes any user-defined objects (vectors, matrices, data frames, lists, etc.). It is better to re-run your R scripts and generate the necessary output, or only save relevant plots, than to save everything each time.

15.9 Saving tables and images from your output

To export tables to Word, follow these steps:

  • Create a table in R.
  • Write this table to a comma-separated .csv file using write.csv().
  • Open it in Excel. You might want to tidy up the format in Excel.
  • Then copy and paste the table from Excel into Word.
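The first two steps might look like the following sketch. The data, the object names (demo, tab, tab_df, out_file) and the output file name are made up for illustration; the file is written to a temporary folder here, whereas in practice you would write to your project directory.

```r
# Made-up data for illustration
demo <- data.frame(sex     = c("female", "male", "female", "female"),
                   anaemic = c("yes", "no", "no", "yes"))

# Step 1: create a table in R (counts of anaemia status by sex)
tab <- table(demo$sex, demo$anaemic)
tab

# Convert the table to a data frame with the same rows-by-columns layout
tab_df <- as.data.frame.matrix(tab)

# Step 2: write the table to a .csv file, ready to open in Excel
out_file <- file.path(tempdir(), "sex_by_anaemia.csv")
write.csv(tab_df, file = out_file)
```

Row names are written by default, which here keeps the sex categories as the first column of the .csv file.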

Within Excel, you can explore the possibility of stringing numbers and text together using the symbol &. When you do this, you generally need to round numbers off, to avoid excessive decimal places. The following shows an example with 2 such formulas. Text elements in the first are merely spaces and opening and closing brackets. Such formulas can potentially be copied down and copied right within Excel.

Alternatively, similar things can be achieved within R using the paste() function to join elements together, along with the round() function as required.
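For instance, a “mean (SD)” entry for a table could be built as in the sketch below. The weights are made up, the object names are illustrative, and paste0() is used as a variant of paste() that adds no spaces between the elements.

```r
# Made-up weights for illustration
weight <- c(60.5, 72.1, 55.0, 68.3)

# Round the summary statistics to 1 decimal place
weight_mean <- round(mean(weight), 1)
weight_sd   <- round(sd(weight), 1)

# String numbers and text together; the text elements are a space and brackets
mean_sd <- paste0(weight_mean, " (", weight_sd, ")")
mean_sd
```

Note that round() drops trailing zeros when the number is converted to text, so a value that rounds to a whole number is shown without a decimal place.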

15.10 Saving a plot

Within the Plots window, use Export to save your plot/graph as an image or as a PDF. Alternatively, select “Copy to Clipboard”, then when the next window appears, click on “Copy Plot” and paste into Word or similar.

Alternatively, save figures using commands in your R script:

  • You might want to use setwd() to point R towards a chosen directory, then the directory name does not need to be specified within save commands.
  • Specify file name and location to save your image using a function such as pdf(), png(), or jpeg(). Additional arguments can optionally be used to specify the height, width, and resolution of the image.
  • Create the plot using R commands.
  • Close the graphics device with dev.off().

Example:

# Open a pdf device saving to the file 'rplot.pdf'
pdf("rplot.pdf")

# Create a plot
plot(x = my_data$wt, y = my_data$height)

# Close the pdf file
dev.off()
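The same pattern applies to png(), where the optional arguments set the size and resolution of the image. The following is a sketch only: the data, the object names and the file name are made up, the file is written to a temporary folder, and the size and resolution values are arbitrary choices.

```r
# Made-up data for illustration
demo <- data.frame(wt     = c(60.5, 72.1, 55.0, 68.3),
                   height = c(1.62, 1.80, 1.55, 1.71))

# Open a png device: width and height are in pixels, res in pixels per inch
plot_file <- file.path(tempdir(), "wt_by_height.png")
png(plot_file, width = 1600, height = 1200, res = 200)

# Create the plot
plot(x = demo$wt, y = demo$height, xlab = "Weight (kg)", ylab = "Height (m)")

# Close the png file
dev.off()
```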

15.11 R file types and file extensions

  • R scripts for saving R code: .R
  • R projects: not covered by this manual but worth exploring to keep projects together: .Rproj
  • R markdown files: not covered by this manual, but useful for writing documents (with titles, numbered lists and tables), R code and R output. For instance, this can be used to produce written reports that analyse data and report the results: .Rmd
  • Single R object / dataset, saved in R format that retains data-type: .rds
  • Many R objects / datasets, saved in R format that retains data-type: .RData or .rda
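As a brief illustration of the last file type, save() writes several named objects to one .RData file and load() restores them under their original names. The objects demo_data and demo_notes and the temporary file are made up for this sketch.

```r
# Two made-up objects to save together
demo_data  <- data.frame(id = 1:3, weight = c(60.5, 72.1, 55.0))
demo_notes <- c("weights in kg", "made-up example")

# Save both objects into a single .RData file
rdata_file <- tempfile(fileext = ".RData")
save(demo_data, demo_notes, file = rdata_file)

# Remove the objects, then restore them from the file
rm(demo_data, demo_notes)
restored <- load(rdata_file)

restored  # load() returns the names of the objects it restored
```

Unlike readRDS(), load() puts objects back under the names they were saved with, so it can silently replace objects of the same name that are already in your environment.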

Data Wrangling Recipes in R: Hilary Watt. PCPH, Imperial College London.