15 Appendix 2: Reading data into R and saving output
Data Wrangling Recipes in R: Hilary Watt
15.1 Avoid the need to write your directory
The next three sections show you how to avoid typing in your directory name. The process is to read the data into R using menus, after first making sure the file is in the desired directory for this project. Then copy the code from the “history” section (code is written there, even though you used menus). You can then copy the directory element of that code and point R towards this specific directory (using the setwd() command). R then looks in this directory when reading any files, and saves R files to it, without you needing to write the directory name again.
15.3 Code to read in dataset
Using read.csv() for loading a CSV file.
In practice:
anaemia <- read.csv("C:/Users/hwatt/OneDrive – Imperial College London/GMPH/R handbook/anaemiaB.csv", header = TRUE)
(this is the command that appeared when using menus above).
Windows computers: to copy file name with file path, hold down shift key and right click on the file name. Select “copy as path” from the list that appears.
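A path copied this way contains backslashes, which R treats as escape characters inside ordinary quoted strings, so it usually needs converting to forward slashes before use. A minimal sketch, assuming R 4.0.0 or later (for the raw string syntax) and a hypothetical path that is not from this handbook:

```r
# Hypothetical path, written as a raw string so the backslashes are kept as typed
# (raw strings r"(...)" need R >= 4.0.0)
win_path <- r"(C:\Users\me\project\anaemiaB.csv)"

# Replace every backslash with a forward slash, which R accepts on Windows
r_path <- gsub("\\", "/", win_path, fixed = TRUE)
r_path
```

The converted path can then be passed to read.csv() or setwd(). Replacing the backslashes by hand works just as well.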
MORE ADVANCED options for reading in data:
The written command version of “from text (readr)” requires us to run library(readr) BEFORE using the relevant readr code.
Importing data into R (including from Excel): Introduction to Importing Data in R
Intermediate importing data (from databases, from the web, from stats packages): Intermediate Importing Data in R
15.4 Point R to chosen working directory
getwd() shows the current working directory. If you save or open files without specifying a directory, R looks here.
setwd(dir) sets the working directory (replace “dir” with your chosen working directory; my strategy is to open files using menus, then copy and paste the directory element from the open file command).
This enables the use of shorter read and write commands, since they no longer need to point to the relevant directory.
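A minimal sketch of this, assuming a hypothetical project folder (a temporary directory stands in for it here, so the example runs anywhere) and a small made-up data frame in place of the anaemia data:

```r
# Hypothetical project folder: a temporary directory stands in for your own
project_dir <- file.path(tempdir(), "my_project")
dir.create(project_dir, showWarnings = FALSE)

old_wd <- setwd(project_dir)  # setwd() returns the previous working directory
getwd()                       # confirms where R now looks by default

# Made-up data standing in for the anaemia dataset
anaemia <- data.frame(id = 1:3, weight = c(61.2, 74.5, 58.9))

# File names alone are enough: no directory needs to be typed
write.csv(anaemia, "anaemiaB.csv", row.names = FALSE)
anaemia_in <- read.csv("anaemiaB.csv", header = TRUE)

setwd(old_wd)  # restore the original working directory
```

In your own script, you would replace project_dir with the directory element copied from the menu-generated command.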
15.5 Saving R code into R script files
Open an R script file to save your R code, to avoid losing all your work. Be certain to add in many comments (starting with #, so that R does not attempt to interpret them as code). Otherwise, your own code may soon become impenetrable to you.
Select “File” (top left of RStudio), then “New File”, then “R Script”, to open a new R script file.
Immediately name and save the R script file by clicking on the save icon (at the top of the R script file, which is top left of RStudio). Save regularly as you code.
15.6 Write comments for your (programmer) benefit, within R code
R script files are collections of R code, interpreted by R, for action. R code can include code that amends datasets, produces tables and graphs, and analyses data. Even very experienced R users are often clueless as to what bare code is doing, including code that they wrote themselves. Hence, any sensible coder adds comments into R code to help navigate and understand the code. Don’t assume you have the memory of an elephant. Instead, add in comments such as “Checking for errors”, “looking at shapes of distribution”, “this appears to be a Normal distribution”. You might want to document reasons for any choices made (in data management and methods of analysis) in the R script.
# Comments follow on from the “#” symbol, which might be at the beginning of a line or after a command.
# Checking for outliers
hist(anaemia$weight) #plots histogram
# table(anaemia$weight) a code line starting with # is commented out, so is not run by R
RStudio helps you by displaying comments in a different colour.
15.7 Saving R datasets
It is essential to keep a tidy file of your data cleaning and modification code. Then rerun it with any required amendments. If you save an R dataset and have a tidy R script file that creates it, you can feel confident about overwriting the saved dataset. This avoids the potential confusion of having many different versions of your dataset.
Always keep a copy of your datasets as provided to you. Be certain to always write data with a different name to this original file.
Saving in R format means that the format of all variables will be retained. You can potentially keep only cleaned versions of the variables that you will need for your analysis. With a tidy R script that creates this, it is easy to rerun keeping more variables (after cleaning them) if you later find you need more.
Save a data frame (or similar) as an RDS file in R using saveRDS(), and read it back in using readRDS(): file = specifies the name of the file where the R object is saved or read from. Example: save an edited and cleaned version of the anaemia dataset.
If a file called anaemia_cleaned.rds already exists, saving to this file name will replace the old dataset with the new dataset, without warning.
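The save described above can be sketched as follows. This is a minimal sketch: the small data frame is made up to stand in for the cleaned anaemia dataset, and the file is written to a temporary directory (an assumption made so the example runs anywhere) rather than to your working directory.

```r
# Made-up data standing in for the cleaned anaemia dataset
anaemia <- data.frame(
  id     = 1:3,
  sex    = factor(c("F", "M", "F")),
  weight = c(61.2, 74.5, 58.9)
)

# Temporary location used for illustration; with setwd() in place,
# file = "anaemia_cleaned.rds" alone would be enough
rds_file <- file.path(tempdir(), "anaemia_cleaned.rds")

saveRDS(anaemia, file = rds_file)            # silently overwrites an existing file
anaemia_cleaned <- readRDS(file = rds_file)  # read the saved object back in

str(anaemia_cleaned)  # variable types (e.g. the factor) are retained
```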
Datasets can also be exported from R into other formats such as comma-separated values (CSV), tab-delimited, SAS, or Stata.
?write.csv provides help on exporting a data frame to a CSV file.
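As a minimal sketch of a CSV export, assuming a made-up data frame and a temporary file location (both are illustrative, not part of the anaemia data):

```r
# Made-up data standing in for a cleaned dataset
anaemia <- data.frame(id = 1:3, weight = c(61.2, 74.5, 58.9))

# Temporary location used for illustration
csv_file <- file.path(tempdir(), "anaemia_cleaned.csv")

# row.names = FALSE stops R from writing an extra column of row numbers
write.csv(anaemia, file = csv_file, row.names = FALSE)

readLines(csv_file)  # inspect the raw text that was written
```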
15.8 Best not to save your workspace
Whilst R offers to save your workspace by default, this is best avoided. The workspace is your current R working environment and includes any user-defined objects (vectors, matrices, data frames, lists, etc.). It is better to re-run your R scripts and generate the necessary output, or only save relevant plots, than to save everything each time.
15.9 Saving tables and images from your output
To export tables to Word, follow these steps:
- Create a table in R.
- Write this table to a comma-separated .csv file using write.csv()
- Open in Excel. You might want to tidy up the format in Excel.
- Then copy and paste the table from Excel into Word.
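The first two steps above can be sketched in R as follows. The data, variable names and file name are all made up for illustration, and as.data.frame.matrix() is just one way of keeping the rows-by-columns layout of a two-way table.

```r
# Made-up data for illustration
anaemia <- data.frame(
  sex     = c("F", "M", "F", "F", "M"),
  anaemic = c("yes", "no", "no", "yes", "no")
)

# Step 1: create a two-way table of counts
tab <- table(sex = anaemia$sex, anaemic = anaemia$anaemic)

# Convert to a data frame that keeps the rows-by-columns layout
tab_df <- as.data.frame.matrix(tab)

# Step 2: write to a .csv file (row names are kept, as they label the rows)
table_file <- file.path(tempdir(), "sex_by_anaemic.csv")
write.csv(tab_df, file = table_file)
```

The resulting file can then be opened in Excel for the remaining steps.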
Within Excel, you can explore the possibility of stringing numbers and text together using the symbol &. When you do this, you generally need to round numbers off, to avoid excessive decimal places. The text elements joined in this way may be merely spaces and opening and closing brackets. Such formulas can potentially be copied down and copied right within Excel.
Alternatively, similar things can be achieved within R using the paste() function to join elements together, along with the round() function as required.
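A minimal sketch of this in R, assuming two made-up summary values (a mean and a standard deviation) that are to appear in one table cell:

```r
# Made-up summary values for illustration
mean_weight <- 65.4321
sd_weight   <- 8.7654

# Join rounded numbers and text; sep = "" avoids paste()'s default spaces
cell <- paste(round(mean_weight, 1), " (", round(sd_weight, 1), ")", sep = "")
cell

# sprintf() is an alternative that always shows the stated number of decimals
cell_fixed <- sprintf("%.1f (%.1f)", mean_weight, sd_weight)
cell_fixed
```

Note that round() drops trailing zeros when the number is printed (for instance 12.0 is shown as 12), which is one reason to consider sprintf() for tables.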
15.10 Saving a plot
Within the plots window, use Export to save your plot/graph as an image or as a PDF. Alternatively, select “copy to clipboard”, then when the next window appears, click on “copy plot” and paste into Word or similar.
Alternatively, save figures using commands in your R script:
- You might want to use setwd() to point R towards a chosen directory, then the directory name does not need to be specified within save commands.
- Specify file name and location to save your image using a function such as pdf(), png(), or jpeg(). Additional arguments can optionally be used to specify the height, width, and resolution of the image.
- Create the plot using R commands.
- Close the graphics device with dev.off().
Example:
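The steps above can be sketched as follows. This is a minimal sketch using pdf(), with made-up data and a temporary file location chosen for illustration; png() and jpeg() follow the same pattern, with width and height given in pixels by default rather than inches.

```r
# Made-up data standing in for anaemia$weight
set.seed(1)
weight <- rnorm(100, mean = 65, sd = 10)

# Temporary location used for illustration; with setwd() in place,
# a bare file name such as "weight_histogram.pdf" would be enough
plot_file <- file.path(tempdir(), "weight_histogram.pdf")

pdf(file = plot_file, width = 7, height = 5)  # open the device; size in inches
hist(weight, main = "Distribution of weight", xlab = "Weight")  # create the plot
dev.off()                                     # close the device to finish the file
```

The file is only complete once dev.off() has been run.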
15.11 R file types and file extensions
- R scripts for saving R code: .R
- R projects (not covered by this manual, but worth exploring to keep projects together): .Rproj
- R markdown files (not covered by this manual, but useful for writing documents with titles, numbered lists and tables, R code and R output; for instance, this can be used to produce written reports that analyse data and report the results): .Rmd
- Single R object / dataset, saved in R format that retains data types: .rds
- Many R objects / datasets, saved in R format that retains data types: .RData or .rda
Data Wrangling Recipes in R: Hilary Watt. PCPH, Imperial College London.