Chapter 8 Creating your first script
A script can be broken down into a few layers. We will go through each in detail, but here is a general overview:
- Load/install required packages
- Load your data
- Perform data wrangling tasks
- Then what?
  - create figures
  - run statistics
  - create tables
- Save outputs
Figures are saved in your /images directory, whereas the statistics and tables are saved in /data for future use. Each section is coded so that I can fold/unfold it. This allows me to focus only on the section of code that matters at a given point in time.
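The layers listed above can be sketched as one small script. This is only an illustration under stated assumptions: it uses base R and the built-in mtcars dataset as a stand-in for your own data, and it writes to a temporary directory (out_dir, a hypothetical name) rather than to the /images and /data folders of a real project. pdf() is used for the figure only because it is available in every R build.

```r
# Load/Install required packages ---------------------
# (base R only in this sketch)

# Load your data ---------------------
df <- mtcars  # stand-in for an imported dataset

# Data wrangling ---------------------
df$cyl <- factor(df$cyl)  # treat cylinder count as a categorical variable

# Figures ---------------------
out_dir <- tempdir()  # hypothetical output location for this sketch
fig_path <- file.path(out_dir, "mpg_by_cyl.pdf")
pdf(fig_path)
boxplot(mpg ~ cyl, data = df, xlab = "Cylinders", ylab = "Miles per gallon")
dev.off()

# Statistics ---------------------
model <- lm(mpg ~ wt, data = df)

# Tables ---------------------
table_means <- aggregate(mpg ~ cyl, data = df, FUN = mean)

# Save outputs ---------------------
rdata_path <- file.path(out_dir, "analyzedData.RData")
save(model, table_means, file = rdata_path)
```

Each commented header marks one layer, so the sections can be folded independently in RStudio.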
8.1 Headers
Before we get started, you’ll see “Headers” throughout the script, which are used to keep things organized. I use the following comment syntax within an .R file (not an .Rmd); in RStudio, a comment line ending in four or more dashes becomes a foldable section:
# Header 1 ----------
# ▐ Header 2 --------
# ▐ ▬ Header 3 --------
8.2 Part 1: Loading/installing packages needed
At the very top of my script I load the packages I need to get things done. This will vary slightly depending on the script. For example, some of my packages (e.g., lme4 or easystats) are only loaded when I run statistics. There are a few examples below; I suggest skipping to the section that applies to your situation.
# Load/Install required packages ---------------------
if (!require("pacman")) install.packages("pacman")
pacman::p_load(conflicted, readxl, ggplot2, esquisse, Rmisc, tidyverse, car, easystats, apastats, sjlabelled, rio) # p_load is a wrapper for library and require. It checks to see if a package is installed; if not, it will install it.
# install_formats() # Run once to install rio wrappers
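If you would rather not depend on pacman, the same install-if-missing idea can be approximated in base R. The helper below, load_or_install, is a hypothetical name introduced for illustration; it is a rough sketch of the behaviour described above, not how pacman is actually implemented.

```r
# Hypothetical helper: roughly what p_load does for CRAN packages
load_or_install <- function(pkgs) {
  for (pkg in pkgs) {
    # requireNamespace() returns FALSE instead of erroring when a package is missing
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)
    }
    # character.only = TRUE lets library() accept the name stored in a variable
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}

# These two ship with R, so nothing is installed in this example
load_or_install(c("stats", "utils"))
```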
8.3 Part 2: Loading your data
You should have data that you are trying to manipulate. Below I show the most common examples. I almost always call my data df. This allows me to easily copy/paste code between projects, so as you advance with your R scripts you won’t need to spend precious time using Find/Replace. It’s good practice to do this if you can.
I have recently switched to using rio, which greatly reduces what students need to learn, because a single import function handles most file formats.
8.3.1 xlsx
# Load your dataset ---------------------
df <- read_excel("raw/CC_Body_FA.xlsx", sheet = "Sheet1") # import your dataset - uses 'readxl'
df <- import("raw/data.xlsx", which = "Sheet1") # Uses the `rio` package
8.3.2 csv
When I have particularly large files to write from MATLAB, I prefer to use *.csv files over *.xlsx because they write faster. If you are dealing with datasets that are larger than 1GB in size you should consider using data.table instead of data.frame.
# Load your dataset ---------------------
df <- import("raw/data.csv") # Import using `rio`
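For the large-file case mentioned above, a minimal sketch with the data.table package might look like the following. It assumes data.table is installed; the file is a small temporary CSV created only for the example, and the column names are made up.

```r
library(data.table)

# Write a tiny example CSV to a temporary file (stand-in for a large export)
csv_path <- tempfile(fileext = ".csv")
fwrite(data.table(id = 1:3, dti_value = c(0.41, 0.38, 0.45)), csv_path)

# fread() is data.table's fast reader; it returns a data.table
df <- fread(csv_path)
class(df)
```

A data.table is also a data.frame, so most code written for data frames will still run on it.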
8.3.3 Google Sheets
It is also possible to read from a Google Sheet using the googlesheets4 package.
df <- read_sheet("https://docs.google.com/spreadsheets/d/1V99DMca-Qdy3G7kyg9zTONvBVagtnBrj4nm78Fj1vU8", sheet = "Head Measures & Information") # requires 'googlesheets4' library
8.3.4 sav (SPSS)
This is the general data format for SPSS. With this filetype “attributes” are also imported, which normally I like to remove. In my own experience, some functions don’t play nicely with dataframes that have labels. You can read more on this here.
df <- rio::import("data/data.sav")
A full tutorial on importing other data types can be seen here on DataCamp. In general, try to stick to the formats shown above. If you are importing data from another statistical program (e.g., SPSS, Stata or SAS) you will often get a ton of attributes imported into the data.frame. This can be a good thing at times because it may give you additional information on the column variable. However, some statistical functions tend to get fussy when your data.frame contains these attributes. Two ways of removing them are shown below.
df.auto <- import("http://www.stata-press.com/data/r13/auto.dta") %>%
  labelled::remove_attributes(c("label", "format.stata"))
df.auto <- import("http://www.stata-press.com/data/r13/auto.dta") %>%
  haven::zap_formats()
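To make the idea of "attributes" concrete, here is a small base R sketch. The data frame and its label and format.stata attributes are invented for illustration; they only mimic what an SPSS or Stata import typically attaches to a column.

```r
# Hypothetical column carrying SPSS/Stata-style metadata
df <- data.frame(mpg = c(22, 17, 22))
attr(df$mpg, "label") <- "Mileage (mpg)"
attr(df$mpg, "format.stata") <- "%8.0g"

# Inspect what is attached to the column
attributes(df$mpg)

# Strip the extra attributes from every column using base R only
df[] <- lapply(df, function(col) {
  attr(col, "label") <- NULL
  attr(col, "format.stata") <- NULL
  col
})
```

After this step the column is a plain numeric vector again, which is the state the two package-based calls above aim for.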
Finally, it’s possible you want to open data from other formats, including 1. .txt 2. . 3. SPSS 4. Mini-table
8.4 Cleaning your imported data
It’s possible to clean up your dataset as it comes in by using the janitor package. Click the link for a couple of examples. In essence, it will scan through the column names and fix them according to a naming convention you specify.
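As a minimal sketch of the column-name clean-up described above, assuming the janitor package is installed: the data frame and its messy column names are made up for the example, and clean_names is used with its default settings, which should produce snake_case names.

```r
library(janitor)

# check.names = FALSE keeps the messy names exactly as typed
df <- data.frame(`Subject ID` = 1:2, `DTI Value` = c(0.41, 0.38), check.names = FALSE)

# Rewrite the column names using the default (snake_case) convention
df <- clean_names(df)
names(df)
```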
Now that you have your df loaded, let’s take a look and see what we have. There are four common types of data that can be held in a data.frame; in R, the type of a column is referred to as its class.
- Numeric
- Characters
- Factors
- Dates
You can view the class of each column by running the following code:
sapply(df, class)
The class of your columns may not seem important right now, but later on, when we manipulate the data, it will be crucial to make sure these are accurate. Below is an example of an imported xlsx file. We expected dti_value to be numeric, but due to a dash in one of the cells, it was imported as character.
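The situation just described can be reproduced and repaired with a few lines of base R. The data frame below is invented for illustration (only the dti_value column name comes from the text); it is one possible fix, assuming the dash was meant to mark a missing value.

```r
# A stray "-" forces the whole column to be stored as character
df <- data.frame(
  subject = c("s01", "s02", "s03"),
  dti_value = c("0.41", "-", "0.38"),
  stringsAsFactors = FALSE
)
sapply(df, class)  # dti_value is reported as "character"

# Replace the dash with NA, then convert the column to numeric
df$dti_value[df$dti_value == "-"] <- NA
df$dti_value <- as.numeric(df$dti_value)
sapply(df, class)  # dti_value is now "numeric"
```

Replacing the dash before calling as.numeric avoids the coercion warning you would otherwise get and makes the intent explicit.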
8.5 Part 3: Saving Outputs
Once you are done running your R scripts you will want to save some outputs (notably statistical models and dataframes) so they can become part of your RMarkdown document (manuscript.Rmd).
We will want to save our results as an *.RData file. You can save outputs in a few different ways. The first uses the default save function:
# Save your environment ------------
# Save it to .RData -----------
save(journey_time,modsum, model, file = "data/analyzedData.RData") #Save a list of tables that I'll use in the .Rmd file.
# Save the tables into data/tables.RData using "patterns" ==================
save(list=ls(pattern="table"), file = "data/tables.RData") #Save a list of tables that I'll use in the .Rmd file.
save(list=ls(pattern="mod"), file = "data/stats.RData")
However, save overwrites the file every time you run it. What if you want to add objects from your environment to an existing RData file? We can use the resave function from cgwtools.
# Save your environment ------------
# Save the tables into data/tables.RData by listing them individually
cgwtools::resave(tbl.demo.mios, tbl.demo.acap, file = "data/tables.RData") # resave a list of tables that I'll use in the .Rmd file.
# Save the tables into data/tables.RData using "patterns" ==================
cgwtools::resave(list=ls(pattern="tbl"), file = "data/tables.RData")
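To see why appending matters, the base R sketch below shows save overwriting a file and then a manual "append" done by loading the file into a separate environment, adding an object, and saving everything back. The object names and the temporary path are hypothetical, and this only illustrates the idea; it is not a description of how cgwtools implements resave.

```r
rdata_path <- file.path(tempdir(), "tables.RData")  # hypothetical path for this sketch

tbl.one <- data.frame(a = 1:3)
tbl.two <- data.frame(b = 4:6)

save(tbl.one, file = rdata_path)
save(tbl.two, file = rdata_path)  # overwrites: the file now holds only tbl.two

# Manual append: load the existing contents, add an object, save it all back
stored <- new.env()
load(rdata_path, envir = stored)
assign("tbl.one", tbl.one, envir = stored)
save(list = ls(stored), envir = stored, file = rdata_path)

# Check what the file contains now
check <- new.env()
load(rdata_path, envir = check)
ls(check)
```

Loading into a fresh environment, rather than the global one, keeps the check from being confused by objects that already exist in your session.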
Finally, it’s possible that you need to export the data for a colleague in a more usable file format (because they aren’t cool enough to run their analyses in R yet…). We once again use the rio package to accomplish this.
# Optional - Save df as xlsx --------
export(list(mtcars = mtcars, iris = iris), "multi.xlsx")
export(processed_data, "processed_data.xlsx") # uses the rio package
# Other Options (not recommended) --------
xlsx::write.xlsx(tmp2, "data/interactions.xlsx", sheetName = "Interaction2", append = TRUE) # uses the xlsx package
openxlsx::write.xlsx(daily, "data/daily.xlsx") # uses the openxlsx package but you can't append sheets with this package as far as I know.