1 First Steps
Before we can get started, you’ll need the MS data as a reference point. I’ve provided that data for inspection and download at the following public Google Drive link. Once the file is downloaded, I recommend moving it into your R project’s directory. Doing so makes importing the file much easier, since opening an R project sets the working directory to the project’s folder and R can then locate the file by its name alone.
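Before importing anything, you can confirm from within R that the downloaded file is where R expects it. This is a minimal sketch that assumes the file keeps the name used in the import code later in this chapter (`MS Model Dataset - Reduced Form.xlsx`); `getwd()`, `list.files()`, and `file.exists()` are all base R functions.

```r
# Folder that R resolves relative file names against
getwd()

# List any .xls or .xlsx files sitting in that folder
list.files(pattern = "\\.xlsx?$")

# TRUE if the MS file is in the working directory, FALSE otherwise
file.exists("MS Model Dataset - Reduced Form.xlsx")
```

If the last line returns `FALSE`, either move the file into the folder reported by `getwd()` or give `read_excel()` the full file path.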
In the code below, I first load the `tidyverse`, an exceptionally useful bundle of packages that makes working with data in R much easier and produces more readable code (Wickham 2017). I describe the individual `tidyverse` functions as we use them throughout this exercise. For details on which packages the `tidyverse` bundles and what each one does, visit https://www.tidyverse.org/packages/.

Second, I import the reduced form MS dataset using the `read_excel()` function from the `readxl` package (Wickham and Bryan 2019). Although `readxl` is technically a `tidyverse` package and is installed along with it, it is not one of the core packages attached by `library(tidyverse)`, so we need to load `readxl` separately to access its functions. The `read_excel()` function is intuitively named: it reads Excel files, and the only argument it requires is the name of the .xls or .xlsx file you want to import into R. Make sure that file is in your active working directory; otherwise, you’ll need to pass the full file path to `read_excel()` instead. You can check your working directory by entering `getwd()` into the console, and you can change it by passing the desired path to the `setwd()` function.

When importing data into R, it helps to store the imported data under an informative, memorable object name. You can then refer to the data by that name instead of typing out the entire file name each time; think of the name as your own personal shorthand for a dataset. Choose names that will be easy to understand both for you and for anyone who might read your code in the future. Giving data a name in this way is called assignment. Conventionally, assignment in R uses the assignment operator `<-`. Although there are other assignment operators (such as `=` and `->`) and other ways to perform assignment, the most common and arguably the clearest form is to write the name you want your object to have, then the assignment operator `<-`, and then the data you want that object to contain.
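The assignment pattern described above looks like this in practice. The object names and values here are hypothetical, chosen only for illustration; the three alternative lines show `=`, `->`, and `assign()`, which produce the same result but are less conventional in `tidyverse`-style code.

```r
# Conventional form: object name, then <-, then the value
ms_counties <- c("Adams", "Alcorn", "Amite")

# Equivalent alternatives, shown only so you recognize them
ms_counties_eq = c("Adams", "Alcorn", "Amite")
c("Adams", "Alcorn", "Amite") -> ms_counties_arrow
assign("ms_counties_fn", c("Adams", "Alcorn", "Amite"))

# The name now stands in for the value it holds
length(ms_counties)  # 3
```
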
The reduced form MS dataset includes only the variables used in the previous model. It excludes extraneous data that won’t be used going forward and would complicate merging in later iterations. You can use the `head()` function to print a preview of your dataset to the console. By default, the preview shows the first six rows of every column in the dataset. In our case, the preview should match the values presented in Table 1.1.
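`head()` is not specific to imported spreadsheets; it works on any data frame. A quick sketch using R’s built-in `mtcars` data (a stand-in here, since it needs no download) shows the six-row default, the `n` argument for changing it, and `tail()` for previewing the last rows instead.

```r
# Default preview: the first six rows, all columns
head(mtcars)

# Ask for a different number of rows with n
head(mtcars, n = 3)

# tail() shows the last rows instead of the first
tail(mtcars, n = 2)
```
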
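```r
library(tidyverse)  # attaches the core tidyverse packages
library(readxl)     # not attached by library(tidyverse); provides read_excel()
# Import the Excel file and assign it to an object name
MS_Model_Dataset_Reduced_Form <- read_excel("MS Model Dataset - Reduced Form.xlsx")
head(MS_Model_Dataset_Reduced_Form)  # preview the first six rows
```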
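Workbooks often contain more than one sheet, and `read_excel()` reads the first one unless told otherwise. Because the MS file isn’t bundled with R, this sketch uses a sample workbook that ships with `readxl` (located with `readxl_example("datasets.xlsx")`) to show how to list a workbook’s sheets with `excel_sheets()` and pick one with the `sheet` argument. The same arguments work for any .xls or .xlsx file.

```r
library(readxl)

# Path to a sample workbook installed with readxl
path <- readxl_example("datasets.xlsx")

# Names of every sheet in the workbook
excel_sheets(path)

# Read a specific sheet by name (a sheet number also works)
mtcars_xl <- read_excel(path, sheet = "mtcars")
head(mtcars_xl, n = 3)
```
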
County | WalkerMean.MuniProportion.Multi | WalkerMean.PopProportion.Multi | PA.Ratio | PercentBachelorOrHigher | PercentUnder18 | AvgAQPM01_14 | PrimaryCarePhysRatio | PercentSmokers |
---|---|---|---|---|---|---|---|---|
Adams, Mississippi | 0.0000000 | 0.0000000 | 0.0553314 | 0.193 | 0.231 | 10.000000 | 0.115800 | 0.2286403 |
Alcorn, Mississippi | 0.0800000 | 0.0913098 | 0.0486068 | 0.160 | 0.244 | 11.014286 | 0.267100 | 0.1915657 |
Amite, Mississippi | 0.0294118 | 0.0148759 | 0.1129071 | 0.092 | 0.235 | 9.957143 | 0.628700 | 0.2068875 |
Attala, Mississippi | 0.1397059 | 0.1117933 | 0.0499212 | 0.171 | 0.263 | 10.664286 | 0.211600 | 0.1873878 |
Benton, Mississippi | 0.0000000 | 0.0000000 | 0.1216782 | 0.096 | 0.256 | 10.600000 | 0.423475 | 0.2030095 |
Bolivar, Mississippi | 0.1850980 | 0.0534632 | 0.0535719 | 0.204 | 0.259 | 11.678571 | 0.185100 | 0.2300737 |
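Beyond `head()`, `names()` lists only the column names, and `glimpse()` (from dplyr, which `library(tidyverse)` attaches) prints each column’s type next to its first few values, which helps when a dataset is too wide to fit on screen. To keep this sketch self-contained, it rebuilds the first two rows of Table 1.1 by hand with `tibble()` instead of reading the Excel file; `ms_preview` is a hypothetical name.

```r
library(tibble)
library(dplyr)

# First two rows of Table 1.1, typed in by hand for illustration
ms_preview <- tibble(
  County = c("Adams, Mississippi", "Alcorn, Mississippi"),
  WalkerMean.MuniProportion.Multi = c(0.0000000, 0.0800000),
  WalkerMean.PopProportion.Multi = c(0.0000000, 0.0913098),
  PA.Ratio = c(0.0553314, 0.0486068),
  PercentBachelorOrHigher = c(0.193, 0.160),
  PercentUnder18 = c(0.231, 0.244),
  AvgAQPM01_14 = c(10.000000, 11.014286),
  PrimaryCarePhysRatio = c(0.115800, 0.267100),
  PercentSmokers = c(0.2286403, 0.1915657)
)

# Column names only
names(ms_preview)

# One line per column: name, type, and leading values
glimpse(ms_preview)
```

With the real data, replace `ms_preview` with `MS_Model_Dataset_Reduced_Form`.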
One of R’s most powerful features is its ability to read many file types. This means we can import data in varied formats into R and work with all of it seamlessly in one place. Using the `tidyverse` collection of packages, we can easily merge data from different sources, arrange data so that it is easier to interpret, and remove excess data that we’re not interested in. Every column in the MS data is a variable we’ll also need for the other states. In the following sections, I walk you through how to gather, clean, and merge the data for each variable for all of the other states with the MS data.
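The three operations named above map onto dplyr verbs: joining tables on a shared key (`left_join()`) or stacking them (`bind_rows()`), reordering rows (`arrange()`), and dropping columns (`select()`). This sketch splits a few values from Table 1.1 into small hypothetical tables (`smoking`, `education`, `more_rows`) that share the `County` key, so no new data are introduced; the walkthrough in the following sections may combine these steps differently.

```r
library(dplyr)
library(tibble)

# Two "sources" that share the County key (values from Table 1.1)
smoking <- tibble(
  County = c("Adams, Mississippi", "Alcorn, Mississippi", "Amite, Mississippi"),
  PercentSmokers = c(0.2286403, 0.1915657, 0.2068875)
)
education <- tibble(
  County = c("Amite, Mississippi", "Adams, Mississippi", "Alcorn, Mississippi"),
  PercentBachelorOrHigher = c(0.092, 0.193, 0.160),
  PercentUnder18 = c(0.235, 0.231, 0.244)
)

merged <- smoking %>%
  # Merge: match rows on County, keeping every row of `smoking`
  left_join(education, by = "County") %>%
  # Arrange: sort counties from highest to lowest smoking rate
  arrange(desc(PercentSmokers)) %>%
  # Remove excess data: drop a column we don't need here
  select(-PercentUnder18)

merged

# Stack rows from another source that has the same columns
more_rows <- tibble(
  County = "Attala, Mississippi",
  PercentSmokers = 0.1873878,
  PercentBachelorOrHigher = 0.171
)
bind_rows(merged, more_rows)
```

Note that `left_join()` matches rows by the `County` value, not by position, so the differing row order of the two tables does not matter.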
References
Wickham, Hadley. 2017. Tidyverse: Easily Install and Load the ‘Tidyverse’. https://CRAN.R-project.org/package=tidyverse.
Wickham, Hadley, and Jennifer Bryan. 2019. Readxl: Read Excel Files. https://CRAN.R-project.org/package=readxl.