1.3 Packages and Libraries

R, like Stata, has a number of built-in functions, referred to as ‘Base R’. Because R is open source, many individuals have written packages that extend R by providing functions for common tasks. There is ongoing debate in the R world about relying on user-written packages versus using Base R exclusively. One drawback of packages is that they may change from version to version, so old code occasionally needs updating. Some statisticians prefer Base R because explicitly coding each function they need helps them understand the underlying workings of a particular statistical (or data management) process. Others use packages to make their code more streamlined, easier to read, and/or more efficient. Using packages also saves the R user from “reinventing the wheel”: if someone else has already done the legwork to provide a custom function for you, it saves time to use it!

This guide to R is written using a number of packages. Without them, our task would be substantially more difficult, because many of the commands built into Stata are not built into Base R, and we will have to rely on packages for some of the analyses we encounter. We install packages using the install.packages() command. You should only have to install each package once: if it has been downloaded to your LSHTM account or to your laptop, you should not have to install it again. Because of the way the computers are set up at the school, you may be prompted to install packages in a different directory; click yes if prompted. You will know a package has been installed successfully if it appears in the ‘Packages’ tab in the bottom right pane of RStudio. You may occasionally have to run install.packages() more than once before a package installs successfully.

#--- Install packages
#install.packages(c("foreign", "magrittr", "psych", "epiDisplay", "tidyverse"))
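
Since each package only needs to be installed once, it can be handy to check which ones are still missing before installing anything. The following is a minimal sketch, not part of the original practical: the variable names (wanted, installed, to_install) are our own, and the install line is left commented out so that nothing is downloaded until you choose to run it.

```r
# Packages used in this guide (the same list as in the install command above)
wanted <- c("foreign", "magrittr", "psych", "epiDisplay", "tidyverse")

# Names of all packages currently installed in your R libraries
installed <- rownames(installed.packages())

# Keep only the packages that are not yet installed
to_install <- setdiff(wanted, installed)

if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
  # Uncomment the next line to install them (requires an internet connection)
  # install.packages(to_install)
} else {
  message("All packages are already installed.")
}
```

If the message lists any packages, uncomment the install.packages() line and run the block again.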

After installing a package, you must load it into your R session before you can use its functions. This is done with the library() command, and must be repeated each time you start a new R session.
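
As a rough illustration of loading packages, the sketch below first loads 'stats', which ships with R and so is always available, and then loads 'foreign' only if it is already installed. The requireNamespace() check is our own addition, so that the example does not stop with an error on a machine where the package is missing.

```r
#--- Load packages
# 'stats' is part of every R installation, so this call always succeeds
library(stats)

# Load an add-on package only if it has already been installed
if (requireNamespace("foreign", quietly = TRUE)) {
  library(foreign)
} else {
  message("Package 'foreign' is not installed; run install.packages('foreign') first.")
}

# List everything currently attached to the session
search()

# A function can also be called without loading its package,
# using the package::function form
stats::median(c(1, 3, 5))
```

In your own scripts you would normally just write one library() line per package at the top of the file.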

So far, in the pen and paper practicals, you have worked with summary-level data in the form of tables of percentages, frequencies, means, standard deviations, etc. In the computer practicals, we will work from the original (raw) data, which lets us reproduce these tables ourselves before carrying out further analysis. The data are held in a file (normally a .csv file or a .dta file). Before R can work with the data, the file must first be read in. We can then apply a series of commands to either manipulate or analyse the data. To see this in action, we will use R to carry out some simple analyses on the BAB.dta dataset. As with Stata, you will need to copy the data files for the practicals to your home drive: copy the files located in “U:\download\teach\steph” to a folder on your H:\ drive.
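
Once the files have been copied, reading in a Stata file might look something like the sketch below. The folder name "H:/practicals" is a placeholder we have made up, so replace it with the folder you actually copied the files to. We assume here that BAB.dta can be read by read.dta() from the 'foreign' package, which handles Stata versions 5 to 12; files saved by newer versions of Stata may need a different reader.

```r
#--- Read in the data
# Hypothetical folder on your H:\ drive; change this to your own folder.
# Note that R paths use forward slashes, even on Windows.
data_dir <- "H:/practicals"
bab_path <- file.path(data_dir, "BAB.dta")

if (file.exists(bab_path) && requireNamespace("foreign", quietly = TRUE)) {
  # Read the Stata file into an R data frame
  bab <- foreign::read.dta(bab_path)

  # Inspect the structure and the first few rows of the data
  str(bab)
  head(bab)
} else {
  message("Could not read ", bab_path,
          "; check the folder name and that 'foreign' is installed.")
}
```

The file.exists() check is only there so that the example gives a helpful message instead of an error if the path is wrong.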