Chapter 3 Our Data Science Toolbox

Before we make the witness (a.k.a. the data) confess, we need to install some programs on our computers and create some accounts on the web.

3.1 R

The computer language we will be using in this workshop is R. R is an open-source language developed mainly for statistical computing; nevertheless, pretty much anything can be carried out in R, such as creating websites, editing photos, making virtual illustrations, and connecting with other software. It is one of the most popular computer languages among data scientists. Learning to perform statistics and data analytics through code in R may sound intimidating (no worries, I was intimidated too), but learning it can be as fun as learning to master our smartphones.
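To give a first taste of what statistical computing in R looks like, here is a minimal sketch that uses only functions from the base installation. The numbers are made up for illustration, and the variable names are arbitrary choices, not part of the workshop material.

```r
# A small, made-up set of measurements (illustrative values only).
measurements <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Basic descriptive statistics with functions from base R.
mean_value   <- mean(measurements)    # arithmetic mean
median_value <- median(measurements)  # middle value
sd_value     <- sd(measurements)      # sample standard deviation (divides by n - 1)

# Show the results in the console.
print(mean_value)
print(median_value)
print(sd_value)
```

Each line can be typed into the R console one at a time, which is a comfortable way to explore what each function does.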

3.2 RStudio

RStudio is an integrated development environment (IDE) that makes it easier to use R to analyse our data.

3.3 R packages

Although the base installation of R has many functions and programming approaches for manipulating data, such as cleaning and structuring it, we will be employing packages that help us accomplish these tasks.

  • tidyverse: a collection of packages that facilitate manipulating datasets
  • readxl: package to load Excel™ files (e.g. .xls, .xlsx) into R
  • foreign: package to load datasets stored in other formats, such as SPSS (.sav) or Stata (.dta), into R
  • psych: package with functions to explore datasets and perform multivariate analysis. Although initially developed to evaluate datasets in mental health studies, its functions apply to any dataset
  • pastecs: package with functions to explore datasets and perform multivariate analysis
  • car: package with functions to facilitate visual exploratory analysis
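As a rough sketch of how these packages could be installed from the R console, the snippet below checks which ones are missing and installs only those. It assumes the packages are obtained from CRAN under the names shown (note that the CRAN name of the fifth package is "pastecs"); the variable names workshop_packages and missing_packages are illustrative and not part of the workshop material.

```r
# Packages from the list above, written with their CRAN names (an assumption
# of this sketch; "pastecs" ends with an "s").
workshop_packages <- c("tidyverse", "readxl", "foreign", "psych", "pastecs", "car")

# Find which of them are not yet installed on this computer.
missing_packages <- setdiff(workshop_packages, rownames(installed.packages()))
print(missing_packages)

# Install only the missing ones. This needs an internet connection, so the
# sketch runs it only in an interactive session such as the RStudio console.
if (interactive() && length(missing_packages) > 0) {
  install.packages(missing_packages)
}

# Once installed, a package is loaded at the start of each R session with
# library(), for example: library(tidyverse)
```

Installing is done once per computer, whereas loading with library() is repeated in every new R session.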

Later, during the workshop exercises, we may consider installing other packages for very specific data analysis tasks. Note: there is no need to follow the hyperlinks included with the packages above; they were included as citations.

3.4 R functions

A good source of information for getting familiar with R, including for users new to programming or to the R language, is the following website:

http://www.cookbook-r.com

During the workshop, I will be listing some functions that will be used frequently when analyzing the data. I won’t list them here.

3.5 GitHub

GitHub is one of the most common repositories, if not the most common, for sharing scripts and collaborating among programmers. During the workshop, we’ll be creating a GitHub account. Dr. Rivera-Mariani will provide further instructions on GitHub later in the workshop.