Chapter 5 Tasks

During the workshop, we will carry out a series of tasks: first installing software and packages, and then working through the stages of the epicycle of data science.

5.1 Install software

5.1.1 Install R

Go to https://cloud.r-project.org, then download and install the version of R for your operating system.

5.1.2 Install RStudio

Go to https://www.rstudio.com, then download and install the version of RStudio for your operating system.

5.1.3 Install R packages

  1. Click on the Packages tab in RStudio. Important: we will install our R packages only through RStudio. The packages that we will install are those listed in section 4.3 of this document.

  2. Click on Install.

  3. In the text field, type the following package names in lower case, separated by commas: tidyverse, readxl, foreign

Important: make sure the “Install dependencies” option stays checked. Some packages depend on other packages; if those dependencies are not already installed in R, the installation will install them as well.
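The same installation can also be run from the Console pane inside RStudio. The following is a minimal sketch rather than the workshop's required method: the helper name `missing_packages` is hypothetical, and the package list is the one from step 3.

```r
# Packages named in step 3 above.
wanted <- c("tidyverse", "readxl", "foreign")

# Hypothetical helper: which of the wanted packages are not yet installed?
missing_packages <- function(wanted, installed) {
  setdiff(wanted, installed)
}

installed  <- rownames(installed.packages())
to_install <- missing_packages(wanted, installed)

# Install only when run interactively (e.g. from the RStudio Console).
# By default, install.packages() also installs the packages that the
# requested ones require (their Depends, Imports and LinkingTo).
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Running the block a second time should leave `to_install` empty, which is a quick way to confirm that everything was installed.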

5.2 Create a GitHub account

  1. Go to https://www.github.com and create an account.
  2. Once you have created your GitHub account, search GitHub for the repository titled “uprprise_ds_wrkshp” (without the quotation marks). Click on the “Fork” button: a copy of the repository will be created under your account (it will appear among the repositories in your account).

From this point on, we will be implementing the epicycle of data science. This means each group will have many decisions to make, so it is important that the members of each group communicate with one another.

5.3 Identifying a dataset of interest

Because we may have different scientific interests, rather than providing you with a dataset to work with, I will provide links to some online databases, and you and your group will decide which dataset to download. The candidate sources are listed at the link below:

https://github.com/friveramariani/uprprise_ds_wrkshp/blob/master/data/healthcare-data.md

Important: you and your group will share with the other workshop participants why you selected that dataset for analysis.

5.4 Download the dataset onto your computer

  1. Each member of the group needs to download the dataset into a folder on their computer. It is very important to save the dataset in a place that is easy to find.

  2. As per the epicycle of data science, what expectations should we set at this stage?
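One expectation at this stage could be that the file actually exists in the chosen folder and is not empty. A minimal sketch of that check follows. The folder and file names are hypothetical, and a tiny stand-in file is written to a temporary folder so that the example runs anywhere; with a real download, point `data_dir` and `data_file` at your own folder and file instead.

```r
# Hypothetical folder and file names; replace with your own.
data_dir  <- file.path(tempdir(), "workshop_data")
data_file <- file.path(data_dir, "my_dataset.csv")

# Create the folder if it does not exist yet.
dir.create(data_dir, showWarnings = FALSE, recursive = TRUE)

# Stand-in for the downloaded dataset (toy values, not real data).
writeLines(c("id,age", "1,34", "2,51"), data_file)

# Expectation: the file is where we think it is, and it has content.
file_found <- file.exists(data_file)
file_bytes <- file.size(data_file)

# Compare the expectation with what we actually find.
if (file_found && file_bytes > 0) {
  message("Dataset found: ", basename(data_file), " (", file_bytes, " bytes)")
} else {
  message("Dataset missing or empty; check the download folder.")
}
```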

5.5 Clean (structure) data

What could be the expectation(s) to have at this stage?

What function(s) and/or line(s) of code could you implement to complete this task?
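As one possible starting point (not the only answer to the questions above), the sketch below inspects and structures a small made-up data frame using base R. The column names, the `-99` missing-value code, and the values are all assumptions for illustration; your dataset will differ.

```r
# Toy data standing in for a freshly imported dataset (values are made up).
raw <- data.frame(
  "Patient ID" = c("001", "002", "003"),
  "Age (yrs)"  = c("34", "51", "-99"),  # numbers stored as text; -99 assumed to mean missing
  "Sex"        = c("F", "M", "F"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Expectation: we know how many rows and columns there are, and each column's type.
str(raw)

clean <- raw

# Make column names lower case, with underscores instead of spaces and punctuation.
names(clean) <- tolower(names(clean))
names(clean) <- gsub("[^a-z0-9]+", "_", names(clean))
names(clean) <- gsub("^_|_$", "", names(clean))

# Convert text to the right types and recode the assumed missing-value code.
clean$age_yrs <- as.numeric(clean$age_yrs)
clean$age_yrs[clean$age_yrs == -99] <- NA
clean$sex <- factor(clean$sex)

# Check the result against the expectation.
str(clean)
```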

5.6 Tidy the data

Note: this task is needed only if the dataset does not already meet the following criterion: each column is a variable.

What could be the expectation(s) to have at this stage?

What function(s) and/or line(s) of code could you implement to complete this task?
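For example, a table in which one variable (such as the year) is spread across several column headers does not meet the criterion. The sketch below reshapes such a table with `pivot_longer()` from the tidyr package, which is installed as part of tidyverse. The data frame, its column names, and its values are made up for illustration.

```r
library(tidyr)

# Toy "wide" table: the years are column headers rather than values (made-up numbers).
wide <- data.frame(
  hospital = c("A", "B"),
  `2019`   = c(120, 95),
  `2020`   = c(134, 101),
  check.names = FALSE
)

# Gather the year columns so that each column holds exactly one variable.
long <- pivot_longer(
  wide,
  cols      = c(`2019`, `2020`),
  names_to  = "year",
  values_to = "admissions"
)

# year arrives as text because it came from column names.
long$year <- as.integer(long$year)

print(long)
```

After reshaping, the table has one row per hospital and year, with `hospital`, `year`, and `admissions` each in its own column.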

5.7 Exploratory analysis

What could be the expectation(s) to have at this stage?

What function(s) and/or line(s) of code could you implement to complete this task?

After exploring the data, are there any corrections needed in the data?

After exploring the data, are there any patterns and possible relationships to follow?
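A few base R functions cover a first pass of exploration: `summary()` for ranges, `is.na()` for missing values, `table()` for categories, and `aggregate()` for group comparisons. The sketch below applies them to a made-up data frame; all names, values, and the plausible age range are assumptions for illustration.

```r
# Toy data (made-up values); note the implausible age of 340.
dat <- data.frame(
  age = c(34, 51, 340, 47, NA, 29),
  sex = c("F", "M", "F", "M", "F", "M"),
  sbp = c(118, 135, 122, 141, 127, 115)  # systolic blood pressure, hypothetical
)

# Ranges and quartiles: do the minimum and maximum look plausible?
print(summary(dat))

# Missing values per column.
n_missing <- colSums(is.na(dat))
print(n_missing)

# Counts per category.
print(table(dat$sex))

# Corrections needed? Flag ages outside an assumed plausible range (0 to 120 years).
suspect <- which(dat$age < 0 | dat$age > 120)
print(dat[suspect, ])

# Possible pattern to follow up: mean blood pressure by sex.
sbp_by_sex <- aggregate(sbp ~ sex, data = dat, FUN = mean)
print(sbp_by_sex)
```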

5.8 Statistical inference

Note: we may or may not reach this stage. If we do, I will help the groups decide which statistical test is relevant.

What could be the expectation(s) to have at this stage?

What function(s) and/or line(s) of code could you implement to complete this task?
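Which test is relevant depends on the question and the data, and that decision will be made with each group during the workshop. Purely as an illustration of what the code might look like, the sketch below compares the means of two groups with a Welch two-sample t-test (`t.test()` in base R) on simulated data. The group names, sample sizes, and distribution parameters are assumptions.

```r
set.seed(42)  # make the simulated data reproducible

# Simulated outcome for two hypothetical groups (not real data).
sim <- data.frame(
  group = rep(c("control", "treated"), each = 30),
  value = c(rnorm(30, mean = 120, sd = 10),
            rnorm(30, mean = 126, sd = 10))
)

# Welch two-sample t-test: does the mean value differ between the groups?
result <- t.test(value ~ group, data = sim)

print(result)

# Pieces of the result that can be pulled out individually.
print(result$estimate)  # group means
print(result$conf.int)  # 95% confidence interval for the difference in means
print(result$p.value)
```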