Chapter 1 Background

This book brings together a number of courses created by researchers at the School of Health and Related Research at the University of Sheffield who work at the intersection of health, data science, decision science and economics. It was created with the aim of aiding collaboration, improving the reproducibility of research and promoting open science. The hope is that by working through this book and applying the methods in collaborative research endeavours, projects will be more efficient, research outputs will be more useful, and future collaboration will be easier.

The views expressed in the book are those of the authors, not of the School of Health and Related Research or the University of Sheffield.

1.1 Who we are

All of the tutors on the course are PhD candidates in the Wellcome Trust Doctoral Training Centre for Public Health Economics and Decision Science at the School of Health and Related Research at the University of Sheffield.

Robert Smith joined ScHARR in 2016. His research focuses on the methods used to estimate the costs and benefits of public health interventions, with a specific interest in microsimulation modelling (done in R). He has become increasingly interested in the use of R-Markdown and R-Shiny to make research more transparent and to aid decision makers. While doing his PhD, Robert has been involved in projects with the WHO and Parkrun.

Paul Schneider joined ScHARR in 2018. He is working on conceptual and methodological problems in valuing health outcomes in economic evaluations. A medical doctor and epidemiologist by training, he has used R in various research projects, ranging from modelling the costs of breast cancer and value of information analyses to monitoring influenza in real time using online data. He is a keen advocate of open science practices.

1.2 Our series of short courses in R

Below is a list of our planned short courses in R.

1.2.1 Course 1 - Intro to R

By the end of the one-day short course, the attendee should be able to:

  • Install and navigate RStudio.
  • Set the working directory.
  • Understand the types of objects and basic operations in R.
  • Read in data from CSV and Excel files.
  • Summarise data.
  • Know where to find further information.

1.2.2 Course 2 - Intermediate R

By the end of the half-day short course, the attendee should be able to:

  • Find & download appropriate packages for different tasks.
  • Use tidyverse functions to manipulate data.
  • Use the dplyr package to mutate, select, filter, summarise and arrange data.
  • Analyse datasets by groups.
  • Use tidyr to restructure data.
  • Know where to find further information.

1.2.3 Course 3 - Beautiful Visualisations

By the end of the half-day short course, the attendee should be able to:

  • Know the benefits of ggplot2 over base R graphics.
  • Structure data efficiently to enable the use of ggplot2.
  • Understand the basic types of plots and when to use them.
  • Create beautiful visualisations using ggplot2.
  • Use geographical data to produce choropleth maps.

1.2.4 Course 4 - R for Health Economics

By the end of the one-day short course, the attendee should be able to:

  • Understand the strengths and limitations of R for health economic modelling.
  • Manage different objects and parameters in R.
  • Use loops, custom functions and the apply family.
  • Create a Markov model from scratch given known parameters.
  • Create a microsimulation model to incorporate heterogeneity between groups.
  • Understand the importance of transparent code, in particular commenting.

1.2.5 Course 5 - R Shiny for decision modelling

By the end of the half-day short course, the attendee should be able to:

  • Understand the benefits and limitations of R-Shiny.
  • Have a basic understanding of the principles behind R-Shiny.
  • Create an R-Shiny application from scratch.
  • Integrate beautiful plots into R-Shiny.
  • Develop a user interface for an existing Markov model in R-Shiny.
  • Know where to find further information.

1.2.6 Course 6 - Collaboration in R

By the end of the half-day short course, the attendee should be able to:

  • Understand the strengths and limitations of R-Markdown.
  • Create reproducible HTML, Word and PDF documents using R-Markdown.
  • Include chunks of code, graphs, references and bibliographies, links to websites and pictures within documents.
  • Collaborate on a project on GitHub.
  • Replicate analysis for new or updated datasets from previous work made publicly available on GitHub.

1.3 Aim of the series of courses

This book, originally a series of short courses, is designed to equip the participant with a basic set of tools to undertake research using R. The aim is to create a strong foundation on which participants can build skills and knowledge specific to their research and consultancy objectives. At the time of writing, R is the dominant programming language in health research, but other languages such as Python are more popular in business and finance environments. There is much debate about the relative benefits of different languages. In our experience, however, the most important thing is not the particular language used but the approach to solving problems. (If you intend to work in health research, we still advise learning R.)

We assume no prior programming experience. Our introductory course, which makes up the Intro to R chapter, covers the basics; those familiar with R or who have read Wickham & Grolemund’s popular R for Data Science can skip this chapter. We then move on to more advanced topics in the Intermediate R and Beautiful Visualisations chapters. The next two chapters focus on specific methods in health economic evaluation: the use of Markov models and microsimulation. We then turn to the use of user interfaces (R-Shiny) to improve the transparency of these models, before providing a tutorial in the use of Git and GitHub to aid collaboration and make research more transparent.

Requirements: It is assumed that all participants on the course have their own laptop, and have previously used software such as Excel or SPSS. Some basic understanding of statistics and mathematics is required (e.g. mean, median, minimum, maximum).