RMRWR
1
Preface
1.1
Who This Book is For
1.2
Prerequisites
1.3
The (Upward) Spiral of Success Structure
1.4
Motivation for this Book
1.5
The Scientific Reproducibility Crisis
1.6
Features of a Bookdown electronic book
1.7
What this Book is Not
1.7.1
This Book is Not A Statistics Text
1.7.2
This Book Does Not Provide Comprehensive Coverage of the R Universe
1.8
Some Guideposts
1.9
Helpful Tools
1.9.1
Demonstrations in Flipbooks
1.9.2
Learnr Coding Exercises
1.9.3
Coding
2
Getting Started and Installing Your Tools
2.1
Goals for this Chapter
2.2
Website links needed for this Chapter
2.3
Pathway for this Chapter
2.4
Installing R on your Computer
2.5
Windows-Specific Steps for Installing R
2.5.1
Testing R on Windows
2.6
Mac-specific Installation of R
2.6.1
Testing R on the Mac
2.6.2
Successful testing!
2.7
Installing RStudio on your Computer
2.7.1
Windows Install of RStudio
2.7.2
Testing Windows RStudio
2.7.3
Installing RStudio on the Mac
2.7.4
Testing the Mac Installation of RStudio
2.7.5
Critical Setup - Tuning Up Your RStudio Installation
2.8
Installing Git on your Computer
2.8.1
Installing Git on macOS
2.8.2
Installing Git on Windows
2.8.3
Installing Git on Linux
2.9
Getting Acquainted with the RStudio IDE
3
A Tasting Menu of R
3.1
Setting the Table
3.2
Goals for this Chapter
3.3
Packages needed for this Chapter
3.4
Website links needed for this Chapter
3.5
Setting up RPubs
3.6
Open a New Rmarkdown document
3.7
Knitting your Rmarkdown document
3.7.1
Installing Packages
3.7.2
Loading Packages with library()
3.8
Your Turn to Write Text
3.9
Wrangle Your Data
3.10
Summarize Your Data
3.11
Visualize Your Data
3.12
Statistical Testing of Differences
3.13
Publish your work to RPubs
3.14
The Dessert Cart
3.14.1
Interactive Plots
3.14.2
Animated Graphics
3.14.3
A Clinical Trial Dashboard
3.14.4
A Shiny App
3.14.5
An Example of Synergy in the R Community
4
Introduction to Reproducibility
4.1
First Steps to Research Reproducibility
4.1.1
Have a Plan
4.1.2
Treat Your Raw Data Like Gold
4.1.3
Cleaning and Analyzing Your Data
4.1.4
The First Level of Reproducibility
4.1.5
The Second Level of Reproducibility
5
Importing Your Data into R
5.1
Reading data with the {readr} package
5.1.1
Test yourself on scurvy
5.1.2
What is a path?
5.1.3
Try it Yourself
5.2
Reading Excel Files with readxl
5.2.1
Test yourself on read_excel()
5.3
Bringing in data from other Statistical Programs (SAS, Stata, SPSS) with the {haven} package
5.4
Other strange file types with rio
5.5
Data exploration with glimpse, str, and head/tail
5.5.1
Taking a glimpse with glimpse()
5.5.2
Try this out yourself.
5.5.3
Test yourself on strep_tb
5.5.4
Examining Structure with str()
5.5.5
Test yourself on the scurvy dataset
5.5.6
Examining a bit of data with head() and tail()
5.5.7
Test yourself on printing tibbles
5.6
More exploration with skimr and DataExplorer
5.6.1
Test yourself on the skim() results
5.6.2
Test yourself on the create_report() results
5.7
Practice loading data from multiple file types
5.8
Practice saving (writing to disk) data objects in formats including csv, rds, xls, xlsx and statistical program formats
5.9
How do readr and readxl parse columns?
5.10
What are the variable types?
5.11
Controlling Parsing
5.12
Chapter Challenges
5.13
Future forms of data ingestion
6
Wrangling Rows in R with Filter
6.1
Goals for this Chapter
6.2
Packages needed for this Chapter
6.3
Pathway for this Chapter
6.4
Logical Statements in R
6.5
Filtering on Numbers - Starting with A Flipbook
6.5.1
Your Turn - learnr exercises
6.6
Filtering on Multiple Criteria with Boolean Logic
6.6.1
Your Turn - learnr exercises
6.7
Filtering Strings
6.7.1
Your Turn - learnr exercises
6.8
Filtering Dates
6.8.1
Your Turn - learnr exercises
6.9
Filtering Out or Identifying Missing Data
6.9.1
Working with Missing data
6.9.2
Your Turn - learnr exercises
6.10
Filtering Out Duplicate observations
6.11
Slicing Data by Row
6.12
Randomly Sampling Your Rows
6.12.1
Your Turn - learnr exercises
6.13
Further Challenges
6.14
Explore More about Filtering
7
Wrangling Columns in R with Select, Rename, and Relocate
7.1
Goals for this Chapter
7.2
Packages needed for this Chapter
7.3
Pathway for this Chapter
7.4
Tidyselect Helpers in R
7.5
Selecting Column Variables
7.5.1
Try this out
7.6
Selecting Columns that are Not Contiguous
7.7
Selecting Columns With Logical Operators
7.8
Further Challenges
7.9
Explore More about Selecting
8
Using Mutate to Make New Variables (Columns)
8.1
Calculating BMI
8.2
Recoding categorical or ordinal data
8.3
Calculating Glomerular Filtration Rate
9
Mutating Joins to Combine Data Sources
9.1
What are Joins?
9.2
What are Mutating Joins?
9.3
Let’s Start with Left Joins
9.4
Left Join in Action
9.5
Left Join in Practice
9.6
Quick Quiz
9.7
Problem variable names
9.8
Right Join in Action
9.9
Right Join in Practice
9.10
Inner Joins
9.11
Quick Quiz
9.12
Now Let’s take a Look at the result
9.13
Full Joins
9.14
Quick Quiz
9.15
Now Let’s take a Look at the result
10
Interpreting Error Messages
10.1
The Common Errors Table
10.2
Examples of Common Errors and How to fix them
10.2.1
Missing Parenthesis
10.2.2
An Extra Parenthesis
10.2.3
Missing pipe %>% in a data wrangling pipeline
10.2.4
Missing + in a ggplot pipeline
10.2.5
Pipe %>% in Place of a +
10.2.6
Missing Comma Within a Function()
10.2.7
A Missing Object
10.2.8
One Equals Sign When you Need Two
10.2.9
Non-numeric argument to a binary operator
10.3
Errors Beyond This List
10.4
When Things Get Weird
10.4.1
Restart your R Session (Shift-Cmd-F10)
10.5
References:
11
The Building Blocks of R: data types, data structures, functions, and packages.
11.1
Data Types
11.2
Data Structures
11.3
Examining Data Types and Data Structures
11.4
Functions
11.5
Packages
11.6
The Building Blocks of R
12
Tips for Hashtag Debugging your Pipes and GGPlots
12.1
Debugging
12.2
The Quick Screen
12.3
Systematic Hunting For Bugs in Pipes
12.4
Systematic Hunting For Bugs in Plots
12.5
Hashtag Debugging
12.6
Pipe 2
12.7
Plot 2
12.8
Plot 3
12.9
Pipe 3
13
Finding Help in R
13.1
Programming in R
13.2
Starting with Help!
13.3
The Magic of Vignettes
13.4
Googling the Error Message
13.5
You Know What You Want to Do, but Don’t Know What Package or Function to Use
13.5.1
CRAN Task Views
13.5.2
Google is Your Friend
13.6
Seeking Advanced Help with a Minimal REPREX
14
The Basics of Base R
14.1
Dimensions of Data Rectangles
14.2
Naming columns
14.3
Concatenation
14.4
Sequences
14.5
Constants
14.6
Fancier Sequences
14.7
Mathematical functions
14.8
Handling missing data (NAs)
14.9
Cutting Continuous data into Levels
15
Updating R, RStudio, and Your Packages
15.1
Installing Packages
15.1.1
Installing Packages from Github
15.1.2
Problems with Installing Packages
15.2
Loading Packages with Library
15.3
Updating R
15.4
Updating RStudio
15.5
Updating Your Packages
16
Major R Updates (Where Are My Packages?)
16.1
Preparing for a Minor or Major R Upgrade
16.1.1
Installing New Packages from Scratch
16.2
Rebuilding All Packages in One (Automated) Step
16.3
Checking the new library path
16.4
Now Check your list of Packages
16.5
Updating Packages
17
Intermediate Steps Toward Reproducibility
17.1
Level 3 Reproducibility
17.1.1
Creating a New Project in RStudio
17.1.2
File paths and the {here} package
17.2
Code Review with a Coding Partner
17.2.1
Checklist for Code Review
17.3
Sharing code on GitHub
18
Building Table One for a Clinical Study
18.1
Packages Needed for this Chapter:
18.2
Pathway for this Chapter
18.3
Baseline Characteristics
18.4
Building Your Table 1
18.4.1
Updating Variable Labels
18.4.2
Updating Variable Values
18.4.3
Table 1 separated by Treatment Arm
18.4.4
Styling our Table 1
18.4.5
Adding A Column Spanner
18.4.6
Further Styling our Table 1
18.4.7
Your Turn
18.5
Try this with a new dataset
18.6
Making Modifications to the trial table
18.7
More Modifications to the trial table
18.8
Taking Control of the Stats
18.8.1
Your Turn
19
Comparing Two Measures of Centrality
19.1
Common Problem
19.1.1
How Skewed is Too Skewed?
19.1.2
Visualize the Distribution of data variables in ggplot
19.1.3
Visualize the Distribution of data$len in ggplot
19.1.4
Results of Shapiro-Wilk
19.1.5
Try it yourself
19.1.6
Mammal sleep hours
19.2
One Sample T test
19.2.1
How to do One Sample T test
19.2.2
Interpreting the One Sample T test
19.2.3
What are the arguments of the t.test function?
19.3
Insert flipbook for ttest here
19.3.1
Flipbook Time!
19.4
Fine, but what about 2 groups?
19.4.1
Setting up 2 group t test
19.4.2
Results of the 2 group t test
19.4.3
Interpreting the 2 group t test
19.4.4
2 group t test with wide data
19.4.5
Results of 2 group t test with wide data
19.5
3 Assumptions of Student’s t test
19.5.1
Testing Assumptions of Student’s t test
19.6
Getting results out of t.test
19.6.1
Getting results out of t.test
19.7
Reporting the results from t.test using inline code
19.7.1
For Next Time
20
Sample Size Calculations with {pwr}
20.1
Sample Size for a Continuous Endpoint (t-test)
20.2
One Sample t-test for Lowering Creatinine
20.3
Paired t-tests (before vs after, or truly paired)
20.4
2 Sample t tests with Unequal Study Arm Sizes
20.5
Testing Multiple Options and Plotting Results
20.6
Your Turn
20.6.1
Scenario 1: FEV1 in COPD
20.6.2
Scenario 2: BNP in CHF
20.6.3
Scenario 3: Barthel Index in Stroke
20.7
Sample Sizes for Proportions
20.8
Sample size for two proportions, equal n
20.9
Sample size for two proportions, unequal arms
20.10
Your Turn
20.10.1
Scenario 1: Mortality on Renal Dialysis
20.10.2
Scenario 2: Intestinal anastomosis in Crohn’s disease
20.10.3
Scenario 3: Metformin in Donuts
20.11
add chi square
20.12
add correlation test
20.13
add anova
20.14
add linear model
20.15
add note on guessing effect sizes - cohen small, medium, large
20.16
Explore More
21
Randomization for Clinical Trials with R
21.1
Printing these on Cards
21.2
Now, try this yourself
21.3
Now Freestyle
22
Univariate ggplots to Visualize Distributions
22.1
Histograms
22.1.1
Comparisons of Distributions with Histograms
22.1.2
Histograms and Categories
22.2
Density Plots
22.2.1
Comparisons with Density plots
22.3
Comparing Distributions Across Categories
22.4
Boxplots
22.5
Violin Plots
22.6
Ridgeline Plots
22.6.1
Including Plots
22.6.2
Including Points
22.6.3
Including Points
22.6.4
Including Points
22.6.5
Including Points
23
Bivariate ggplot2 Scatterplots to Visualize Relationships Between Variables
23.1
Packages used in this Chapter
23.2
Data Exploration and Validation (DEV)
23.3
Scatterplots
23.3.1
Micro-quiz!
23.4
Mapping More Variables
23.5
Inheritance and Layering in ggplot2
23.6
Aesthetic mapping Micro-Quiz!
23.7
Controlling Point Shape, Size, and Color Manually
23.7.1
Manual Shapes
23.7.2
Manual Sizes
23.7.3
Manual Color
24
Extensions to ggplot
24.1
Goals for this Chapter
24.2
Packages Needed for this chapter
24.3
A Flipbook of Where We Are Going With ggplot Extensions
24.3.1
MAKE FLIPBOOK
24.4
A Waffle Plot
24.5
An Alluvial Plot
24.6
Lollipop Plots
24.7
Dumbbell Plots
24.8
Spaghetti Plots with Summary Smoothed Lines for Change Over Time
24.9
Swimmer Plots
24.10
Adding Significance Comparisons with {ggsignif}
25
Customizing Plot Scales
25.1
Goals for this Chapter
25.2
Packages Needed for this chapter
25.3
A Flipbook of Where We Are Going With Scales
25.4
A Basic Scatterplot
25.5
But what if you want the scale for risk to start at 0?
25.6
But this axis does not really start at Exactly 0
25.7
Control the Limits and the Breaks
25.8
Test what you have learned
25.9
Continuous vs. Discrete Plots and Scales
25.10
Using Scales to Customize a Legend
25.11
Test what you have learned
25.11.1
More Examples with Flipbooks
26
Helping out with ggplot
26.1
ggx::gghelp()
26.2
Getting more help with theming with ggThemeAssist
26.3
Website helpers for ggplot
26.4
Getting Even more help with esquisse
27
Functions
27.1
Don’t repeat yourself
27.2
Your Turn
27.3
Freestyle
27.3.1
Acknowledgement
27.4
Read More
28
Linear Regression and Broom for Tidying Models
28.1
Packages needed
28.2
Building a simple base model with {lm}
28.2.1
Producing manuscript-quality tables with {gtsummary}
28.3
Is Your Model Valid?
28.4
Making Predictions with Your Model
28.4.1
Predictions from new data
28.5
Choosing predictors for multivariate modeling – testing, dealing with collinearity
28.5.1
Challenges
28.6
presenting model results with RMarkdown
28.6.1
Challenges
28.7
presenting model results with a Shiny App
28.7.1
Challenges
29
Logistic Regression and Broom for Tidying Models
29.1
The Model Summary
29.2
Evaluating your Model Assumptions
29.3
Converting between logit, odds ratios, and probability
30
Fast and Frugal Trees with the {FFTrees} Package
30.1
Setup
30.2
The Breast Cancer Dataset
30.2.1
Data Inspection
30.2.2
Check Your Progress
30.3
Building a FFTrees Model for Breast Cancer
30.4
Your Turn with Heart Disease Data
30.4.1
Test what you have learned
30.5
Your Turn to Build and Interpret a Model
30.6
Now build your FFTrees model to predict improved status (vs. death)
31
A Gentle Introduction to Shiny
31.1
What is Shiny?
31.2
The Basic Structure of a Shiny App
31.2.1
The weirdness of a Shiny app
31.3
The User Interface Section Structure
31.4
The Server Section Structure
31.5
How to Run an App
31.5.1
How to Stop an App
31.6
Building a Very Simple App (Version 1)
31.6.1
The ui section
31.6.2
The server section
31.7
Edit this App (Version 2)
31.8
Building a User Interface for Inputs and Outputs
31.8.1
Inputs
31.8.2
Outputs
31.9
Building a Functioning Server Section
31.9.1
Using the input values & Data
31.9.2
Wrangling and Calculating
31.9.3
Rendering to HTML Outputs
31.10
Building a Simple Shiny App (Version 3)
31.11
Publishing Your Shiny App on the Web
31.12
More to Explore
32
Sharing Models with Shiny
32.0.1
Packages Needed for this Chapter
32.1
Setting up and Saving Models
32.1.1
Linear Model
32.1.2
Logistic Model
32.1.3
Random Forest Model
32.2
Building a Shiny App for the Linear Model
32.2.1
The Default Shiny App
32.2.2
Editing the ui sidebarPanel for the Input Predictor Variables
32.2.3
Editing the server section to make Predictions
32.2.4
Editing the mainPanel in the ui section to display your Prediction
32.3
Building a Shiny App for the Logistic Model
32.3.1
The Default Shiny App
32.3.2
Editing the ui sidebarPanel for the Input Predictor Variables
32.3.3
Editing the server section to make Predictions
32.3.4
Editing the mainPanel in the ui section to display your Prediction
32.4
Building a Shiny App for the Random Forest Model
32.5
Challenge Yourself
33
Introduction to R Markdown
33.1
What Makes an Rmarkdown document?
33.2
Trying out RMarkdown with a Mock Manuscript
33.3
Inserting Code Chunks
33.3.1
Code Chunk Icons
33.4
Including Plots
33.5
Including Tables
33.6
Including Links and Images
33.6.1
Links
33.6.2
Images
33.7
Other languages in code chunks
33.8
Code Chunk Options
33.9
How It All (Rmarkdown + {knitr} + Pandoc) Works
33.10
Knitting and Editing (and re-Knitting) Your Rmd document
33.11
Try Out Other Chunk Options
33.12
The setup chunk
33.13
Markdown syntax
33.14
2nd Header
33.14.1
3rd Header
33.15
Line Breaks and Page Breaks
33.16
Making Lists
33.16.1
Ordered Lists
33.16.2
Un-ordered lists
33.16.3
Nested Lists
33.17
The Easy Button - Visual Markdown Editing
33.17.1
Try inserting a list, a table and a block-quote
33.18
Inline Code
33.18.1
Try inserting some in-line R code
33.19
A Quick Quiz
34
Rmarkdown Output Options
34.1
Microsoft Word Output from Rmarkdown
34.1.1
Making a Styles Reference File for Microsoft Word
34.1.2
Let’s Practice This.
34.1.3
Re-formatting Your Template
34.1.4
Using Your New Styles Template
34.1.5
Now you are ready!
34.2
PDF Output from RMarkdown
34.2.1
LaTeX and tinytex
34.2.2
Knitting to PDF
34.3
Microsoft Powerpoint Output from Rmarkdown
34.3.1
Tables in Powerpoint
34.3.2
Images in Powerpoint
34.3.3
Plots in Powerpoint
35
Adding Citations to your RMarkdown
36
Quarto is a Next-Generation RMarkdown
36.1
Goals for this Chapter
36.2
Packages Needed for this chapter
36.3
Introducing Quarto
36.4
A Tour of Quarto
36.5
Opening a New Quarto Document
36.6
Annotating code in Quarto
36.7
The Visual Editor vs. Source Editor in Quarto
36.8
Adding Code Chunks
36.9
Organized Options in Code Chunks with the Hash-Pipe #|
36.10
Stating Global Options in Your YAML Header
36.10.1
Code Options and Code Folding
36.10.2
Parameters
36.11
Figures
36.12
Tables
36.13
Inline Code and Caching
36.14
Quarto at the Command Line
36.15
Citations in Quarto
36.16
Challenge Yourself
36.17
Exploring further
37
Running R from the UNIX Command Line
37.1
What is the UNIX Command line?
37.2
Why run R from the command line?
37.3
How do you get started?
37.3.1
On a Mac
37.3.2
On a Windows PC
37.4
The Yawning Blackness of the Terminal Window
37.5
Where Are We?
37.6
Cleaning Up
37.7
Other helpful file commands
37.8
What about R?
37.9
What about just a few lines of R?
37.10
Running an R Script from the Terminal
37.11
Rendering an Rmarkdown file from the Terminal
Title holder
References
Reproducible Medical Research with R
References