RMRWR
1 Preface
1.1 Who This Book is For
1.2 Prerequisites
1.3 The (Upward) Spiral of Success Structure
1.4 Motivation for this Book
1.5 The Scientific Reproducibility Crisis
1.6 Features of a Bookdown electronic book
1.7 What this Book is Not
1.7.1 This Book is Not A Statistics Text
1.7.2 This Book Does Not Provide Comprehensive Coverage of the R Universe
1.8 Some Guideposts
1.9 Helpful Tools
1.9.1 Demonstrations in Flipbooks
1.9.2 Learnr Coding Exercises
1.9.3 Coding
2 Getting Started and Installing Your Tools
2.1 Goals for this Chapter
2.2 Website links needed for this Chapter
2.3 Pathway for this Chapter
2.4 Installing R on your Computer
2.5 Windows-Specific Steps for Installing R
2.5.1 Testing R on Windows
2.6 Mac-specific Installation of R
2.6.1 Testing R on the Mac
2.6.2 Successful testing!
2.7 Installing RStudio on your Computer
2.7.1 Windows Install of RStudio
2.7.2 Testing Windows RStudio
2.7.3 Installing RStudio on the Mac
2.7.4 Testing the Mac Installation of RStudio
2.7.5 Critical Setup - Tuning Up Your RStudio Installation
2.8 Installing Git on your Computer
2.8.1 Installing Git on macOS
2.8.2 Installing Git on Windows
2.8.3 Installing Git on Linux
2.9 Getting Acquainted with the RStudio IDE
3 A Tasting Menu of R
3.1 Setting the Table
3.2 Goals for this Chapter
3.3 Packages needed for this Chapter
3.4 Website links needed for this Chapter
3.5 Setting up RPubs
3.6 Open a New Rmarkdown document
3.7 Knitting your Rmarkdown document
3.7.1 Installing Packages
3.7.2 Loading Packages with library()
3.8 Your Turn to Write Text
3.9 Wrangle Your Data
3.10 Summarize Your Data
3.11 Visualize Your Data
3.12 Statistical Testing of Differences
3.13 Publish your work to RPubs
3.14 The Dessert Cart
3.14.1 Interactive Plots
3.14.2 Animated Graphics
3.14.3 A Clinical Trial Dashboard
3.14.4 A Shiny App
3.14.5 An Example of Synergy in the R Community
4 Introduction to Reproducibility
4.1 First Steps to Research Reproducibility
4.1.1 Have a Plan
4.1.2 Treat Your Raw Data Like Gold
4.1.3 Cleaning and Analyzing Your Data
4.1.4 The First Level of Reproducibility
4.1.5 The Second Level of Reproducibility
5 Importing Your Data into R
5.1 Reading data with the {readr} package
5.1.1 Test yourself on scurvy
5.1.2 What is a path?
5.1.3 Try it Yourself
5.2 Reading Excel Files with readxl
5.2.1 Test yourself on read_excel()
5.3 Bringing in data from other Statistical Programs (SAS, Stata, SPSS) with the {haven} package
5.4 Other strange file types with rio
5.5 Data exploration with glimpse, str, and head/tail
5.5.1 Taking a glimpse with glimpse()
5.5.2 Try this out yourself
5.5.3 Test yourself on strep_tb
5.5.4 Examining Structure with str()
5.5.5 Test yourself on the scurvy dataset
5.5.6 Examining a bit of data with head() and tail()
5.5.7 Test yourself on printing tibbles
5.6 More exploration with skimr and DataExplorer
5.6.1 Test yourself on the skim() results
5.6.2 Test yourself on the create_report() results
5.7 Practice loading data from multiple file types
5.8 Practice saving (writing to disk) data objects in formats including csv, rds, xls, xlsx and statistical program formats
5.9 How do readr and readxl parse columns?
5.10 What are the variable types?
5.11 Controlling Parsing
5.12 Chapter Challenges
5.13 Future forms of data ingestion
6 Wrangling Rows in R with Filter
6.1 Goals for this Chapter
6.2 Packages needed for this Chapter
6.3 Pathway for this Chapter
6.4 Logical Statements in R
6.5 Filtering on Numbers - Starting with A Flipbook
6.5.1 Your Turn - learnr exercises
6.6 Filtering on Multiple Criteria with Boolean Logic
6.6.1 Your Turn - learnr exercises
6.7 Filtering Strings
6.7.1 Your Turn - learnr exercises
6.8 Filtering Dates
6.8.1 Your Turn - learnr exercises
6.9 Filtering Out or Identifying Missing Data
6.9.1 Working with Missing data
6.9.2 Your Turn - learnr exercises
6.10 Filtering Out Duplicate observations
6.11 Slicing Data by Row
6.12 Randomly Sampling Your Rows
6.12.1 Your Turn - learnr exercises
6.13 Further Challenges
6.14 Explore More about Filtering
7 Wrangling Columns in R with Select, Rename, and Relocate
7.1 Goals for this Chapter
7.2 Packages needed for this Chapter
7.3 Pathway for this Chapter
7.4 Tidyselect Helpers in R
7.5 Selecting Column Variables
7.5.1 Try this out
7.6 Selecting Columns that are Not Contiguous
7.7 Selecting Columns With Logical Operators
7.8 Further Challenges
7.9 Explore More about Filtering
8 Using Mutate to Make New Variables (Columns)
8.1 Calculating BMI
8.2 Recoding categorical or ordinal data
8.3 Calculating Glomerular Filtration Rate
9 Mutating Joins to Combine Data Sources
9.1 What are Joins?
9.2 What are Mutating Joins?
9.3 Let’s Start with Left Joins
9.4 Left Join in Action
9.5 Left Join in Practice
9.6 Quick Quiz
9.7 Problem variable names
9.8 Right Join in Action
9.9 Right Join in Practice
9.10 Inner Joins
9.11 Quick Quiz
9.12 Now Let’s take a Look at the result
9.13 Full Joins
9.14 Quick Quiz
9.15 Now Let’s take a Look at the result
10 Interpreting Error Messages
10.1 The Common Errors Table
10.2 Examples of Common Errors and How to fix them
10.2.1 Missing Parenthesis
10.2.2 An Extra Parenthesis
10.2.3 Missing pipe %>% in a data wrangling pipeline
10.2.4 Missing + in a ggplot pipeline
10.2.5 Pipe %>% in Place of a +
10.2.6 Missing Comma Within a Function()
10.2.7 A Missing Object
10.2.8 One Equals Sign When you Need Two
10.2.9 Non-numeric argument to a binary operator
10.3 Errors Beyond This List
10.4 When Things Get Weird
10.4.1 Restart your R Session (Shift-Cmd-F10)
10.5 References:
11 The Building Blocks of R: data types, data structures, functions, and packages
11.1 Data Types
11.2 Data Structures
11.3 Examining Data Types and Data Structures
11.4 Functions
11.5 Packages
11.6 The Building Blocks of R
12 Tips for Hashtag Debugging your Pipes and GGPlots
12.1 Debugging
12.2 The Quick Screen
12.3 Systematic Hunting For Bugs in Pipes
12.4 Systematic Hunting For Bugs in Plots
12.5 Hashtag Debugging
12.6 Pipe 2
12.7 Plot 2
12.8 Plot 3
12.9 Pipe 3
13 Finding Help in R
13.1 Programming in R
13.2 Starting with Help!
13.3 The Magic of Vignettes
13.4 Googling the Error Message
13.5 You Know What You Want to Do, but Don’t Know What Package or Function to Use
13.5.1 CRAN Task Views
13.5.2 Google is Your Friend
13.6 Seeking Advanced Help with a Minimal REPREX
14 The Basics of Base R
14.1 Dimensions of Data Rectangles
14.2 Naming columns
14.3 Concatenation
14.4 Sequences
14.5 Constants
14.6 Fancier Sequences
14.7 Mathematical functions
14.8 Handling missing data (NAs)
14.9 Cutting Continuous data into Levels
15 Updating R, RStudio, and Your Packages
15.1 Installing Packages
15.1.1 Installing Packages from GitHub
15.1.2 Problems with Installing Packages
15.2 Loading Packages with library()
15.3 Updating R
15.4 Updating RStudio
15.5 Updating Your Packages
16 Major R Updates (Where Are My Packages?)
16.1 Preparing for a Minor or Major R Upgrade
16.1.1 Installing New Packages from Scratch
16.2 Rebuilding All Packages in One (Automated) Step
16.3 Checking the new library path
16.4 Now Check your list of Packages
16.5 Updating Packages
17 Intermediate Steps Toward Reproducibility
17.1 Level 3 Reproducibility
17.1.1 Creating a New Project in RStudio
17.1.2 File paths and the {here} package
17.2 Code Review with a Coding Partner
17.2.1 Checklist for Code Review
17.3 Sharing code on GitHub
18 Building Table One for a Clinical Study
18.1 Packages Needed for this Chapter:
18.2 Pathway for this Chapter
18.3 Baseline Characteristics
18.4 Building Your Table 1
18.4.1 Updating Variable Labels
18.4.2 Updating Variable Values
18.4.3 Table 1 separated by Treatment Arm
18.4.4 Styling our Table 1
18.4.5 Adding A Column Spanner
18.4.6 Further Styling our Table 1
18.4.7 Your Turn
18.5 Try this with a new dataset
18.6 Making Modifications to the trial table
18.7 More Modifications to the trial table
18.8 Taking Control of the Stats
18.8.1 Your Turn
19 Comparing Two Measures of Centrality
19.1 Common Problem
19.1.1 How Skewed is Too Skewed?
19.1.2 Visualize the Distribution of data variables in ggplot
19.1.3 Visualize the Distribution of data$len in ggplot
19.1.4 Results of Shapiro-Wilk
19.1.5 Try it yourself
19.1.6 Mammal sleep hours
19.2 One Sample t test
19.2.1 How to do a One Sample t test
19.2.2 Interpreting the One Sample t test
19.2.3 What are the arguments of the t.test function?
19.3 Insert flipbook for ttest here
19.3.1 Flipbook Time!
19.4 Fine, but what about 2 groups?
19.4.1 Setting up 2 group t test
19.4.2 Results of the 2 group t test
19.4.3 Interpreting the 2 group t test
19.4.4 2 group t test with wide data
19.4.5 Results of 2 group t test with wide data
19.5 3 Assumptions of Student’s t test
19.5.1 Testing Assumptions of Student’s t test
19.6 Getting results out of t.test
19.6.1 Getting results out of t.test
19.7 Reporting the results from t.test using inline code
19.7.1 For Next Time
20 Sample Size Calculations with {pwr}
20.1 Sample Size for a Continuous Endpoint (t-test)
20.2 One Sample t-test for Lowering Creatinine
20.3 Paired t-tests (before vs after, or truly paired)
20.4 2 Sample t tests with Unequal Study Arm Sizes
20.5 Testing Multiple Options and Plotting Results
20.6 Your Turn
20.6.1 Scenario 1: FEV1 in COPD
20.6.2 Scenario 2: BNP in CHF
20.6.3 Scenario 3: Barthel Index in Stroke
20.7 Sample Sizes for Proportions
20.8 Sample size for two proportions, equal n
20.9 Sample size for two proportions, unequal arms
20.10 Your Turn
20.10.1 Scenario 1: Mortality on Renal Dialysis
20.10.2 Scenario 2: Intestinal anastomosis in Crohn’s disease
20.10.3 Scenario 3: Metformin in Donuts
20.11 add chi square
20.12 add correlation test
20.13 add anova
20.14 add linear model
20.15 add note on guessing effect sizes - Cohen small, medium, large
20.16 Explore More
21 Randomization for Clinical Trials with R
21.1 Printing these on Cards
21.2 Now, try this yourself
21.3 Now Freestyle
22 Univariate ggplots to Visualize Distributions
22.1 Histograms
22.1.1 Comparisons of Distributions with Histograms
22.1.2 Histograms and Categories
22.2 Density Plots
22.2.1 Comparisons with Density plots
22.3 Comparing Distributions Across Categories
22.4 Boxplots
22.5 Violin Plots
22.6 Ridgeline Plots
22.6.1 Including Plots
22.6.2 Including Points
22.6.3 Including Points
22.6.4 Including Points
22.6.5 Including Points
23 Bivariate ggplot2 Scatterplots to Visualize Relationships Between Variables
23.1 Packages used in this Chapter
23.2 Data Exploration and Validation (DEV)
23.3 Scatterplots
23.3.1 Micro-quiz!
23.4 Mapping More Variables
23.5 Inheritance and Layering in ggplot2
23.6 Aesthetic mapping Micro-Quiz!
23.7 Controlling Point Shape, Size, and Color Manually
23.7.1 Manual Shapes
23.7.2 Manual Sizes
23.7.3 Manual Color
24 Extensions to ggplot
24.1 Goals for this Chapter
24.2 Packages Needed for this chapter
24.3 A Flipbook of Where We Are Going With ggplot Extensions
24.3.1 MAKE FLIPBOOK
24.4 A Waffle Plot
24.5 An Alluvial Plot
24.6 Lollipop Plots
24.7 Dumbbell Plots
24.8 Spaghetti Plots with Summary Smoothed Lines for Change Over Time
24.9 Swimmer Plots
24.10 Adding Significance Comparisons with {ggsignif}
25 Customizing Plot Scales
25.1 Goals for this Chapter
25.2 Packages Needed for this chapter
25.3 A Flipbook of Where We Are Going With Scales
25.4 A Basic Scatterplot
25.5 But what if you want the scale for risk to start at 0?
25.6 But this axis does not really start at Exactly 0
25.7 Control the Limits and the Breaks
25.8 Test what you have learned
25.9 Continuous vs. Discrete Plots and Scales
25.10 Using Scales to Customize a Legend
25.11 Test what you have learned
25.11.1 More Examples with Flipbooks
26 Helping out with ggplot
26.1 ggx::gghelp()
26.2 Getting more help with theming with ggThemeAssist
26.3 Website helpers for ggplot
26.4 Getting Even more help with esquisse
27 Functions
27.1 Don’t repeat yourself
27.2 Your Turn
27.3 Freestyle
27.3.1 Acknowledgement
27.4 Read More
28 Using Found (Web) Data
28.1 Found Poetry
28.2 Found Data
28.3 Download Example
28.4 Datapasta (small table) Example
28.5 Your Turn
28.6 {rvest} Example
28.7 Your Turn
28.8 API example with {tidycensus}
28.9 Challenges
28.10 Advanced Challenge - Dynamic Websites
29 Linear Regression and Broom for Tidying Models
29.1 Packages needed
29.2 Building a simple base model with lm()
29.2.1 Producing manuscript-quality tables with {gtsummary}
29.3 Is Your Model Valid?
29.4 Making Predictions with Your Model
29.4.1 Predictions from new data
29.5 Choosing predictors for multivariate modeling: testing and dealing with collinearity
29.5.1 Challenges
29.6 Presenting model results with RMarkdown
29.6.1 Challenges
29.7 Presenting model results with a Shiny App
29.7.1 Challenges
30 Logistic Regression and Broom for Tidying Models
30.1 The Model Summary
30.2 Evaluating your Model Assumptions
30.3 Converting between logit, odds ratios, and probability
31 Fast and Frugal Trees with the {FFTrees} Package
31.1 Setup
31.2 The Breast Cancer Dataset
31.2.1 Data Inspection
31.2.2 Check Your Progress
31.3 Building an FFTrees Model for Breast Cancer
31.4 Your Turn with Heart Disease Data
31.4.1 Test what you have learned
31.5 Your Turn to Build and Interpret a Model
31.6 Now build your FFTrees model to predict improved status (vs. death)
32 A Gentle Introduction to Shiny
32.1 What is Shiny?
32.2 The Basic Structure of a Shiny App
32.2.1 The weirdness of a Shiny app
32.3 The User Interface Section Structure
32.4 The Server Section Structure
32.5 How to Run an App
32.5.1 How to Stop an App
32.6 Building a Very Simple App (Version 1)
32.6.1 The ui section
32.6.2 The server section
32.7 Edit this App (Version 2)
32.8 Building a User Interface for Inputs and Outputs
32.8.1 Inputs
32.8.2 Outputs
32.9 Building a Functioning Server Section
32.9.1 Using the input values & Data
32.9.2 Wrangling and Calculating
32.9.3 Rendering to HTML Outputs
32.10 Building a Simple Shiny App (Version 3)
32.11 Publishing Your Shiny App on the Web
32.12 More to Explore
33 Sharing Models with Shiny
33.0.1 Packages Needed for this Chapter
33.1 Setting up and Saving Models
33.1.1 Linear Model
33.1.2 Logistic Model
33.1.3 Random Forest Model
33.2 Building a Shiny App for the Linear Model
33.2.1 The Default Shiny App
33.2.2 Editing the ui sidebarPanel for the Input Predictor Variables
33.2.3 Editing the server section to make Predictions
33.2.4 Editing the mainPanel in the ui section to display your Prediction
33.3 Building a Shiny App for the Logistic Model
33.3.1 The Default Shiny App
33.3.2 Editing the ui sidebarPanel for the Input Predictor Variables
33.3.3 Editing the server section to make Predictions
33.3.4 Editing the mainPanel in the ui section to display your Prediction
33.4 Building a Shiny App for the Random Forest Model
33.5 Challenge Yourself
34 Introduction to R Markdown
34.1 What Makes an Rmarkdown document?
34.2 Trying out RMarkdown with a Mock Manuscript
34.3 Inserting Code Chunks
34.3.1 Code Chunk Icons
34.4 Including Plots
34.5 Including Tables
34.6 Including Links and Images
34.6.1 Links
34.6.2 Images
34.7 Other languages in code chunks
34.8 Code Chunk Options
34.9 How It All (Rmarkdown + {knitr} + Pandoc) Works
34.10 Knitting and Editing (and Re-Knitting) Your Rmd document
34.11 Try Out Other Chunk Options
34.12 The setup chunk
34.13 Markdown syntax
34.14 2nd Header
34.14.1 3rd Header
34.15 Line Breaks and Page Breaks
34.16 Making Lists
34.16.1 Ordered Lists
34.16.2 Un-ordered lists
34.16.3 Nested Lists
34.17 The Easy Button - Visual Markdown Editing
34.17.1 Try inserting a list, a table and a block-quote
34.18 Inline Code
34.18.1 Try inserting some in-line R code
34.19 A Quick Quiz
35 Rmarkdown Output Options
35.1 Microsoft Word Output from Rmarkdown
35.1.1 Making a Styles Reference File for Microsoft Word
35.1.2 Let’s Practice This.
35.1.3 Re-formatting Your Template
35.1.4 Using Your New Styles Template
35.1.5 Now you are ready!
35.2 PDF Output from RMarkdown
35.2.1 LaTeX and tinytex
35.2.2 Knitting to PDF
35.3 Microsoft PowerPoint Output from Rmarkdown
35.3.1 Tables in PowerPoint
35.3.2 Images in PowerPoint
35.3.3 Plots in PowerPoint
36 Adding Citations to your RMarkdown
37 Quarto is a Next-Generation RMarkdown
37.1 Goals for this Chapter
37.2 Packages Needed for this chapter
37.3 Introducing Quarto
37.4 A Tour of Quarto
37.5 Opening a New Quarto Document
37.6 Annotating code in Quarto
37.7 The Visual Editor vs. Source Editor in Quarto
37.8 Adding Code Chunks
37.9 Organized Options in Code Chunks with the Hash-Pipe #|
37.10 Stating Global Options in Your YAML Header
37.10.1 Code Options and Code Folding
37.10.2 Parameters
37.11 Figures
37.12 Tables
37.13 Inline Code and Caching
37.14 Quarto at the Command Line
37.15 Citations in Quarto
37.16 Challenge Yourself
37.17 Exploring further
38 Running R from the UNIX Command Line
38.1 What is the UNIX Command line?
38.2 Why run R from the command line?
38.3 How do you get started?
38.3.1 On a Mac
38.3.2 On a Windows PC
38.4 The Yawning Blackness of the Terminal Window
38.5 Where Are We?
38.6 Cleaning Up
38.7 Other helpful file commands
38.8 What about R?
38.9 What about just a few lines of R?
38.10 Running an R Script from the Terminal
38.11 Rendering an Rmarkdown file from the Terminal
References
Published with bookdown
Reproducible Medical Research with R