RMRWR
1
Preface
1.1
Who This Book is For
1.2
Prerequisites
1.3
The (Upward) Spiral of Success Structure
1.4
Motivation for this Book
1.5
The Scientific Reproducibility Crisis
1.6
Features of a Bookdown electronic book
1.7
What this Book is Not
1.7.1
This Book is Not A Statistics Text
1.7.2
This Book Does Not Provide Comprehensive Coverage of the R Universe
1.8
Some Guideposts
1.9
Helpful Tools
1.9.1
Demonstrations in Flipbooks
1.9.2
Learnr Coding Exercises
1.9.3
Coding
2
Getting Started and Installing Your Tools
2.1
Goals for this Chapter
2.2
Website links needed for this Chapter
2.3
Pathway for this Chapter
2.4
Installing R on your Computer
2.5
Windows-Specific Steps for Installing R
2.5.1
Testing R on Windows
2.6
Mac-specific Installation of R
2.6.1
Testing R on the Mac
2.6.2
Successful testing!
2.7
Installing RStudio on your Computer
2.7.1
Windows Install of RStudio
2.7.2
Testing Windows RStudio
2.7.3
Installing RStudio on the Mac
2.7.4
Testing the Mac Installation of RStudio
2.7.5
Critical Setup - Tuning Up Your RStudio Installation
2.8
Installing Git on your Computer
2.8.1
Installing Git on macOS
2.8.2
Installing Git on Windows
2.8.3
Installing Git on Linux
2.9
Getting Acquainted with the RStudio IDE
3
A Tasting Menu of R
3.1
Setting the Table
3.2
Goals for this Chapter
3.3
Packages needed for this Chapter
3.4
Website links needed for this Chapter
3.5
Setting up RPubs
3.6
Open a New Rmarkdown document
3.7
Knitting your Rmarkdown document
3.7.1
Installing Packages
3.7.2
Loading Packages with library()
3.8
Your Turn to Write Text
3.9
Wrangle Your Data
3.10
Summarize Your Data
3.11
Visualize Your Data
3.12
Statistical Testing of Differences
3.13
Publish your work to RPubs
3.14
The Dessert Cart
3.14.1
Interactive Plots
3.14.2
Animated Graphics
3.14.3
A Clinical Trial Dashboard
3.14.4
A Shiny App
3.14.5
An Example of Synergy in the R Community
4
Introduction to Reproducibility
4.1
First Steps to Research Reproducibility
4.1.1
Have a Plan
4.1.2
Treat Your Raw Data Like Gold
4.1.3
Cleaning and Analyzing Your Data
4.1.4
The First Level of Reproducibility
4.1.5
The Second Level of Reproducibility
5
Importing Your Data into R
5.1
Reading data with the {readr} package
5.1.1
Test yourself on scurvy
5.1.2
What is a path?
5.1.3
Try it Yourself
5.2
Reading Excel Files with readxl
5.2.1
Test yourself on read_excel()
5.3
Bringing in data from other Statistical Programs (SAS, Stata, SPSS) with the {haven} package
5.4
Other strange file types with rio
5.5
Data exploration with glimpse, str, and head/tail
5.5.1
Taking a glimpse with glimpse()
5.5.2
Try this out yourself.
5.5.3
Test yourself on strep_tb
5.5.4
Examining Structure with str()
5.5.5
Test yourself on the scurvy dataset
5.5.6
Examining a bit of data with head() and tail()
5.5.7
Test yourself on printing tibbles
5.6
More exploration with skimr and DataExplorer
5.6.1
Test yourself on the skim() results
5.6.2
Test yourself on the create_report() results
5.7
Practice loading data from multiple file types
5.8
Practice saving (writing to disk) data objects in formats including csv, rds, xls, xlsx and statistical program formats
5.9
How do readr and readxl parse columns?
5.10
What are the variable types?
5.11
Controlling Parsing
5.12
Chapter Challenges
5.13
Future forms of data ingestion
6
Wrangling Rows in R with Filter
6.1
Goals for this Chapter
6.2
Packages needed for this Chapter
6.3
Pathway for this Chapter
6.4
Logical Statements in R
6.5
Filtering on Numbers - Starting with A Flipbook
6.5.1
Your Turn - learnr exercises
6.6
Filtering on Multiple Criteria with Boolean Logic
6.6.1
Your Turn - learnr exercises
6.7
Filtering Strings
6.7.1
Your Turn - learnr exercises
6.8
Filtering Dates
6.8.1
Your Turn - learnr exercises
6.9
Filtering Out or Identifying Missing Data
6.9.1
Working with Missing data
6.9.2
Your Turn - learnr exercises
6.10
Filtering Out Duplicate observations
6.11
Slicing Data by Row
6.12
Randomly Sampling Your Rows
6.12.1
Your Turn - learnr exercises
6.13
Further Challenges
6.14
Explore More about Filtering
7
Wrangling Columns in R with Select, Rename, and Relocate
7.1
Goals for this Chapter
7.2
Packages needed for this Chapter
7.3
Pathway for this Chapter
7.4
Tidyselect Helpers in R
7.5
Selecting Column Variables
7.5.1
Try this out
7.6
Selecting Columns that are Not Contiguous
7.7
Selecting Columns With Logical Operators
7.8
Further Challenges
7.9
Explore More about Filtering
8
Building Your Table One with the {gtsummary} Package
8.1
Using tbl_summary() from the gtsummary package
8.2
Making a Basic table
8.2.1
Challenges:
8.3
Multiple Dimensions
8.4
New Challenges
8.5
Even More Help
8.6
Figuring out Column names
8.7
New Challenges
8.8
Adding Some Formatting
8.8.1
Formatting with {gt}
8.9
A Fancier Version for gt
8.9.1
The {flextable} package
8.10
A Fancier Version for Flextable
9
Using Mutate to Make New Variables (Columns)
9.1
Calculating BMI
9.2
Recoding categorical or ordinal data
9.3
Calculating Glomerular Filtration Rate
10
Mutating Joins to Combine Data Sources
10.1
What are Joins?
10.2
What are Mutating Joins?
10.3
Let’s Start with Left Joins
10.4
Left Join in Action
10.5
Left Join in Practice
10.6
Quick Quiz
10.7
Problem variable names
10.8
Right Join in Action
10.9
Right Join in Practice
10.10
Inner Joins
10.11
Quick Quiz
10.12
Now Let’s take a Look at the result
10.13
Full Joins
10.14
Quick Quiz
10.15
Now Let’s take a Look at the result
11
Interpreting Error Messages
11.1
The Common Errors Table
11.2
Examples of Common Errors and How to fix them
11.2.1
Missing Parenthesis
11.2.2
An Extra Parenthesis
11.2.3
Missing pipe %>% in a data wrangling pipeline
11.2.4
Missing + in a ggplot pipeline
11.2.5
Pipe %>% in Place of a +
11.2.6
Missing Comma Within a Function()
11.2.7
A Missing Object
11.2.8
One Equals Sign When you Need Two
11.2.9
Non-numeric argument to a binary operator
11.3
Errors Beyond This List
11.4
When Things Get Weird
11.4.1
Restart your R Session (Shift-Cmd-F10)
11.5
References:
12
Tips for Hashtag Debugging your Pipes and GGPlots
12.1
Debugging
12.2
The Quick Screen
12.3
Systematic Hunting For Bugs in Pipes
12.4
Systematic Hunting For Bugs in Plots
12.5
Hashtag Debugging
12.6
Pipe 2
12.7
Plot 2
12.8
Plot 3
12.9
Pipe 3
13
Finding Help in R
13.1
Programming in R
13.2
Starting with Help!
13.3
The Magic of Vignettes
13.4
Googling the Error Message
13.5
You Know What You Want to Do, but Don’t Know What Package or Function to Use
13.5.1
CRAN Task Views
13.5.2
Google is Your Friend
13.6
Seeking Advanced Help with a Minimal REPREX
14
The Basics of Base R
14.1
Dimensions of Data Rectangles
14.2
Naming columns
14.3
Concatenation
14.4
Sequences
14.5
Constants
14.6
Fancier Sequences
14.7
Mathematical functions
14.8
Handling missing data (NAs)
14.9
Cutting Continuous data into Levels
15
Updating R, RStudio, and Your Packages
15.1
Installing Packages
15.1.1
Installing Packages from Github
15.1.2
Problems with Installing Packages
15.2
Loading Packages with Library
15.3
Updating R
15.4
Updating RStudio
15.5
Updating Your Packages
16
Major R Updates (Where Are My Packages?)
16.1
Preparing for a Minor or Major R Upgrade
16.1.1
Installing New Packages from Scratch
16.2
Rebuilding All Packages in One (Automated) Step
16.3
Checking the new library path
16.4
Now Check your list of Packages
16.5
Updating Packages
17
Intermediate Steps Toward Reproducibility
17.1
Level 3 Reproducibility
17.1.1
Creating a New Project in RStudio
17.1.2
File paths and the {here} package
17.2
Code Review with a Coding Partner
17.2.1
Checklist for Code Review
17.3
Sharing code on GitHub
18
Building Table One for a Clinical Study
18.1
Packages Needed for this Chapter:
18.2
Pathway for this Chapter
18.3
Baseline Characteristics
18.4
Building Your Table 1
18.4.1
Updating Variable Labels
18.4.2
Updating Variable Values
18.4.3
Table 1 separated by Treatment Arm
18.4.4
Styling our Table 1
18.4.5
Adding A Column Spanner
18.4.6
Further Styling our Table 1
18.4.7
Your Turn
18.5
Try this with a new dataset
18.6
Making Modifications to the trial table
18.7
More Modifications to the trial table
18.8
Taking Control of the Stats
18.8.1
Your Turn
19
Comparing Two Measures of Centrality
19.0.1
Applying the t test
20
Simple example of a t-test
20.1
Common Problem
20.1.1
How Skewed is Too Skewed?
20.1.2
Visualize the Distribution of data variables in ggplot
20.1.3
Visualize the Distribution of data$len in ggplot
20.1.4
Results of Shapiro-Wilk
20.1.5
Try it yourself
20.1.6
Mammal sleep hours
20.2
One Sample T test
20.2.1
How to do One Sample T test
20.2.2
Interpreting the One Sample T test
20.2.3
What are the arguments of the t.test function?
20.3
Insert flipbook for ttest here
20.3.1
Flipbook Time!
20.4
Fine, but what about 2 groups?
20.4.1
Setting up 2 group t test
20.4.2
Results of the 2 group t test
20.4.3
Interpreting the 2 group t test
20.4.4
2 group t test with wide data
20.4.5
Results of 2 group t test with wide data
20.5
3 Assumptions of Student’s t test
20.5.1
Testing Assumptions of Student’s t test
20.6
Getting results out of t.test
20.6.1
Getting results out of t.test
20.7
Reporting the results from t.test using inline code
20.7.1
For Next Time
21
Sample Size Calculations with {pwr}
21.1
Sample Size for a Continuous Endpoint (t-test)
21.2
One Sample t-test for Lowering Creatinine
21.3
Paired t-tests (before vs after, or truly paired)
21.4
2 Sample t tests with Unequal Study Arm Sizes
21.5
Testing Multiple Options and Plotting Results
21.6
Your Turn
21.6.1
Scenario 1: FEV1 in COPD
21.6.2
Scenario 2: BNP in CHF
21.6.3
Scenario 3: Barthel Index in Stroke
21.7
Sample Sizes for Proportions
21.8
Sample size for two proportions, equal n
21.9
Sample size for two proportions, unequal arms
21.10
Your Turn
21.10.1
Scenario 1: Mortality on Renal Dialysis
21.10.2
Scenario 2: Intestinal anastomosis in Crohn’s disease
21.10.3
Scenario 3: Metformin in Donuts
21.11
add chi square
21.12
add correlation test
21.13
add anova
21.14
add linear model
21.15
add note on guessing effect sizes - cohen small, medium, large
21.16
Explore More
22
Randomization for Clinical Trials with R
22.1
Printing these on Cards
22.2
Now, try this yourself
22.3
Now Freestyle
23
Univariate ggplots to Visualize Distributions
23.1
Histograms
23.1.1
Comparisons of Distributions with Histograms
23.1.2
Histograms and Categories
23.2
Density Plots
23.2.1
Comparisons with Density plots
23.3
Comparing Distributions Across Categories
23.4
Boxplots
23.5
Violin Plots
23.6
Ridgeline Plots
23.6.1
Including Plots
23.6.2
Including Points
23.6.3
Including Points
23.6.4
Including Points
23.6.5
Including Points
24
Bivariate ggplot2 Scatterplots to Visualize Relationships Between Variables
24.1
Packages used in this Chapter
24.2
Data Exploration and Validation (DEV)
24.3
Scatterplots
24.3.1
Micro-quiz!
24.4
Mapping More Variables
24.5
Inheritance and Layering in ggplot2
24.6
Aesthetic mapping Micro-Quiz!
24.7
Controlling Point Shape, Size, and Color Manually
24.7.1
Manual Shapes
24.7.2
Manual Sizes
24.7.3
Manual Color
25
Extensions to ggplot
25.1
Goals for this Chapter
25.2
Packages Needed for this chapter
25.3
A Flipbook of Where We Are Going With ggplot Extensions
25.3.1
MAKE FLIPBOOK
25.4
A Waffle Plot
25.5
An Alluvial Plot
25.6
Lollipop Plots
25.7
Dumbbell Plots
25.8
Spaghetti Plots with Summary Smoothed Lines for Change Over Time
25.9
Swimmer Plots
25.10
Adding Significance Comparisons with {ggsignif}
26
Customizing Plot Scales
26.1
Goals for this Chapter
26.2
Packages Needed for this chapter
26.3
A Flipbook of Where We Are Going With Scales
26.4
A Basic Scatterplot
26.5
But what if you want the scale for risk to start at 0?
26.6
But this axis does not really start at Exactly 0
26.7
Control the Limits and the Breaks
26.8
Test what you have learned
26.9
Continuous vs. Discrete Plots and Scales
26.10
Using Scales to Customize a Legend
26.11
Test what you have learned
26.11.1
More Examples with Flipbooks
27
Helping out with ggplot
27.1
ggx::gghelp()
27.2
Getting more help with theming with ggThemeAssist
27.3
Website helpers for ggplot
27.4
Getting Even more help with esquisse
28
Functions
28.1
Don’t repeat yourself
28.2
Your Turn
28.3
Freestyle
28.3.1
Acknowledgement
28.4
Read More
29
Using Found (Web) Data
29.1
Found Poetry
29.2
Found Data
29.3
Download Example
29.4
Datapasta (small table) Example
29.5
Your Turn
29.6
{rvest} Example
29.7
Your Turn
29.8
API example with {tidycensus}
29.9
Challenges
29.10
Advanced Challenge - Dynamic Websites
30
Linear Regression and Broom for Tidying Models
30.1
Packages needed
30.2
Building a simple base model with {lm}
30.2.1
Producing manuscript-quality tables with {gtsummary}
30.3
Is Your Model Valid?
30.4
Making Predictions with Your Model
30.4.1
Predictions from new data
30.5
Choosing predictors for multivariate modeling – testing, dealing with collinearity
30.5.1
Challenges
30.6
presenting model results with RMarkdown
30.6.1
Challenges
30.7
presenting model results with a Shiny App
30.7.1
Challenges
31
Logistic Regression and Broom for Tidying Models
31.1
The Model Summary
31.2
Evaluating your Model Assumptions
31.3
Converting between logit, odds ratios, and probability
32
Fast and Frugal Trees with the {FFTrees} Package
32.1
Setup
32.2
The Breast Cancer Dataset
32.2.1
Data Inspection
32.2.2
Check Your Progress
32.3
Building a FFTrees Model for Breast Cancer
32.4
Your Turn with Heart Disease Data
32.4.1
Test what you have learned
32.5
Your Turn to Build and Interpret a Model
32.6
Now build your FFTrees model to predict improved status (vs. death)
33
A Gentle Introduction to Shiny
33.1
What is Shiny?
33.2
The Basic Structure of a Shiny App
33.2.1
The weirdness of a Shiny app
33.3
The User Interface Section Structure
33.4
The Server Section Structure
33.5
How to Run an App
33.5.1
How to Stop an App
33.6
Building a Very Simple App (Version 1)
33.6.1
The ui section
33.6.2
The server section
33.7
Edit this App (Version 2)
33.8
Building a User Interface for Inputs and Outputs
33.8.1
Inputs
33.8.2
Outputs
33.9
Building a Functioning Server Section
33.9.1
Using the input values & Data
33.9.2
Wrangling and Calculating
33.9.3
Rendering to HTML Outputs
33.10
Building a Simple Shiny App (Version 3)
33.11
Publishing Your Shiny App on the Web
33.12
More to Explore
34
Sharing Models with Shiny
34.0.1
Packages Needed for this Chapter
34.1
Setting up and Saving Models
34.1.1
Linear Model
34.1.2
Logistic Model
34.1.3
Random Forest Model
34.2
Building a Shiny App for the Linear Model
34.2.1
The Default Shiny App
34.2.2
Editing the ui sidebarPanel for the Input Predictor Variables
34.2.3
Editing the server section to make Predictions
34.2.4
Editing the mainPanel in the ui section to display your Prediction
34.3
Building a Shiny App for the Logistic Model
34.3.1
The Default Shiny App
34.3.2
Editing the ui sidebarPanel for the Input Predictor Variables
34.3.3
Editing the server section to make Predictions
34.3.4
Editing the mainPanel in the ui section to display your Prediction
34.4
Building a Shiny App for the Random Forest Model
34.5
Challenge Yourself
35
Introduction to R Markdown
35.1
What Makes an Rmarkdown document?
35.2
Trying out RMarkdown with a Mock Manuscript
35.3
Inserting Code Chunks
35.3.1
Code Chunk Icons
35.4
Including Plots
35.5
Including Tables
35.6
Including Links and Images
35.6.1
Links
35.6.2
Images
35.7
Other languages in code chunks
35.8
Code Chunk Options
35.9
How It All (Rmarkdown + {knitr} + Pandoc) Works
35.10
Knitting and Editing (and re-Knitting) Your Rmd document
35.11
Try Out Other Chunk Options
35.12
The setup chunk
35.13
Markdown syntax
35.14
2nd Header
35.14.1
3rd Header
35.15
Line Breaks and Page Breaks
35.16
Making Lists
35.16.1
Ordered Lists
35.16.2
Un-ordered lists
35.16.3
Nested Lists
35.17
The Easy Button - Visual Markdown Editing
35.17.1
Try inserting a list, a table and a block-quote
35.18
Inline Code
35.18.1
Try inserting some in-line R code
35.19
A Quick Quiz
36
Rmarkdown Output Options
36.1
Microsoft Word Output from Rmarkdown
36.1.1
Making a Styles Reference File for Microsoft Word
36.1.2
Let’s Practice This.
36.1.3
Re-formatting Your Template
36.1.4
Using Your New Styles Template
36.1.5
Now you are ready!
36.2
PDF Output from RMarkdown
36.2.1
LaTeX and tinytex
36.2.2
Knitting to PDF
36.3
Microsoft Powerpoint Output from Rmarkdown
36.3.1
Tables in Powerpoint
36.3.2
Images in Powerpoint
36.3.3
Plots in Powerpoint
37
Adding Citations to your RMarkdown
38
Quarto is a Next-Generation RMarkdown
38.1
Goals for this Chapter
38.2
Packages Needed for this chapter
38.3
Introducing Quarto
38.4
A Tour of Quarto
38.5
Opening a New Quarto Document
38.6
Annotating code in Quarto
38.7
The Visual Editor vs. Source Editor in Quarto
38.8
Adding Code Chunks
38.9
Organized Options in Code Chunks with the Hash-Pipe #|
38.10
Stating Global Options in Your YAML Header
38.10.1
Code Options and Code Folding
38.10.2
Parameters
38.11
Figures
38.12
Tables
38.13
Inline Code and Caching
38.14
Quarto at the Command Line
38.15
Citations in Quarto
38.16
Challenge Yourself
38.17
Exploring further
39
Running R from the UNIX Command Line
39.1
What is the UNIX Command line?
39.2
Why run R from the command line?
39.3
How do you get started?
39.3.1
On a Mac
39.3.2
On a Windows PC
39.4
The Yawning Blackness of the Terminal Window
39.5
Where Are We?
39.6
Cleaning Up
39.7
Other helpful file commands
39.8
What about R?
39.9
What about just a few lines of R?
39.10
Running an R Script from the Terminal
39.11
Rendering an Rmarkdown file from the Terminal
40
Secure Passwords in R
40.1
Setting New Keys
41
Dates and Times in R
41.1
Data Types for Dates and Times
41.2
Using POSIXlt
41.3
Formatting dates
41.3.1
Code Chunk Icons
41.4
Including Plots
41.5
Including Tables
41.6
Other languages in code chunks
41.7
Code Chunk Options
41.8
Try Out Other Chunk Options
41.9
The setup chunk
41.10
The Easy Button - Visual Markdown Editing
41.11
A Quick Quiz
42
Protecting PHI (Protected Health Information)
42.1
Protecting (Not Inadvertently Sharing) PHI
42.2
Identifying PHI
42.3
Selectively Deleting PHI
42.4
Problems with PHI-free data
42.5
Encrypting PHI
42.5.1
Generating Public and Private keys
42.6
Sharing synthetic data with {synthpop}
43
Building Data Pipelines with {targets}
43.1
What Does {targets} Do?
43.2
Air Quality Analysis
43.2.1
Prepping The Functions.R file
43.2.2
Checking Your Functions
43.2.3
Set Up the Pipeline
43.2.4
Pre-Build Checks
43.2.5
Changing the Pipeline
43.3
Your Turn - A Tuberculosis Analysis Pipeline
43.3.1
Making new Functions
43.3.2
Testing functions
43.4
Resetting functions before the Pipeline is Built
43.4.1
Setting Up {targets}
43.5
Editing the _targets.R File
43.5.1
Running the Pipeline
43.5.2
Modifications to the Pipeline
43.5.3
Modify the Plot
43.6
Next Steps
44
Colors and Scales in {ggplot2}
44.1
Goals for this Chapter
44.2
Colors in R and {ggplot2}
44.2.1
Using pre-defined color names
44.2.2
Using color hex codes
44.2.3
Screen vs. Print Colors
44.2.4
Transparency and hex colors
44.2.5
More obscure ways to select colors
44.2.6
Using color palettes
44.2.7
Color-blind friendly palettes
44.3
Sequential, Diverging, and Qualitative Palettes
44.4
Choosing Colors with Meaning
45
Creating Risk Pictograms in {ggplot2}
45.1
Why Risk Pictograms?
45.2
Loading Libraries
45.3
Risk of Lymphoma
45.4
Risk of Surgery in CD - Your Turn
45.5
Risk of Hepatocellular Carcinoma with Hepatitis C Virus and Alcoholism - Your Turn
46
Using the {flowchart} package for CONSORT diagrams in R
46.1
Why Flowcharts?
46.2
Loading Libraries
46.3
A CONSORT Flowchart for Statins to Prevent HCC
46.4
Data
46.5
Branching
46.6
Splitting into Groups (Treatment Arms)
46.6.1
A Short Tangent on {ifelse}
46.7
Filtering for Completers
46.8
Modify exclusions for more detailed labels
46.8.1
Time for a Quiz
46.9
A More Complicated Study
46.10
Example
46.11
Branching
46.12
Splitting into Groups (Treatment Arms)
46.13
Filtering for Completers
46.14
Modify exclusions for more detailed labels
46.15
Next step
46.16
Now Add These Withdrawal Reasons
46.17
Your Turn
46.17.1
Fatigue Starting Box
46.17.2
Fatigue Branching for Exclusions with fc_filter
46.17.3
Branching Fatigue
46.17.4
Fatigue Completers
46.17.5
Fatigue Text for Exclusions
46.17.6
Fatigue Add text for Exclusions
46.17.7
Withdrawal Reasons
46.17.8
Now add these to Your Flowchart with fc_modify
46.17.9
Final Fatigue Flowchart Drawing fatigue_fc6
46.17.10
Explore More Features
Title holder
References
Reproducible Medical Research with R