Chapter 2 Introduction to Tidyverse
The tidyverse is a collection of R packages that provides a set of tools for data manipulation, exploration, visualization, and modeling.
2.1 Core Tidyverse
Some of the key tidyverse packages that you are likely to use in data analysis include:
ggplot2
: A powerful and flexible package for creating graphics and data visualizations. It follows the grammar of graphics philosophy, allowing users to build complex plots layer by layer.
dplyr
: A package for data manipulation tasks such as filtering, selecting, arranging, grouping, and summarizing data.
tidyr
: A package for reshaping and tidying data.
readr
: A package for reading and writing structured text files, including CSV, TSV, and fixed-width format files.
purrr
: A package for functional programming in R.
tibble
: A tidy alternative to traditional data frames, providing cleaner printing and stricter, more predictable subsetting.
forcats
: A package for working with categorical variables (factors) in R.
lubridate
: A package for working with dates and times. It provides functions to parse, manipulate, and work with date-time objects efficiently.
2.2 Install and Load Package
Install all the packages in the tidyverse by running install.packages("tidyverse").
Load the core tidyverse by running library(tidyverse).
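As a minimal sketch, the two steps can be combined so that installation runs only when the package is missing. The requireNamespace() check is added here for illustration and is not part of the instructions above. The install step assumes that a CRAN mirror is configured and the network is reachable.

```r
# Install the tidyverse only if it is not already available
# (assumption: a CRAN mirror is configured and the network is reachable)
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}

# Attach the core tidyverse packages (ggplot2, dplyr, tidyr, readr, ...)
library(tidyverse)

# List every package that belongs to the tidyverse, core and non-core
tidyverse_packages()
```

Installation is needed only once per R library, while library(tidyverse) must be run in every new session.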
2.3 Pipe Operator
The pipe operator (%>%) is a feature of the magrittr package, which is commonly used alongside dplyr and other tidyverse packages.
The pipe operator (%>%) simplifies code organization by chaining operations in sequence. It passes the output of one operation as the first argument of the next, which improves readability, reduces the need for intermediate variables, and keeps the data manipulation workflow clear.
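To make the idea concrete, here is a minimal sketch that compares nested function calls with the equivalent piped form. The vector x and the choice of sqrt(), mean(), and round() are illustrative assumptions and do not come from the text above.

```r
library(magrittr)  # provides the %>% operator

# Hypothetical input values, chosen only for illustration
x <- c(1, 4, 9, 16)

# Nested calls: must be read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Piped calls: read top to bottom, in the order they execute.
# Each step receives the previous result as its first argument.
piped <- x %>%
  sqrt() %>%
  mean() %>%
  round(1)

# Both forms compute the same value
identical(nested, piped)
## [1] TRUE
```

In general, x %>% f(y) is evaluated as f(x, y), so the pipe changes how the code reads rather than what it computes.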
An example of using the pipe (%>%) for data manipulation:
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# Sample dataset
students <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  math_score = c(85, 90, 75),
  science_score = c(88, 82, 79)
)
# Calculate the total score for each student and filter students with a total score above 160
students_filtered <- students %>%
  mutate(total_score = math_score + science_score) %>% # Add a new column for total score
  filter(total_score > 160) # Filter students with a total score above 160
print(students_filtered)
##    name math_score science_score total_score
## 1 Alice         85            88         173
## 2   Bob         90            82         172
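For comparison, the same result can be produced without the pipe by storing each step in an intermediate variable. The sketch below repeats the sample data so that it runs on its own. The names students_total and students_filtered_steps are hypothetical and introduced only for this illustration.

```r
library(dplyr)

# Same sample dataset as in the example above
students <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  math_score = c(85, 90, 75),
  science_score = c(88, 82, 79)
)

# Step 1: add the total score and keep the result in an intermediate variable
students_total <- mutate(students, total_score = math_score + science_score)

# Step 2: keep only the rows whose total score is above 160
students_filtered_steps <- filter(students_total, total_score > 160)

# Piped version, with no intermediate variable
students_filtered <- students %>%
  mutate(total_score = math_score + science_score) %>%
  filter(total_score > 160)

# The two approaches return the same data frame
identical(students_filtered_steps, students_filtered)
```

The step-by-step version needs an extra name for every stage, which is the overhead the pipe removes.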