# Introduction to Data Exploration and Analysis with R

*Michael Mahoney*

*2019-01-24*

# Welcome to IDEAR

There are only two kinds of programming languages: those people always bitch about and those nobody uses.

— Bjarne Stroustrup

This book is designed as a crash course in coding with R and in data analysis, built to be usable for a course spanning either a semester or a year. The book assumes no prior experience with R or data analysis, and aims to be equally useful for industry and scientific work.

This reader is continuously deployed to bookdown.org and GitHub, especially as new sections are completed or old ones are restructured. This lets me get feedback from the small group of people who are using this book to learn R, so that I can adjust and adapt the text as needed. If you’d like to help with this process, I’d love to hear from you at mike.mahoney.218@gmail.com, or to see a pull request on GitHub. More information about me can be found at my website, which just so happens to have been built in R.

## 0.1 Course Outline

This course serves as an introduction to R for scientific and research applications, focusing specifically on exploratory data analysis, hypothesis generation and confirmation, data visualization, and communication of results. It is not an introductory statistics course, though it will teach some statistics. Nor is it an introduction to computer programming or computer science, though it will teach some of each.

The front half of the course includes an introduction to coding with R, a review of statistics, and two walk-through data analyses that give you templates for building your own. In order, the units are:

- Introduction to R and Data Visualization
- R Functions and Workflow
- Basic Statistics with R
- Introduction to Data Analysis
- Functions and Scripting
- More Complicated Analyses

The middle section of the course then focuses on communicating your results and methods, and introduces Git and LaTeX (as well as R Markdown) to improve your collaboration skills. The units in this section are:

- Achieving Graphical Excellence
- Playing Nicely with Others (Communicating With Code)

The end of the course covers topics that I have found to be more specialized and, while important, less common in day-to-day data analysis. We finish with a brief introduction to machine learning techniques, to give you a basic familiarity with the future of statistical analysis. This section includes the units:

- Working with Text
- Dates and Times
- Specialized Applications
- Machine Learning

Finally, the backmatter covers how to get help beyond this book, with links to useful resources and answers to some frequently asked questions. The two units in this section are:

- Useful Resources
- FAQ

## 0.2 Other Sources

If this introduction isn’t quite your style, I’d highly recommend Garrett Grolemund and Hadley Wickham’s *R for Data Science*, as well as Wickham’s *Advanced R*. Many other useful resources are listed in Chapter 13, at the end of this book.