Introduction to Data Exploration and Analysis with R
Welcome to IDEAR
There are only two kinds of programming languages: those people always bitch about and those nobody uses.
— Bjarne Stroustrup
Welcome to Introduction to Data Exploration and Analysis with R (IDEAr)! This book is a crash course in coding with R and in data analysis, built for people teaching themselves the techniques needed for most analyst jobs today. It assumes no prior experience with R or data analysis, and aims to be equally applicable to industry and scientific work.
This reader is continuously deployed to bookdown.org and GitHub as new sections are completed or old ones restructured, so that I can get feedback from the small group of people using it to learn R and adapt the text as needed. If you’d like to help with this process, I’d love to hear from you at firstname.lastname@example.org, or to see a pull request on GitHub. More information about me can be found at my website, which just so happens to have been built in R.
0.1 The State of the Book
When I began writing this book, I was designing it as course notes - a reader, of sorts - for an undergraduate course, where it would be supported by lectures, homework, and other materials. I was distributing it mostly to friends, who would text me directly with questions, and so I was willing to let things be a little confusing or a little out of order.
What I’ve found is that the book is mostly useful to people self-studying data analytics and R, even though that isn’t what it was designed for. Meanwhile, it doesn’t seem like I’ll be teaching that hypothetical course any time soon - so if I’m going to leave this book up, it makes sense to redesign it into a more useful document.
And so that’s what I’m doing now - redesigning this book to make it more useful to people teaching themselves R via the examples and exercises contained in here. A lot of things are going to get shifted and edited, particularly towards the end of the book, where certain topics (machine learning, for instance) could be sacrificed to give a more streamlined learning experience. The book is therefore about to change a lot, which might be a pain for anyone currently using it - which I am sorry about! If it would be helpful, I can send HTML or PDF files of old versions of the chapters. Just contact me via any method above.
0.2 Book Outline
This book serves as an introduction to R for scientific and business applications, focusing specifically on exploratory data analysis, hypothesis generation and confirmation, data visualization, and communication of results. It requires no prior knowledge of computer programming, computer science, or statistics, though a solid base in any of those fields will likely make your learning process faster.
The goal is to leave you with the essentials of working in R. You won’t be an R maestro, and you won’t have developed domain-specific knowledge - but you’ll have the foundation that will allow you to build those up from other resources. To that end, we’ll focus primarily on the basic language skills required to implement more complicated methods, and will skip over things such as natural language processing and machine learning.
The front half of the course includes an introduction to coding with R, a review of statistics, and two walk-through tutorials of data analyses to give you templates to build your own analyses from. In order, the units are:
- Introduction to R and Data Visualization
- R Functions and Workflow
- Basic Statistics with R
- Introduction to Data Analysis
- Functions and Scripting
- More Complicated Analyses
The middle section of the course then focuses on communicating your results and methodologies, and introduces ways to communicate with code in order to improve your collaboration skills. The units comprising this section include:
- Achieving Graphical Excellence
- Playing Nicely with Others (Communicating With Code)
The end of the course then covers topics which I have found to be more specialized and - while important - not applicable to every project. This section includes the units:
- Working with Text
- Dates and Times
- Specialized Applications
The backmatter of the reader then concerns how to get help outside of this book, containing both links to useful resources and some frequently asked questions. The two units in this section are:
- Useful Resources