Chapter 1 Introduction
R is open source software that is widely used for many different purposes. At its heart, R is a programming language, but it is also well suited to handling and analyzing data. It is stronger for visualization than some other statistical packages, and because it supports programming, it is more powerful than many other statistical tools for analysis. It has other capabilities as well, such as writing web-based books and manuals; this book, for example, is written using an R package (more on packages later). Although R might seem challenging to a beginner, it is well worth the effort to learn, as it opens up a wide range of possibilities. There are many resources for learning R, and every learner is at a different stage.
This book is written for beginners who are just starting out and would like to learn how to use R for basic data analysis. It will help you understand how R works, how to use its functionality, and what R is capable of at the most basic level of analysis. Before going into these steps, it is important to understand what open source means, since that is what makes R so versatile, even though it can seem challenging to a beginner.
1.1 What is an open source platform?
An open source platform is one whose code anyone can take, extend, and add functionality to. R is built by an international community of volunteers who contribute to it and extend its functionality in countless ways. As you learn R, I hope you will come to appreciate the power and versatility that these many contributions give it. R was started by Robert Gentleman and Ross Ihaka (the "R and R" behind its name) as a free software environment for statistical computing and graphics. It is now maintained by the R Core Team (R Core Team 2020), and its growing base of contributors allows it to keep developing as a platform. R can be considered a different implementation of the S language, which was originally developed at AT&T Bell Laboratories. More on its beginnings can be found on the main site for the software, https://www.R-project.org/.
Along with R, a beginner also needs an understanding of R-Studio, an IDE (Integrated Development Environment) for R. An IDE makes it easier to develop and test software. The basic open source version of R-Studio is free. In addition, R-Studio provides an easy-to-use interface for making the most of R's functionality, including the extensions to R known as packages. Although all of this can be done in R alone, R-Studio makes it more intuitive once you get used to its features. For reference, Figure 1.1 shows R's own user interface, with its command line for entering input.
1.2 How R-Studio Differs from R
R-Studio provides a GUI (Graphical User Interface) that makes R easier to use. Although it is an IDE for developing software, a beginner can use its other features to set and change directories, install packages, and carry out other tasks more intuitively. Chapter 3 will go into more detail on the R-Studio interface and its various features. For now, see Figure 1.2 for an image of R-Studio's interface to compare with R's. Because R forms the foundation for R-Studio, R-Studio cannot work without it: you need to install R first and then install R-Studio. If you choose to use a cloud version of R-Studio, it is still helpful to understand the concepts in this chapter.
Since R (and R-Studio) relies on many packages, it is helpful to understand what a package is.
1.3 What is a package?
A package may be thought of as a specialized add-on designed to perform specific functions. Most packages are contributed by the community of volunteers and are available as add-ons for R's community of users. Hadley Wickham (2015) describes a package as shareable code bundled with data, documentation, examples, and a description of what the package does. As you start using R, you will become more comfortable with the concept of packages, with what they can help you accomplish, and with the many packages available for different tasks. At the time of writing, MRAN (Microsoft R Application Network) listed 17,091 packages. With so many packages, it is not possible to know, let alone master, every one. Since each package serves a different purpose, users need to familiarize themselves with the packages relevant to their own tasks.
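To make the idea of a package a little more concrete, here is a minimal sketch of how packages are used from the R console. It relies on `stats`, a package that ships with every R installation, so nothing needs to be downloaded. The name `ggplot2` is used only as an example of a community-contributed package; its lines are commented out because installing it requires an internet connection.

```r
# Packages that come with R, such as "stats", are already installed.
"stats" %in% rownames(installed.packages())

# Attach a package so its functions can be called by name.
library(stats)

# Check which version of the package is installed.
packageVersion("stats")

# List a few of the functions the package makes available.
head(ls("package:stats"))

# Open the documentation bundled with the package.
# help(package = "stats")

# Contributed packages are downloaded once per installation and then
# attached in each session; "ggplot2" is only an example here.
# install.packages("ggplot2")
# library(ggplot2)
```

Installing a package (`install.packages()`) and attaching it (`library()`) are separate steps: installation puts the package on your computer, while attaching makes its functions available in the current session.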
References
R Core Team. 2020. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Wickham, Hadley. 2015. R Packages: Organize, Test, Document, and Share Your Code. 1st ed. Sebastopol, California: O’Reilly Media, Inc. http://r-pkgs.had.co.nz/intro.html.