15 Where to go from here: The next steps in your R journey
The first steps have been made, but we can certainly not claim we have reached our destination on this journey. If anything, this book helped us reach the first peak in a series of mountains, giving us an overview of what else there is to explore. To provide some guidance on where to go next, I collated some resources worth exploring as a natural continuation to this book.
15.1 GitHub: A Gateway to even more ingenious R packages
If you feel you want more R or you are curious to know what other R packages can do to complement your analysis, then GitHub offers an almost endless number of options. Besides the CRAN versions of packages, you can find development versions of R packages on GitHub. These usually incorporate the latest features that are not yet available through CRAN. Having said that, if you write your next publication, it might be best to work with packages that are released on CRAN. These are usually the most stable versions. After all, you don’t want the software to fail on you and hinder you from making that world-changing discovery.
Still, more and more R packages are released every day that offer the latest advancements in statistical computations and beyond. Methods covered in Chapter 14 are a great example of what has become possible with advanced programming languages. So if you want to live at the bleeding edge of innovative research methods, then look no further.
However, GitHub also serves another purpose: to back up your research projects and work collaboratively with others. I strongly encourage you to create your own GitHub account, even just to try it. RStudio has built-in features for working with GitHub, making it easy to keep track of your analysis and ensure regular backups. Nobody wants to clean up their data over months and lose it because their cat just spilt the freshly brewed Taiwanese High Mountain Oolong Tea over their laptop. I use GitHub for many different things: hosting my blog (The Social Science Sofa), research projects, and this book.
There is much more to say about GitHub than can be covered in this book, but if you seek an introduction, you might find GitHub’s YouTube channel of interest or their ‘Get started’ guide. Best of all, setting up and using your GitHub account is free.
15.2 Books to read and expand your knowledge
There are undoubtedly many more books to read, explore and use. However, my number one recommendation to follow up with this book is ‘R for Data Science’. It takes your R coding beyond the context of Social Sciences and introduces new aspects, such as ‘custom functions’ and ‘looping’. These are two essential techniques if you, for example, have to fit a regression model for 20+ subsets of your data, create 100+ plots to show results for different countries, or need to create 1,000+ individualised reports for training participants. In short, these are very powerful (and efficient) techniques you should learn sooner rather than later, but they are less essential for a basic understanding of data analysis in R for the Social Sciences.
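To give a flavour of what ‘custom functions’ and ‘looping’ look like in practice, here is a minimal sketch that fits the same regression model to several subsets of a dataset. It is only an illustration and not taken from ‘R for Data Science’: the built-in mtcars dataset, the formula mpg ~ wt and the function name fit_model are assumptions chosen for this example, and base R’s lapply() is used instead of the tidyverse tools that book introduces, so that the code runs without any additional packages.

```r
# Custom function: fit the same regression model to any data frame.
# The formula mpg ~ wt is only an illustrative choice.
fit_model <- function(data) {
  lm(mpg ~ wt, data = data)
}

# Split the built-in mtcars dataset into subsets by number of cylinders.
subsets <- split(mtcars, mtcars$cyl)

# Loop over all subsets and fit one model per subset.
models <- lapply(subsets, fit_model)

# Collect the slope for wt from each model in a named numeric vector.
slopes <- vapply(models, function(m) coef(m)[["wt"]], numeric(1))
slopes
```

The same pattern scales from three subsets to twenty or more without writing any additional code, which is precisely why these two techniques are worth learning.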
If you want to expand your repertoire regarding data visualisations, two fantastic starting points are ‘ggplot2: Elegant Graphics for Data Analysis’ and ‘Fundamentals of Data Visualization’. However, these days I am looking mostly for inspiration and new packages that help me create unique, customised plots for my presentations and publications. Therefore, looking at the source code of plots you like (for example, on GitHub) is probably the best way to learn about ggplot2 and some new techniques for achieving specific effects (see also Chapter 15.4).
If you are a qualitative researcher, you might be more interested in what else you can do with R to systematically analyse large amounts of textual data (as was shown in Chapter 14). I started with the excellent book ‘Text Mining with R: A Tidy Approach’, which introduces you in greater depth to sentiment analysis, correlations, n-grams, and topic modelling. The lines between qualitative and quantitative research are becoming increasingly blurred. Thus, learning these techniques will be essential for moving forward and pushing the boundaries of what is possible with textual data.
R can do much more than just statistical computing and creating pretty graphs. For example, you can write your papers with it, even if you do not write a single line of code. You might remember from Chapter 6.4 that R Markdown files are an alternative to writing R scripts. If you want to deepen your knowledge in this area and finally let go of Microsoft Word, I encourage you to take a peek at the ‘R Markdown Cookbook’ for individual markdown files and ‘bookdown: Authoring Books and Technical Documents with R Markdown’ for entire manuscripts, e.g. journal paper submissions or books. I almost entirely abandoned Microsoft Word, even though it served me well for so many years - thanks.
Lastly, I want to make you aware of another open-source book: ‘What They Forgot to Teach You About R’ by Jennifer Bryan and Jim Hester. It is an excellent resource to get some additional insights into how one should go about working in R and RStudio.
15.3 Engage in regular online readings about R
A lot of helpful information for novice and expert R users is not captured in books but in online blogs. There are several that I find inspiring and an excellent learning platform, each focusing on different aspects.
The most comprehensive hub for all things R is undoubtedly ‘R-bloggers’. It is a blog aggregator which focuses on collecting content related to R. I use it regularly to read more about new packages, new techniques, helpful tricks to become more ‘fluent’ in R or simply find inspiration for my own R packages. More than once, I found interesting blogs by just reading posts on ‘R-bloggers’. For example, the day I wrote this chapter, I learned about emayili, which allows you to write your emails from R using R Markdown. So not even the sky is the limit, it seems.
Another blog worth considering is the one from ‘Tidyverse’. This blog is hosted and run by RStudio and covers all packages within the tidyverse. Posts like ‘waldo 0.3.0’ made my everyday data wrangling tasks a lot easier because finding the differences between two datasets can be like searching for a needle in a haystack. For example, it is not unusual to receive two datasets that contain the same measures, but some of their column names are slightly different, which prevents us from quickly merging them the way we want. I previously spent days comparing and arranging multiple datasets with over 100 columns. Let me assure you, it is not really fun to do.
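To illustrate the kind of mismatch described above, the following minimal sketch compares the column names of two small datasets. Please note the assumptions: both datasets (survey_2020 and survey_2021) are made up for this example, and the code uses base R’s setdiff() rather than waldo itself, so it runs without installing anything. With waldo installed, waldo::compare() should give you a much richer report of the differences.

```r
# Two small, made-up datasets with the same measures but slightly
# different column names (hypothetical example data).
survey_2020 <- data.frame(id = 1:3, age = c(25, 31, 47), income = c(30, 42, 55))
survey_2021 <- data.frame(id = 4:6, Age = c(22, 38, 51), income_k = c(28, 47, 60))

# Column names that appear in only one of the two datasets.
only_in_2020 <- setdiff(names(survey_2020), names(survey_2021))
only_in_2021 <- setdiff(names(survey_2021), names(survey_2020))

only_in_2020
only_in_2021
```

Once you know which names differ, you can rename the columns in one dataset to match the other before merging them.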
The blog ‘Data Imaginist’ by Thomas Lin Pedersen covers various topics around data visualisations. He is well known for his Generative Art, i.e. art that is computationally generated. But one of my favourite packages, patchwork, was also written by him and gained immense popularity. It has never been easier to arrange multiple plots with so little code.
Lastly, I want to share with you the blog of Cédric Scherer for those of you who want to learn more about high-quality data visualisations based on ggplot2. His website hosts many visualisations with links to the source code on GitHub to recreate them yourself. It certainly helped me improve the visual storytelling of my research projects.
15.4 Join the Twitter community and hone your skills
Learning R means you also join a community of like-minded programmers, researchers, hobbyists and enthusiasts. Whether you have a Twitter account or not, I recommend looking at the #RStats community there. Plenty of questions about R programming are raised and answered on Twitter. In addition, people like to share insights from their projects, often with source code published on GitHub. Even if you are not active on social media, it might be worth having a Twitter account just to receive news about developments in R programming.
As with any foreign language, if we do not use it regularly, we easily forget it. Thus, I would like to encourage you to take part in Tidy Tuesday. It is a weekly community exercise around data visualisation and data wrangling. In short: You download a dataset provided by the community, and you are asked to create a visualisation as simple or complex as you wish. Even if you only manage to participate once a month, it will make you more ‘fluent’ in writing your code. Besides, there is a lot to learn from others because you are also asked to share the source code. This activity takes place on Twitter, and you can find contributions by using #TidyTuesday. Whether you want to share your plots is up to you, but engaging with this activity will already pay dividends. Besides, it is fun to work with datasets that are not necessarily typical for your field.