Why read this book?

This book was written to support the MSc module “Modern Data” (CS5702) at Brunel University London. It contains extended versions of each lecture, the associated seminar and lab exercises, and a good deal of contextual material. It is organised as one chapter per week, together with a Week 0 Getting Ready chapter. I strongly encourage students to read the relevant chapters in advance of the lectures. I also urge you to try out the examples, experiment and generally get your ‘hands dirty’. Data science should not be a stale or purely theoretical subject. As with many other skills, lots of practice really is essential.

As a starting point, please take a look at the unnumbered chapter Getting Ready in order to prepare for the module. You should try to do this either before arriving at Brunel or during the Induction Week (Week 0).

Data Science is inherently multi-disciplinary, so students will in all probability have varied backgrounds, which might not include either statistics or computer science. This course makes very few assumptions about your prior knowledge, so please don’t worry if you’ve never done any programming before or if you find maths a little off-putting. We will start from first principles for R. A grounding in statistics (including an overview of descriptive, inferential and modelling methods) will be provided by the ‘sibling’ module Quantitative Data Analysis (CS5701), which runs in parallel with Modern Data. There are many excellent books available which combine an introduction to statistics with using R. I have found Kabacoff (2015) very readable.

An update for 2024/25 is the use of generative AI, in particular ChatGPT. The potential for support from such tools is enormous, and their capabilities are growing rapidly. However, the potential to use these tools to attempt to bypass learning is also growing. For this reason, we strongly urge students to learn the basics first; we can then explore how to integrate generative AI effectively into Data Science.

As a technical aside, this book has been written using bookdown, which is based on Markdown, knitr and Pandoc (Xie 2015). This enables it to be rendered into a wide range of formats, including HTML and PDF (using LaTeX). It also supports embedding code, in this case R. The R bookdown package can be installed from CRAN or GitHub.
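For readers who want to try this, the following is a minimal sketch of the two installation routes. It assumes a working R installation with internet access. The GitHub route additionally assumes that the remotes helper package is used and that the development version lives in the rstudio/bookdown repository; neither detail is stated in this book, so those lines are left commented out.

```r
# Install the released version of bookdown from CRAN,
# but only if it is not already installed.
if (!requireNamespace("bookdown", quietly = TRUE)) {
  install.packages("bookdown")
}

# Alternative (assumption): install the development version from GitHub.
# This relies on the 'remotes' package and the rstudio/bookdown repository.
# install.packages("remotes")
# remotes::install_github("rstudio/bookdown")

# Check which version is now available.
packageVersion("bookdown")
```

Most students will only need the CRAN version. The GitHub version is mainly of interest if you want the latest, not yet released, features.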

References

Kabacoff, Robert. 2015. R in Action: Data Analysis and Graphics with R. 2nd ed. Manning.
Xie, Yihui. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman and Hall/CRC. http://yihui.org/knitr/.