Chapter 2 Life Advice

For this textbook generally, you’ll need a decent handle on the R programming language and to know your way around RStudio. I will add more tutorials and pointers for you (maybe sometime I’ll write my own…). The first part of the EDA chapter is a short intro and overview of key commands in dplyr and the tidyverse, so it will ideally help newer programmers. It is far from exhaustive, but hopefully gives you enough to get started if needed.

One thing about this textbook: you will often see me calling commands with the package in front (e.g., dplyr::count). The reason is that other packages can sometimes mask certain commands. What do I mean by this? Well, the order in which you load libraries (using library()) is meaningful: if I load dplyr and then another package that also has a command called count(), the later package's version masks dplyr's, and a plain count() will call the later one. So be aware, and always look at any conflicts as you load packages. R is great and tells you about these in the console. If code that previously worked breaks, try putting the specific package in front of the command that is not working and check whether that fixes it.
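To make the masking point concrete, here is a minimal sketch. It assumes dplyr is installed and uses the built-in mtcars data purely for illustration; the object names cyl_counts and smoothed are made up for this example. It uses filter() rather than count() because base R's stats package already has a filter(), so the conflict shows up without installing anything else.

```r
# Assumption: the dplyr package is installed. mtcars ships with base R.
# On loading, the console notes that dplyr masks stats::filter and stats::lag.
library(dplyr)

# dplyr was attached after stats, so a bare filter() now resolves to dplyr's version.
identical(filter, dplyr::filter)

# The package prefix removes the ambiguity, whatever gets loaded later on.
cyl_counts <- dplyr::count(mtcars, cyl)
print(cyl_counts)

# The masked function is not gone; it is still reachable through its own prefix.
smoothed <- stats::filter(1:5, rep(1 / 3, 3))

# List every name that exists in more than one attached package or environment.
conflicts(detail = TRUE)
```

If a script that used to work starts failing after you add a new library() call, running conflicts(detail = TRUE) is a quick way to see which names are now shared between packages.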

2.1 Some Potentially Cool Readings…[more to be added in time…]

Introduction to Data Mining/Background

  1. Turing, A. M. (2009). Computing machinery and intelligence. In Parsing the Turing Test (pp. 23-65). Springer, Dordrecht.

  2. Big Data in Practice: Using Big Data to Improve Healthcare Services [watch] (https://www.youtube.com/watch?v=7t75CNC34vU&ab_channel=TEDxTalksTEDxTalksVerified)

  3. Bias [watch] (https://www.youtube.com/watch?v=oXYIKcoyRbw&ab_channel=SXSWSXSW)

Exploratory Data Analysis

  1. Flake, J. K., & Fried, E. I. (2020). Measurement schmeasurement: Questionable measurement practices and how to avoid them. Advances in Methods and Practices in Psychological Science, 3(4), 456-465.

  2. Data Mining in Practice: Customer Segmentation [watch] (https://www.youtube.com/watch?v=pCLQkgcjMjY&ab_channel=AlanisBusinessAcademyAlanisBusinessAcademy)

  3. TED Talk: The Beauty of Data Visualization [watch] (https://www.youtube.com/watch?v=5Zg-C8AAIGg&ab_channel=TED-EdTED-Ed)

  4. Friendly, M. (2008). A brief history of data visualization. In Handbook of data visualization (pp. 15-56). Springer, Berlin, Heidelberg.

  5. Myatt, G. J. (2007). Making sense of data: a practical guide to exploratory data analysis and data mining. John Wiley & Sons.

  6. Rohrer, J. M. (2018). Thinking clearly about correlations and causation: Graphical causal models for observational data. Advances in Methods and Practices in Psychological Science, 1(1), 27-42.

  7. Epskamp, S., Cramer, A. O., Waldorp, L. J., Schmittmann, V. D., & Borsboom, D. (2012). qgraph: Network visualizations of relationships in psychometric data. Journal of Statistical Software, 48(4), 1-18.

  8. Epskamp, S. (2015). semPlot: Unified visualizations of structural equation models. Structural Equation Modeling: a multidisciplinary journal, 22(3), 474-483.

  9. Golino, H. F., & Epskamp, S. (2017). Exploratory graph analysis: A new approach for estimating the number of dimensions in psychological research. PLoS ONE, 12(6).

Unsupervised Machine Learning

  1. Han, J., Kamber, M., & Pei, J. (2012). Data mining: Concepts and techniques. Elsevier. Chapters 10.1-10.3 of the course textbook.

  2. Davidson, B. I., Jones, S. L., Joinson, A. N., & Hinds, J. (2019). The evolution of online ideological communities. PLoS ONE, 14(5), e0216932. https://doi.org/10.1371/journal.pone.0216932

  3. Han, J., Kamber, M., & Pei, J. (2012). Data mining: Concepts and techniques. Elsevier. Chapters 10.4-10.6 of the course textbook.

  4. Data Labelling [watch] (https://www.youtube.com/watch?v=zWu3M3JkB24&ab_channel=MITClubofNorthernCaliforniaMITClubofNorthernCalifornia)

Text Mining

  1. Dang, S., & Ahmad, P. H. (2014). Text mining: Techniques and its application. International Journal of Engineering & Technology Innovations, 1(4), 866-2348.
