References
Anonymous. 2013. “What Are Outliers in the Data?” https://www.itl.nist.gov/div898/handbook/prc/section1/prc16.htm.
Berinato, Scott. 2016. “Visualizations That Really Work.” Harvard Business Review 94 (6): 93–100.
Blei, David M, and Padhraic Smyth. 2017. “Science and Data Science.” Proceedings of the National Academy of Sciences 114 (33): 8689–92.
Conlen, Matthew. 2020. “Kernel Density Estimation.” https://mathisonian.github.io/kde/.
Connolly, Thomas M, and Carolyn E Begg. 2015. Database Systems: A Practical Approach to Design, Implementation, and Management. 6th ed. Pearson Education.
CrowdFlower. 2016. “2016 Data Science Report.” https://visit.figure-eight.com/rs/416-ZBE-142/images/CrowdFlower_DataScienceReport_2016.pdf.
Cukier, Kenneth, and Viktor Mayer-Schoenberger. 2013. “The Rise of Big Data: How It’s Changing the Way We Think about the World.” Foreign Affairs 92: 28–40.
De Jonge, Edwin, and Mark Van Der Loo. 2013. An Introduction to Data Cleaning with R. Heerlen: Statistics Netherlands. https://cran.r-project.org/doc/contrib/de_Jonge+van_der_Loo-Introduction_to_data_cleaning_with_R.pdf.
De Medeiros, Mauricius Munhoz, Norberto Hoppen, and Antonio Carlos Gastaud Maçada. 2020. “Data Science for Business: Benefits, Challenges and Opportunities.” The Bottom Line 33.
Dougherty, Jack, and Ilya Ilyankou. 2020. Hands-on Data Visualization: Interactive Storytelling from Spreadsheets to Code. https://handsondataviz.org/.
Few, Stephen. 2007. “Save the Pies for Dessert.” Visual Business Intelligence Newsletter, 1–14.
Figalist, Iris, Christoph Elsner, Jan Bosch, and Helena Holmström Olsson. 2021. “Breaking the Vicious Circle: A Case Study on Why AI for Software Analytics and Business Intelligence Does Not Take Off in Practice.” Journal of Systems and Software, 111135.
Gelman, Andrew. 2020. “Please Socially Distance Me from This Regression Model!” Statistical Modeling, Causal Inference, and Social Science (blog). https://statmodeling.stat.columbia.edu/2020/07/18/please-socially-distance-me-from-this-regression-model/.
Gould, Alex. 2020. “Three Strategies for Working with Big Data in R.” https://rviews.rstudio.com/2019/07/17/3-big-data-strategies-for-r/.
Grolemund, Garrett, and Hadley Wickham. 2018. “R for Data Science.”
Hall, Tracy, Sarah Beecham, David Bowes, David Gray, and Steve Counsell. 2011. “A Systematic Literature Review on Fault Prediction Performance in Software Engineering.” IEEE Transactions on Software Engineering 38 (6): 1276–1304.
Hodge, Victoria, and Jim Austin. 2004. “A Survey of Outlier Detection Methodologies.” Artificial Intelligence Review 22 (2): 85–126.
Ihaka, Ross, and Robert Gentleman. 1996. “R: A Language for Data Analysis and Graphics.” Journal of Computational and Graphical Statistics 5 (3): 299–314.
Islam, Nazrul, Stephen J Sharp, Gerardo Chowell, Sharmin Shabnam, Ichiro Kawachi, Ben Lacey, Joseph M Massaro, Ralph B D’Agostino, and Martin White. 2020. “Physical Distancing Interventions and Incidence of Coronavirus Disease 2019: Natural Experiment in 149 Countries.” BMJ 370.
Ismay, Chester, and Albert Y Kim. 2020. Statistical Inference via Data Science: A Modern Dive into R and the Tidyverse. CRC Press.
Kabacoff, Robert. 2015. R in Action: Data Analysis and Graphics with R. 2nd ed. Manning.
———. 2019. Data Visualization with R. https://rkabacoff.github.io/datavis/.
Kaisler, Stephen, Frank Armour, J Alberto Espinosa, and William Money. 2013. “Big Data: Issues and Challenges Moving Forward.” In 2013 46th Hawaii International Conference on System Sciences, 995–1004. IEEE.
Knuth, Donald Ervin. 1984. “Literate Programming.” The Computer Journal 27 (2): 97–111.
Landerman, Lawrence R, Kenneth C Land, and Carl F Pieper. 1997. “An Empirical Evaluation of the Predictive Mean Matching Method for Imputing Missing Values.” Sociological Methods & Research 26 (1): 3–33.
Lee, Bongshin, Nathalie Henry Riche, Petra Isenberg, and Sheelagh Carpendale. 2015. “More Than Telling a Story: Transforming Data into Visually Shared Stories.” IEEE Computer Graphics and Applications 35 (5): 84–90.
Lee, I. 2017. “Big Data: Dimensions, Evolution, Impacts, and Challenges.” Business Horizons 60 (3): 293–303.
Little, R. 1988. “Missing-Data Adjustments in Large Surveys.” Journal of Business & Economic Statistics 6 (3): 287–96.
Little, R., and D. Rubin. 2002. Statistical Analysis with Missing Data. 2nd ed. New York: John Wiley & Sons.
Marwick, Ben, Carl Boettiger, and Lincoln Mullen. 2018. “Packaging Data Analytical Work Reproducibly Using R (and Friends).” The American Statistician 72 (1): 80–88.
McAfee, Andrew, Erik Brynjolfsson, Thomas H Davenport, DJ Patil, and Dominic Barton. 2012. “Big Data: The Management Revolution.” Harvard Business Review 90 (10): 60–68.
McKinsey Global Institute. 2016. “The Age of Analytics: Competing in a Data-Driven World.” San Francisco: McKinsey & Company.
Mount, John. 2020. “Data Science Is a Science (Just Not the One You May Think).” https://win-vector.com/2020/09/10/data-science-is-a-science-just-not-the-one-you-may-think/.
Neitmann, Thomas. 2021. “How to Add a Regression Line to a Ggplot?” https://thomasadventure.blog/posts/ggplot-regression-line/.
Peng, Roger D, and Elizabeth Matsui. 2015. The Art of Data Science: A Guide for Anyone Who Works with Data. Skybrude Consulting, LLC.
Pigott, T. 2001. “A Review of Methods for Missing Data.” Educational Research and Evaluation 7 (4): 353–83.
Pimentel, João Felipe, Leonardo Murta, Vanessa Braganholo, and Juliana Freire. 2019. “A Large-Scale Study about Quality and Reproducibility of Jupyter Notebooks.” In 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), 507–17.
Ravi, Kumar, and Vadlamani Ravi. 2015. “A Survey on Opinion Mining and Sentiment Analysis: Tasks, Approaches and Applications.” Knowledge-Based Systems 89: 14–46.
RStudio Team. 2020. “RStudio: Integrated Development Environment for R.” Boston, MA: RStudio, Inc. http://www.rstudio.com/.
Rubin, Donald B. 1976. “Inference and Missing Data.” Biometrika 63 (3): 581–92.
Sagar, Ram. 2021. “Big Data to Good Data: Andrew Ng Urges ML Community to Be More Data-Centric and Less Model-Centric.” Analytics India Magazine. https://analyticsindiamag.com/big-data-to-good-data-andrew-ng-urges-ml-community-to-be-more-data-centric-and-less-model-centric/.
Scheffer, Judi. 2002. “Dealing with Missing Data.” In Research Letters in the Information and Mathematical Sciences, 153–60.
Schucany, William R. 2004. “Kernel Smoothers: An Overview of Curve Estimators for the First Graduate Course in Nonparametric Statistics.” Statistical Science, 663–75.
Chacon, Scott, and Ben Straub. 2014. Pro Git. 2nd ed. https://git-scm.com/book/en/v2.
Sellers, Mark. 2019. “Field Guide to the R Ecosystem.” https://fg2re.sellorm.com/.
Shepperd, Martin, Yuchen Guo, Ning Li, Mahir Arzoky, Andrea Capiluppi, Steve Counsell, Giuseppe Destefanis, Stephen Swift, Allan Tucker, and Leila Yousefi. 2019. “The Prevalence of Errors in Machine Learning Experiments.” In International Conference on Intelligent Data Engineering and Automated Learning, 102–9. Springer.
Shepperd, Martin, Qinbao Song, Zhongbin Sun, and Carolyn Mair. 2013. “Data Quality: Some Comments on the NASA Software Defect Datasets.” IEEE Transactions on Software Engineering 39 (9): 1208–15.
Sievert, Carson. 2020. Interactive Web-Based Data Visualization with R, Plotly, and Shiny. CRC Press.
Song, Q., and M. Shepperd. 2007. “Missing Data Imputation Techniques.” International Journal of Business Intelligence & Data Mining 2 (3): 261–91.
R Core Team. 2022. “The R Project for Statistical Computing.” Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Wickham, Hadley. 2012. “Style Guide.” http://adv-r.had.co.nz/Style.html.
———. 2021. Mastering Shiny. O’Reilly Media, Inc. https://mastering-shiny.org/index.html.
Wilcox, Rand R. 2011. Introduction to Robust Estimation and Hypothesis Testing. 3rd ed. Academic Press.
Wilkinson, Mark D, Michel Dumontier, Isbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al. 2016. “The FAIR Guiding Principles for Scientific Data Management and Stewardship.” Scientific Data 3 (1): 1–9.
Williamson, David F, Robert A Parker, and Juliette S Kendrick. 1989. “The Box Plot: A Simple Visual Method to Interpret Data.” Annals of Internal Medicine 110 (11): 916–21.
Xie, Yihui. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman and Hall/CRC. http://yihui.org/knitr/.
Zaidman, Marsha. 2004. “Teaching Defensive Programming in Java.” Journal of Computing Sciences in Colleges 19 (3): 33–43.
Zhang, Lei, Shuai Wang, and Bing Liu. 2018. “Deep Learning for Sentiment Analysis: A Survey.” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 8 (4): e1253.