References

Abeysooriya, Mandhri, Megan Soria, Mary Sravya Kasu, and Mark Ziemann. 2021. “Gene Name Errors: Lessons Not Learned.” PLOS Computational Biology 17 (7): 1–13. https://doi.org/10.1371/journal.pcbi.1008984.
Abeysundera, Melanie. 2015. “Using Total Survey Error to Study Mode Effect and Other Applications.” In Proceedings of the Statistical Society of Canada Annual Meeting, Survey Methods Section.
Anscombe, F. J. 1973. “Graphs in Statistical Analysis.” The American Statistician 27 (1): 17–21. https://doi.org/10.1080/00031305.1973.10478966.
Ashton, Doug. 2018. “Where’s My T-Shirt? Supply Chain Forecasting in Fashion.” Mango Solutions. https://www.slideshare.net/DouglasAshton1/wheres-my-tshirt-supply-chain-forecasting-in-fashion.
Au, Randy. 2020a. “Data Cleaning IS Analysis, Not Grunt Work.” https://counting.substack.com/p/data-cleaning-is-analysis-not-grunt.
———. 2020b. “Let’s Get Intentional about Documentation.” https://counting.substack.com/p/lets-get-intentional-about-documentation.
Australian Bureau of Statistics. 2006. “Australian and New Zealand Standard Industrial Classification (ANZSIC).” 2006, Revision 2.0. Australian Bureau of Statistics. https://www.abs.gov.au/statistics/classifications/australian-and-new-zealand-standard-industrial-classification-anzsic/2006-revision-2-0.
Boysel, Sam, and Davis Vaughan. 2023. fredr: An R Client for the ’FRED’ API. https://github.com/sboysel/fredr.
British Columbia Ferry Services Inc. 2021. “Annual Report to the British Columbia Ferries Commissioner.” British Columbia Ferry Services Inc. https://www.bcferries.com/web_image/h29/h7a/8854124527646.pdf.
Broman, Karl, and Kara Woo. 2017. “Data Organization in Spreadsheets.” The American Statistician 72 (1): 2–10. https://doi.org/10.1080/00031305.2017.1375989.
Bryan, Jenny. 2016a. “Sanesheets.” https://github.com/jennybc/sanesheets.
———. 2016b. “Spreadsheets.” https://github.com/jennybc/2016-06_spreadsheets.
———. 2017. gapminder: Data from Gapminder. https://CRAN.R-project.org/package=gapminder.
Bryan, Jenny, and The STAT 545 TAs. 2019. “STAT 545.” https://stat545.com/.
Colson, Eric, Brian Coffey, Tarek Rached, and Liz Cruz. 2017. “Algorithms Tour: How Data Science Is Woven into the Fabric of Stitch Fix.” Stitch Fix. https://algorithms-tour.stitchfix.com/.
Conway, Drew. 2010. “The Data Science Venn Diagram.” drewconway.com. http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram.
Cowgill, Matt, Zoe Meers, Jaron Lee, and David Diviny. 2023. readabs: Download and Tidy Time Series Data from the Australian Bureau of Statistics. https://github.com/mattcowgill/readabs.
Cunningham, Scott. 2021. Causal Inference: The Mixtape. Yale University Press. https://mixtape.scunning.com/.
Davenport, Thomas H., and Jeanne G. Harris. 2007. Competing on Analytics: The New Science of Winning. Harvard Business School Press.
Dowle, Matt, Arun Srinivasan, Jan Gorecki, Michael Chirico, Pasha Stetsenko, Tom Short, Steve Lianoglou, et al. 2023. data.table: Extension of ’Data.frame’. https://CRAN.R-project.org/package=data.table.
Elders, Benjamin, and Damiano Oldoni. 2020. tidylog: Logging for ’dplyr’ and ’tidyr’ Functions. https://CRAN.R-project.org/package=tidylog.
Elections BC. 2018. “Annual Report 2017/18 and Service Plan 2018/19 - 2020/21.” Elections BC. https://elections.bc.ca/docs/rpt/AR1718SP1821.pdf.
Elff, Martin, Christopher N. Lawrence, Dave Atkins, Jason W. Morgan, Kirill Müller, Pieter Schoonees, and Achim Zeileis. 2021. memisc: Management of Survey Data and Presentation of Analysis Results. https://CRAN.R-project.org/package=memisc.
Ellis, Sharon E., and Jeffrey T. Leek. 2017. “How to Share Data for Collaboration.” The American Statistician 72 (1): 53–57. https://doi.org/10.1080/00031305.2017.1375987.
EUROSTAT PRODCOM team. 2022. “European Business Statistics User’s Manual for PRODCOM.” 2023 ed. Eurostat. https://doi.org/10.2785/39767.
Executive Office of the President, Office of Management and Budget. 2022. “North American Industry Classification System: United States, 2022.” US Census Bureau. https://www.census.gov/naics/reference_files_tools/2022_NAICS_Manual.pdf.
Firke, Sam. 2021. janitor: Simple Tools for Examining and Cleaning Dirty Data. https://github.com/sfirke/janitor.
Friendly, Michael, Chris Dalzell, Martin Monkman, Dennis Murphy, Vanessa Foot, and Justeena Zaki-Azat. 2020. Lahman: Sean ’Lahman’ Baseball Database. https://CRAN.R-project.org/package=Lahman.
Garmonsway, Duncan. 2023. unpivotr: Unpivot Complex and Irregular Data Layouts. https://CRAN.R-project.org/package=unpivotr.
Garmonsway, Duncan, Hadley Wickham, Jenny Bryan, RStudio, and Marcin Kalicinski. 2022. tidyxl: Read Untidy Excel Files. https://CRAN.R-project.org/package=tidyxl.
Gelfand, Sharla, and City of Toronto. 2022. opendatatoronto: Access the City of Toronto Open Data Portal. https://CRAN.R-project.org/package=opendatatoronto.
Gelman, Andrew, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. 2014. Bayesian Data Analysis. 3rd ed. CRC Press.
Gelman, Andrew, and Jennifer Hill. 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models. CRC Press.
Hester, Jim, Hadley Wickham, libuv project contributors, Inc. Joyent, and other Node contributors. 2020. fs: Cross-Platform File System Operations Based on ’libuv’. https://CRAN.R-project.org/package=fs.
Holbrook, Allyson L., Melanie C. Green, and Jon A. Krosnick. 2003. “Telephone Versus Face-to-Face Interviewing of National Probability Samples with Long Questionnaires: Comparisons of Respondent Satisficing and Social Desirability Response Bias.” Public Opinion Quarterly 67 (1): 79–125. https://doi.org/10.1086/346010.
Horst, Allison. 2020. palmerpenguins: Palmer Archipelago (Antarctica) Penguin Data. https://allisonhorst.github.io/palmerpenguins/.
Horst, Allison M., Alison Presmanes Hill, and Kristen B. Gorman. 2022. “Palmer Archipelago Penguins Data in the palmerpenguins R Package - An Alternative to Anderson’s Irises.” The R Journal 14 (1): 244–54. https://doi.org/10.32614/RJ-2022-020.
Hudon, Caitlin. 2018. “Field Notes: Building Data Dictionaries.” https://caitlinhudon.com/2018/10/30/data-dictionaries/.
Hyndman, Rob J, and George Athanasopoulos. 2021. Forecasting: Principles and Practice. 3rd ed. OTexts. https://otexts.com/fpp3/.
Hyndman, Rob J, and Yeasmin Khandakar. 2008. “Automatic Time Series Forecasting: The Forecast Package for R.” Journal of Statistical Software 27 (3): 1–22. https://doi.org/10.18637/jss.v027.i03.
Hyndman, Rob, George Athanasopoulos, Christoph Bergmeir, Gabriel Caceres, Leanne Chhay, Mitchell O’Hara-Wild, Fotios Petropoulos, Slava Razbash, Earo Wang, and Farah Yasmeen. 2023. forecast: Forecasting Functions for Time Series and Linear Models. https://pkg.robjhyndman.com/forecast/.
Iannone, Richard, Joe Cheng, Barret Schloerke, Ellis Hughes, Alexandra Lauer, and JooYoung Seo. 2023. gt: Easily Create Presentation-Ready Display Tables. https://CRAN.R-project.org/package=gt.
Kaplan, Jacob, and Benjamin Schlegel. 2020. fastDummies: Fast Creation of Dummy (Binary) Columns and Rows from Categorical Variables. https://CRAN.R-project.org/package=fastDummies.
Khreis, H, J Johnson, K Jack, B Dadashova, and ES Park. 2022. “Evaluating the Performance of Low-Cost Air Quality Monitors in Dallas, Texas.” Int J Environ Res Public Health 19 (3): 1647. https://doi.org/10.3390/ijerph19031647.
Knuth, Donald Ervin. 1992. Literate Programming. CSLI Lecture Notes, No. 27. Stanford, Calif: Center for the Study of Language and Information.
Kuhn, Max. 2020. modeldata: Data Sets Useful for Modeling Packages. https://CRAN.R-project.org/package=modeldata.
Kuhn, Max, and Julia Silge. 2022. Tidy Modeling with R. O’Reilly Media. https://www.tmwr.org/.
Kuhn, Max, and Hadley Wickham. 2022. recipes: Preprocessing and Feature Engineering Steps for Modeling. https://recipes.tidymodels.org/.
Larmarange, Joseph. 2022. labelled: Manipulating Labelled Data. http://larmarange.github.io/labelled/.
Lee, Benjamin D. 2018. “Ten Simple Rules for Documenting Scientific Software.” PLoS Computational Biology 14 (12). https://doi.org/10.1371/journal.pcbi.1006561.
Little, Roderick J. A. 2020. Statistical Analysis with Missing Data. 3rd ed. Wiley Series in Probability and Statistics. Hoboken, NJ: Wiley.
Loo, Mark PJ van der, and Edwin de Jonge. 2021. “Data Validation Infrastructure for R.” Journal of Statistical Software 97: 1–33. https://doi.org/10.18637/jss.v097.i10.
Loo, Mark van der, and Edwin de Jonge. 2018. Statistical Data Cleaning with Applications in R. Wiley. https://onlinelibrary.wiley.com/doi/book/10.1002/9781118897126.
Loo, Mark van der, Edwin de Jonge, and Paul Hsieh. 2022. validate: Data Validation Infrastructure. https://CRAN.R-project.org/package=validate.
Marwick, Ben, Carl Boettiger, and Lincoln Mullen. 2018. “Packaging Data Analytical Work Reproducibly Using R (and Friends).” The American Statistician 72 (1): 80–88. https://doi.org/10.1080/00031305.2017.1375986.
Monkman, Martin. 2019. “Same Name, Different Bird.” https://martinmonkman.com/post/2019-06-02_same-name/.
———. 2023. dpjr: Companion data for the book The Data Preparation Journey: Finding Your Way With R. https://monkmanmh.github.io/dpjr.
Moritz, Steffen, and Thomas Bartz-Beielstein. 2017. “imputeTS: Time Series Missing Value Imputation in R.” The R Journal 9 (1): 207–18. https://doi.org/10.32614/RJ-2017-009.
Müller, Kirill. 2022. DBI: R Database Interface. https://CRAN.R-project.org/package=DBI.
Müller, Kirill, and Hadley Wickham. 2021. tibble: Simple Data Frames. https://CRAN.R-project.org/package=tibble.
Müller, Kirill, Hadley Wickham, David A. James, Seth Falcon, D. Richard Hipp, Dan Kennedy, Joe Mistachkin, et al. 2023. RSQLite: SQLite Interface for R. https://CRAN.R-project.org/package=RSQLite.
Murrell, Paul. 2013. “Data Intended for Human Consumption, Not Machine Consumption.” In Bad Data Handbook, edited by Q. Ethan McCallum, 31–51. O’Reilly.
Nield, Thomas. 2016. Getting Started with SQL: A Hands-on Approach for Beginners. O’Reilly Media.
Ooms, Jeroen. 2022. pdftools: Text Extraction, Rendering and Converting of PDF Documents. https://CRAN.R-project.org/package=pdftools.
Peng, Roger D. 2011. “Reproducible Research in Computational Science.” Science, New series, 334: 1226–27. https://doi.org/10.1126/science.1213847.
Perez, Caroline Criado. 2019. Invisible Women: Data Bias in a World Designed for Men. Abrams Press.
Polonsky, Jonathan A., Amrish Baidjoe, Zhian N. Kamvar, Anne Cori, Kara Durski, John W. Edmunds, Rosalind M. Eggo, et al. 2019. “Outbreak Analytics: A Developing Data Science for Informing the Response to Emerging Pathogens.” Philosophical Transactions of the Royal Society B 374: 20180276. https://doi.org/10.1098/rstb.2018.0276.
R Core Team. 2021. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Research Data Management Service Group. 2021. “Guide to Writing "Readme" Style Metadata.” Cornell University. https://data.research.cornell.edu/content/readme.
Smith, David. 2017. “Reproducible Data Science with R.” https://blog.revolutionanalytics.com/2017/04/reproducible-data-science-with-r.html.
Statistics Canada. 2007. “Age of Person.” Statistics Canada. https://www23.statcan.gc.ca/imdb/p3Var.pl?Function=DEC&Id=25363.
———. 2019. “Statistics Canada Quality Guidelines.” 12-539-X. Sixth. Statistics Canada. https://www150.statcan.gc.ca/n1/pub/12-539-x/12-539-x2019001-eng.htm.
———. 2020. “Definitions, Data Sources and Methods.” Statistics Canada. https://www.statcan.gc.ca/eng/concepts/index.
———. 2021. “National Travel Survey Microdata File.” Statistics Canada. https://doi.org/10.25318/24250001.
———. 2022. “North American Industry Classification System (NAICS) Canada 2022.” 2022, Version 1.0. Statistics Canada. https://www.statcan.gc.ca/en/subjects/standard/naics/2022/v1/index.
St-Pierre, M., and Y. Béland. 2004. “Mode Effects in the Canadian Community Health Survey: A Comparison of CAPI and CATI.” In Proceedings of the American Statistical Association Meeting, Survey Research Methods.
Suits, Daniel B. 1957. “Use of Dummy Variables in Regression Equations.” Journal of the American Statistical Association 52 (280): 548–51. https://doi.org/10.1080/01621459.1957.10501412.
Teucher, Andy, Sam Albers, Stephanie Hazlitt, and Province of British Columbia. 2023. bcdata: Search and Retrieve Data from the BC Data Catalogue. https://CRAN.R-project.org/package=bcdata.
Tukey, John W. 1977. Exploratory Data Analysis. Addison-Wesley.
United Nations Statistics Division. 2020. “Statistical Classifications.” United Nations. https://unstats.un.org/unsd/classifications/.
Vasilopoulos, Kostas. 2022. osnr: Client for the ’ONS’ API. https://CRAN.R-project.org/package=osnr.
Venables, Bill. 2010. “Introduction to Data Technologies. By Paul Murrell (review).” Australian and New Zealand Journal of Statistics 52: 469–70. https://doi.org/10.1111/j.1467-842X.2010.00592.x.
von Bergmann, Jens, and Dmitry Shkolnik. 2021. cansim: Accessing Statistics Canada Data Table and Vectors. https://CRAN.R-project.org/package=cansim.
Walker, Kyle. 2023. Analyzing US Census Data: Methods, Maps, and Models in R. CRC Press. https://walker-data.com/census-r/.
Walker, Kyle, Matt Herman, and Kris Eberwein. 2023. tidycensus: Load US Census Boundary and Attribute Data as ’tidyverse’ and ’sf’-Ready Data Frames. https://CRAN.R-project.org/package=tidycensus.
Wang, Richard Y., M. P. Reddy, and Henry B. Kon. 1995. “Toward Quality Data: An Attribute-Based Approach.” Decision Support Systems 13 (3): 349–72.
Watson, Jasper. 2020. RBNZ: Download Data from the Reserve Bank of New Zealand Website. https://CRAN.R-project.org/package=RBNZ.
Zeeberg, Barry R., Joseph Riss, David W. Kane, Kimberly J. Bussey, Edward Uchio, W. Marston Linehan, J. Carl Barrett, and John N. Weinstein. 2004. “Mistaken Identifiers: Gene Name Errors Can Be Introduced Inadvertently When Using Excel in Bioinformatics.” BMC Bioinformatics 5 (80). https://doi.org/10.1186/1471-2105-5-80.
White, Ethan, Elita Baldridge, Zachary Brym, and Kenneth Locey. 2013. “Nine Simple Ways to Make It Easier to (Re)use Your Data.” Ideas in Ecology and Evolution 6 (2): 1–10. https://doi.org/10.4033/iee.2013.6b.6.f.
Wickham, Hadley. 2014. “Tidy Data.” Journal of Statistical Software 59 (1): 1–23. https://doi.org/10.18637/jss.v059.i10.
———. 2015. Advanced R. CRC Press. http://adv-r.had.co.nz/.
———. 2019a. Advanced R. 2nd ed. CRC Press. https://adv-r.hadley.nz/index.html.
———. 2019b. stringr: Simple, Consistent Wrappers for Common String Operations. https://CRAN.R-project.org/package=stringr.
———. 2021a. forcats: Tools for Working with Categorical Variables (Factors). https://CRAN.R-project.org/package=forcats.
———. 2021b. tidyr: Tidy Messy Data. https://CRAN.R-project.org/package=tidyr.
———. 2023. httr2: Perform HTTP Requests and Process the Responses. https://CRAN.R-project.org/package=httr2.
Wickham, Hadley, and Jennifer Bryan. 2019. readxl: Read Excel Files. https://CRAN.R-project.org/package=readxl.
Wickham, Hadley, Mine Çetinkaya-Rundel, and Garrett Grolemund. 2023. R for Data Science. 2nd ed. O’Reilly Media. https://r4ds.hadley.nz/.
Wickham, Hadley, Maximilian Girlich, Edgar Ruiz, and RStudio. 2023. dbplyr: A ’dplyr’ Back End for Databases. https://CRAN.R-project.org/package=dbplyr.
Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science. O’Reilly Media. https://r4ds.had.co.nz/.
Wickham, Hadley, and Jim Hester. 2020. readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.
Wickham, Hadley, and Evan Miller. 2021. haven: Import and Export SPSS, Stata and SAS Files. https://CRAN.R-project.org/package=haven.
Wickham, Hadley, and RStudio. 2021. nycflights13: Flights That Departed NYC in 2013. https://CRAN.R-project.org/package=nycflights13.
Wikipedia contributors. 2021. “Prawo Jazdy (Alleged Criminal) — Wikipedia, the Free Encyclopedia.” https://en.wikipedia.org/wiki/Prawo_Jazdy_(alleged_criminal).
———. 2022a. “List of Unicode Characters — Wikipedia, the Free Encyclopedia.” https://en.wikipedia.org/wiki/List_of_Unicode_characters.
———. 2022b. “PDF — Wikipedia, the Free Encyclopedia.” https://en.wikipedia.org/wiki/PDF.
Wilson, Greg, D. A. Aruliah, C. Titus Brown, Neil P. Chue Hong, Matt Davis, Richard T. Guy, Steven H. D. Haddock, et al. 2014. “Best Practices for Scientific Computing.” PLoS Biology 12 (1). https://doi.org/10.1371/journal.pbio.1001745.
Wilson, Greg, Jennifer Bryan, Karen Cranston, Justin Kitzes, Lex Nederbragt, and Tracy K. Teal. 2017. “Good Enough Practices in Scientific Computing.” PLoS Computational Biology 13 (6). https://doi.org/10.1371/journal.pcbi.1005510.
Xie, Yihui. 2021. bookdown: Authoring Books and Technical Documents with R Markdown. https://CRAN.R-project.org/package=bookdown.
Xie, Yihui, J. J. Allaire, and Garrett Grolemund. 2019. R Markdown: The Definitive Guide. CRC Press. https://bookdown.org/yihui/rmarkdown/.
Zumel, Nina, and John Mount. 2019. Practical Data Science with R. 2nd ed. Manning. https://www.manning.com/books/practical-data-science-with-r-second-edition.