1.4 Using software in research
Many people use spreadsheets (such as Microsoft Excel) for analysis of data in research.
Using spreadsheets requires great care: many costly and dangerous errors have been traced to spreadsheet use (AlTarawneh and Thorne 2017), including problems when reporting COVID-19 data during the 2020 pandemic.
Problems may emerge for many different reasons:
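- Spreadsheets can automatically change entered data even when this is not appropriate (for example, reformatting an entry as a date because the spreadsheet assumes it should be one). This has had serious consequences: gene names such as SEPT2 and MARCH1 have been converted to dates in the supplementary files of many published papers (Ziemann et al. 2016).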
- Spreadsheets may include formulas with errors (Panko and Sprague Jr 1998), which are very difficult to locate and hence fix (Galletta et al. 1996; Panko 2016; London and Slagter 2021).
- Spreadsheets do not leave a record of how the data have been analysed or prepared, and their formulas can be very difficult to read and interpret. Keeping a record of the analysis, the preparation of variables, and other operations on the data is part of what is called reproducible research (Simons and Holmes 2019). Among other advantages, reproducibility allows the results to be checked by the researchers themselves and by others.
- Excel has bugs (Keeling and Pavur 2004; Mélard 2014), even in very basic operations (Berger 2007; Hargreaves and McWilliams 2010). Attempts to fix these bugs have sometimes made them worse (McCullough and Wilson 2002).
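To make the contrast with a spreadsheet concrete, here is a minimal Python sketch of an analysis kept as a script. Python is not used elsewhere in this course, so treat this only as an illustration. The data, the column names (`gene`, `expression`) and the function name `prepare` are hypothetical. Every preparation step is written down, so the analysis can be rerun and checked. Because Python's `csv` module reads every field as text, an identifier such as `SEPT2` is not silently turned into a date. The last lines illustrate the operator-precedence issue discussed by Berger (2007). In Excel, the formula `=-2^2` gives 4, because negation is applied before exponentiation. In Python, as in conventional mathematics, `-2**2` is -4.

```python
import csv
import io
from statistics import mean

# Hypothetical raw data; in practice this would be read from a file.
RAW = """gene,expression
SEPT2,5.1
MARCH1,3.4
TP53,7.0
"""

def prepare(text):
    """Return a list of (gene, expression) pairs.

    csv.DictReader returns every field as a string, so gene names stay
    exactly as entered; the numeric column is converted explicitly.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append((record["gene"], float(record["expression"])))
    return rows

data = prepare(RAW)
genes = [gene for gene, _ in data]
print(genes)                               # ['SEPT2', 'MARCH1', 'TP53']
print(round(mean(x for _, x in data), 2))  # 5.17

# In Python, exponentiation binds more tightly than unary minus,
# so -2**2 means -(2**2). Excel's =-2^2 instead evaluates (-2)^2.
print(-2**2)    # -4
print((-2)**2)  # 4
```

The script is itself the record of the analysis: anyone can rerun it and see exactly how each variable was prepared.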
Spreadsheets can be used for research and analysis… but you must be very careful!
Many of the problems with using spreadsheets are due to human error, but spreadsheets make these errors hard to find. Some errors emerge because Excel is being used for purposes it was not really designed for, such as scientific analysis.
In this course, we will sometimes show output from the statistical software packages jamovi (The jamovi Project) and SPSS (IBM Corp 2016).
References
AlTarawneh G, Thorne S. A pilot study exploring spreadsheet risk in scientific research. arXiv preprint arXiv:1703.09785. 2017.
Berger RL. Nonstandard operator precedence in Excel. Computational Statistics & Data Analysis. Elsevier; 2007;51(6):2788–91.
Galletta DF, Hartzel KS, Johnson SE, Joseph JL, Rustagi S. Spreadsheet presentation and error detection: An experimental study. Journal of Management Information Systems. Taylor & Francis; 1996;13(3):45–63.
Hargreaves BR, McWilliams TP. Polynomial trendline function flaws in Microsoft Excel. Computational Statistics & Data Analysis. Elsevier; 2010;54(4):1190–6.
IBM Corp. IBM SPSS statistics for Windows, version 24.0. Armonk, NY: IBM Corp; 2016.
Keeling KB, Pavur RJ. Numerical accuracy issues in using Excel for simulation studies. Proceedings of the 2004 Winter Simulation Conference. IEEE; 2004. p. 1513–8.
London RE, Slagter HA. Statement of retraction: Effects of transcranial direct current stimulation over left dorsolateral pFC on the attentional blink depend on individual baseline performance. Journal of Cognitive Neuroscience. 2021;1.
McCullough BD, Wilson B. On the accuracy of statistical procedures in Microsoft Excel 2000 and Excel XP. Computational Statistics & Data Analysis. Elsevier; 2002;40(4):713–21.
Mélard G. On the accuracy of statistical procedures in Microsoft Excel 2010. Computational Statistics. Springer; 2014;29(5):1095–128.
Panko R. What we don’t know about spreadsheet errors today: The facts, why we don’t believe them, and what we need to do. arXiv preprint arXiv:1602.02601. 2016.
Panko RR, Sprague Jr RH. Hitting the wall: Errors in developing and code inspecting a ‘simple’ spreadsheet model. Decision Support Systems. Elsevier; 1998;22(4):337–53.
Simons JE, Holmes DT. Reproducible research and reports with R. Journal of Applied Laboratory Medicine. Oxford University Press; 2019;4(3):471–3.
The jamovi Project. jamovi (version 1.0) [computer software] [Internet]. Available from: https://www.jamovi.org.
Ziemann M, Eren Y, El-Osta A. Gene name errors are widespread in the scientific literature. Genome Biology. BioMed Central; 2016;17(1):1–3.