1.4 Using software in research

Many people use spreadsheets (such as Microsoft Excel) for analysis of data in research.

Spreadsheets must be used with extreme care: many expensive and dangerous errors have been traced to the use of spreadsheets (AlTarawneh and Thorne 2017), including problems with reporting during the COVID-19 pandemic in 2020.

Problems may emerge for many different reasons.

Spreadsheets can be used for research and analysis... but you must be very careful!

Many of the problems with using spreadsheets are due to human error, but spreadsheets make those errors hard to find. Other errors emerge because Excel is being used for a purpose it was not really designed for: scientific analysis.

In this subject, we will usually show output from the statistical software package called R (R Core Team 2018), or from other popular statistical software packages such as jamovi (The jamovi Project, n.d.) and SPSS (IBM Corp 2016).

Statistical software packages such as R, jamovi, and SPSS can help us to avoid such problems:

  • They are designed for large data sets
  • They allow for reproducible research
  • They allow for a high level of precision in formatting and data visualisation
  • With a little bit of programming, these software packages can be extremely powerful: with one line of code we can apply a change to an entire data set or part of a data set in an instant
  • They have been designed specifically for the types of statistics and data analysis we will be learning about in this subject
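
To make the "one line of code" point concrete, here is a minimal sketch. It is written in Python purely for illustration (this subject shows output from R, jamovi and SPSS, where the same idea applies), and the data values and the missing-value code 999.0 are made up for the example.

```python
# Hypothetical data set: heights (in cm) of five participants.
# The value 999.0 is an assumed code meaning "height not recorded".
heights_cm = [162.0, 175.5, 999.0, 158.4, 169.9]

# One line changes PART of the data set:
# only the entries equal to the missing-value code are replaced.
cleaned_cm = [None if h == 999.0 else h for h in heights_cm]

# One line changes the ENTIRE data set:
# every recorded height is converted from centimetres to metres.
heights_m = [None if h is None else h / 100 for h in cleaned_cm]

print(cleaned_cm)
print(heights_m)
```

In a spreadsheet, the same changes would typically mean editing cells by hand or copying a formula down a column, and neither leaves a clear record of what was done. Here, each change is a single written instruction that can be read, checked and re-run on new data.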

References

AlTarawneh, Ghada, and Simon Thorne. 2017. “A Pilot Study Exploring Spreadsheet Risk in Scientific Research.” arXiv Preprint arXiv:1703.09785.
Berger, Roger L. 2007. “Nonstandard Operator Precedence in Excel.” Computational Statistics & Data Analysis 51 (6): 2788–91.
Galletta, Dennis F., Kathleen S. Hartzel, Susan E. Johnson, Jimmie L. Joseph, and Sandeep Rustagi. 1996. “Spreadsheet Presentation and Error Detection: An Experimental Study.” Journal of Management Information Systems 13 (3): 45–63.
Hargreaves, Bruce R., and Thomas P. McWilliams. 2010. “Polynomial Trendline Function Flaws in Microsoft Excel.” Computational Statistics & Data Analysis 54 (4): 1190–96.
IBM Corp. 2016. IBM SPSS Statistics for Windows, Version 24.0. Armonk, NY: IBM Corp.
Keeling, Kellie B., and Robert J. Pavur. 2004. “Numerical Accuracy Issues in Using Excel for Simulation Studies.” In Proceedings of the 2004 Winter Simulation Conference, 2004, 2:1513–18. IEEE.
London, R. E., and H. A. Slagter. 2021. “Statement of Retraction: Effects of Transcranial Direct Current Stimulation over Left Dorsolateral pFC on the Attentional Blink Depend on Individual Baseline Performance.” Journal of Cognitive Neuroscience, 1. https://doi.org/10.1162/jocn_x_01680.
McCullough, B. D., and Berry Wilson. 2002. “On the Accuracy of Statistical Procedures in Microsoft Excel 2000 and Excel XP.” Computational Statistics & Data Analysis 40 (4): 713–21.
Mélard, Guy. 2014. “On the Accuracy of Statistical Procedures in Microsoft Excel 2010.” Computational Statistics 29 (5): 1095–1128.
Panko, Ray. 2016. “What We Don’t Know about Spreadsheet Errors Today: The Facts, Why We Don’t Believe Them, and What We Need to Do.” arXiv Preprint arXiv:1602.02601.
Panko, Raymond R., and Ralph H. Sprague Jr. 1998. “Hitting the Wall: Errors in Developing and Code Inspecting a ‘Simple’ Spreadsheet Model.” Decision Support Systems 22 (4): 337–53.
R Core Team. 2018. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Simons, Janet E., and Daniel T. Holmes. 2019. “Reproducible Research and Reports with R.” Journal of Applied Laboratory Medicine 4 (3): 471–73.
The jamovi Project. n.d. jamovi (Version 1.0) [Computer Software]. https://www.jamovi.org.
Ziemann, Mark, Yotam Eren, and Assam El-Osta. 2016. “Gene Name Errors Are Widespread in the Scientific Literature.” Genome Biology 17 (1): 1–3.