1.4 Using software in research
Many people use spreadsheets (such as Microsoft Excel) for analysis of data in research.
Using spreadsheets requires extreme care: many expensive and dangerous errors have been traced to spreadsheet use (AlTarawneh and Thorne 2017), including problems with reporting data during the COVID-19 pandemic in 2020.
Problems may emerge for many different reasons:
- Spreadsheets can automatically change the entered data, even when this is not appropriate (for example, reformatting an entry as a date because the entry resembles one). This has had dire consequences (Ziemann et al. 2016).
- Spreadsheets may include formulas with errors (Panko and Sprague Jr 1998) that are very difficult to locate, and hence to fix (Galletta et al. 1996; Panko 2016; London and Slagter 2021).
- Spreadsheets do not leave a clear record of how the data have been prepared or analysed; for example, formulas can be very difficult to read and understand. Keeping a record of the analysis, the preparation of variables, and other operations on the data is part of what is called reproducible research (Simons and Holmes 2019). Among other advantages, reproducibility ensures that the results can be checked by the researchers and by others.
- Excel has bugs (Keeling and Pavur 2004; Mélard 2014), even in very basic operations (Berger 2007; Hargreaves and McWilliams 2010). Attempts to fix these bugs have sometimes made them worse (McCullough and Wilson 2002).
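Two of these problems (automatic changes to the entered data, and the lack of a record of the analysis) can be illustrated with a short script. The following is a minimal sketch in Python, not a recommended workflow: the data, the column names `id` and `measurement`, and the functions `load_rows` and `summarise` are all hypothetical and invented for illustration. The identifiers resemble dates, but the script reads them as plain text, so they remain exactly as entered, and every step of the analysis is written down where it can be checked.

```python
import csv
import io
import statistics

# Hypothetical raw data: the "id" entries resemble dates but are plain labels.
RAW_CSV = """id,measurement
MARCH1,4.2
SEPT2,5.1
DEC1,3.9
"""


def load_rows(text):
    """Read CSV text, keeping every identifier as text, exactly as entered."""
    reader = csv.DictReader(io.StringIO(text))
    # Only the measurement column is converted to a number, and explicitly so.
    return [(row["id"], float(row["measurement"])) for row in reader]


def summarise(rows):
    """Return the identifiers and the mean of the measurements."""
    ids = [identifier for identifier, _ in rows]
    mean_value = statistics.mean(value for _, value in rows)
    return ids, mean_value


rows = load_rows(RAW_CSV)
ids, mean_value = summarise(rows)
print(ids)                   # identifiers are unchanged
print(round(mean_value, 2))  # mean of the three measurements
```

Because the conversion of each column is stated explicitly in the code, nothing is reformatted silently, and anyone can rerun the script on the same data to confirm the result.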
Spreadsheets can be used for research and analysis… but you must be very careful!
Many of the problems with using spreadsheets are due to human error, but spreadsheets make those errors hard to find. Some errors emerge because Excel is being used for purposes for which it was not really designed (such as scientific analysis).