Chapter 3 Modelling

This figure sets out the principle that as models increase in business risk and complexity, the level of appropriate quality assurance also increases. This is similar to the idea that we can achieve different levels of reproducible analysis, which is explored in the next chapter.

3.1 AQUA Book quality assurance principles for code

The quality assurance principles from the AQUA Book apply to all models, whether they are built in Excel or an open source language. The main difference is that new software practices allow analysts to enhance the quality assurance of models and make the process of peer review easier and more efficient. Using Git, it is also easy to push the code used to produce models to external platforms such as GitHub, making the model code publicly available. This enhances the transparency of government and allows analysts or informed citizens to contribute by finding errors or bugs, or by suggesting new approaches. It is also a powerful tool for spreading best practice, as analysts can look at effective models from across government rather than keeping them siloed in departments.

Git offers a platform for effective peer review, whether from others in the team or from analysts in other teams or departments. Making peer review easier to perform is an effective way of increasing overall QA. Git also tracks the changes made to code and who made them, and can record the decisions behind those changes. This ensures that models have a full audit trail built in, without analysts having to invest resources in documenting it separately.

Ensuring that a model is producing the correct output is a time-intensive process of analytical testing. This process can be automated using software testing frameworks. This enhances the robustness of models and increases confidence that the output is a result of the data, not the process.
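
As a rough illustration of what automated analytical testing can look like, the sketch below checks a toy model against values worked out by hand. The function project_costs and its parameters are hypothetical and exist only for this example; they do not come from any real government model. The checks are written as plain functions whose names begin with test_, which is the naming convention that a test runner such as pytest uses to discover tests.

```python
def project_costs(base_cost, annual_growth, years):
    """Hypothetical toy model: compound a base cost forward at a fixed growth rate.

    Returns a list with one value per year, starting with year 0.
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    return [base_cost * (1 + annual_growth) ** year for year in range(years + 1)]


def test_zero_growth_leaves_cost_unchanged():
    # With no growth, every projected year should equal the base cost.
    assert project_costs(100.0, 0.0, 3) == [100.0, 100.0, 100.0, 100.0]


def test_matches_hand_calculated_value():
    # 100 growing at 10% a year for two years: 100 -> 110 -> 121.
    final_year = project_costs(100.0, 0.10, 2)[-1]
    # Compare with a tolerance because floating point arithmetic is inexact.
    assert abs(final_year - 121.0) < 1e-9


def test_negative_horizon_is_rejected():
    # The model should refuse inputs that make no sense rather than return a number.
    try:
        project_costs(100.0, 0.10, -1)
    except ValueError:
        return
    raise AssertionError("expected ValueError for a negative number of years")
```

Once checks like these exist, they can be re-run every time the model code changes, so a mistake introduced by an edit is caught by the tests rather than by a reviewer reading the output.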

3.2 Repeatable

The AQUA Book states that analysis should be repeatable to be considered valid. Using open source languages effectively should increase the repeatability of analysis when compared with existing spreadsheet methods. The latter often require many manual steps which can be hard to repeat. Using scripted languages means that the entire process has been written down and can be exactly repeated. While it is possible to implement some of these ideas in proprietary tools such as SPSS or SAS, the proprietary nature of the software languages means a license is needed to use them, and everybody who needs to be involved in quality assurance must have one. This may work internally, but if algorithms are published and independently audited, the audience will also need access to the proprietary tools. This is not always possible.
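
A minimal sketch of what "the entire process has been written down" can mean in practice is shown below. The data, the column names and the function names (load_rows, mean_by_region, run_pipeline) are all made up for illustration. The point is only that every step from raw input to final figure is a line of code, so running the script again on the same input repeats the analysis exactly, with no manual steps to remember.

```python
import csv
import io
import statistics

# Illustrative input only. In real work this would be read from a versioned data file.
RAW_DATA = """region,value
North,10
South,14
North,12
South,16
"""


def load_rows(text):
    """Step 1: parse the raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def mean_by_region(rows):
    """Step 2: group the values by region and take the mean of each group."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["region"], []).append(float(row["value"]))
    # Sort the regions so the output order does not depend on the input order.
    return {region: statistics.mean(values) for region, values in sorted(grouped.items())}


def run_pipeline(text):
    """Run every step in order; this function is the written-down process."""
    return mean_by_region(load_rows(text))


if __name__ == "__main__":
    print(run_pipeline(RAW_DATA))
```

Because the script uses only the Python standard library, anyone reviewing or auditing it can run it without buying a licence.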

3.3 RAP Levels align with AQUA Levels

Work in progress