Chapter 24 Putting it all together

In this chapter we provide a list of questions that can be used for evaluating published studies, and which may also be useful when planning a new study.

24.1 What is the conceptual model of the study?

A good study will have a clearly articulated conceptual model; that is, it will state what the intervention is trying to change (the dependent variable), what is being manipulated to bring about that change (the intervention), what the presumed mechanism of change is, and what the expected time course of change is (Horner & Odom, 2014). The conceptual model is key to selecting an appropriate research design. If it is not explicitly stated, it should be possible to identify it from the study introduction and methods.

24.2 What is being compared with what in this study?

Table 20.1 in Chapter 20 provides a simple framework. There are three basic options when evaluating an intervention:
- A. Different people (typically treated or control) are compared at one time point (usually after intervention)
- B. The same person or people are compared at two or more time points (usually before and after intervention)
- C. The same person is compared on different outcome measures, only some of which have been the focus of intervention

Option A corresponds to the classical RCT and, if properly designed, it controls for many biases; see Table 12.1 in Chapter 10. Although the focus is on comparisons after intervention, it is usual to also adjust post-intervention scores for baseline levels, thereby including some elements of Option B. Option B on its own is usually a weak design because, as shown in Chapter 5, it fails to control for many biases related to the simple passage of time, though single case designs that use multiple baseline measures can counteract this.
Option C is sometimes used in single case designs. Our view is that it has potential for improving power and providing converging evidence for intervention effects, especially when combined with A and/or B.

24.3 How adequate are the outcome measures?

Here we return to factors considered in Chapter 3. Our impression is that many studies give insufficient attention to outcome measures, which need to be reliable, valid, sensitive, and efficient; instead, people tend to use measures that are readily available and familiar to them. There is a need for more systematic studies comparing the suitability of different outcome measures for intervention research, bearing in mind that measures that are useful for assessment and diagnosis may not be optimal as outcome measures.

24.4 How adequate is the sample size?

Unfortunately, many studies are simply too small to demonstrate intervention effects. Enthusiasm to show that an intervention works often propels people into studies that consume considerable time, money, and resources, yet are unlikely to detect an effect even if one is really there. If you read a report of an intervention that finds a null result, it is important to consider whether the statistical power was adequate (see Chapter 13).
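To make the power issue concrete, the sketch below uses the standard normal-approximation formula for a two-arm trial with a two-sided comparison of means. The effect sizes shown are Cohen's conventional benchmarks, not values from any study discussed in this book; a real calculation should use an effect size justified by the conceptual model and prior literature.

```python
# A minimal sketch of an a priori power calculation for a two-arm trial, using
# the normal approximation: n per group ~ 2 * ((z_{1-alpha/2} + z_{1-beta}) / d)^2.
# The effect sizes below are Cohen's illustrative benchmarks, not real estimates.
from scipy.stats import norm

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided two-sample comparison."""
    z_alpha = norm.ppf(1 - alpha / 2)   # critical value for the chosen alpha
    z_beta = norm.ppf(power)            # z value corresponding to the desired power
    return 2 * ((z_alpha + z_beta) / d) ** 2

for d in (0.3, 0.5, 0.8):               # small, medium, large standardized effects
    print(f"d = {d}: about {n_per_group(d):.0f} participants per group")
```

Even for a medium effect, roughly 60 participants per group are needed for 80% power; a study with 20 per group can only reliably detect effects that are implausibly large for most behavioural interventions.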

24.5 Who took part in the study?

As well as sample size, we need to consider whether study participants were representative of the population of interest. One cannot coerce people into taking part, and there will always be a degree of self-selection bias, but it should be possible to tell from background measures and demographic information how representative the sample is of the population of interest. In addition, the report of the study should document how many drop-outs there were and how they were handled (see Chapter 9).

24.6 Was there adequate control for bias?

Here one wants to know whether steps were taken to avoid biases that can arise when experimenters have a strong desire for an intervention to be effective, a desire that may influence outcome measurement or the write-up of the study. Was selection to intervention/control groups randomized? Could subjective judgments by experimenters have affected the results? Is any conflict of interest declared?
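When reading a methods section, it can help to picture what adequate randomization looks like in practice. The sketch below shows one simple approach, blocked allocation with a fixed block size; the block size, arm labels, and seed are illustrative assumptions, not a prescribed protocol, and real trials would typically also conceal the schedule from those recruiting participants.

```python
# A minimal sketch of blocked randomization for a two-arm study.
# Block size, arm labels, and seed are illustrative assumptions only.
import random

def blocked_allocation(n_participants, block_size=4, seed=2024):
    """Return an allocation sequence balanced within each block."""
    rng = random.Random(seed)            # fixed seed so the schedule can be archived
    sequence = []
    while len(sequence) < n_participants:
        block = ["intervention"] * (block_size // 2) + ["control"] * (block_size // 2)
        rng.shuffle(block)                # random order within each block
        sequence.extend(block)
    return sequence[:n_participants]

print(blocked_allocation(10))
```

By contrast, allocation based on convenience (e.g., whichever group has space, or the clinician's judgment of who would benefit) leaves the door open to the selection biases discussed above.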

24.7 Was the study pre-registered?

As we discussed in Chapter 22, study registration is widely used in clinical medicine as a method to reduce analytic flexibility that can give misleading results (see Chapter 14). However, we know that, even when a study is preregistered, researchers may depart from the analytic plan, and so it can be illuminating to compare the registration with the published paper.

24.8 Was the data analysis appropriate?

We have not gone into detail regarding technical aspects of data analysis, and there are plenty of useful texts that cover this ground. Even without detailed statistical knowledge, one can ask whether there is evidence of p-hacking (selectively reporting only those results that are ‘significant’), and whether the presentation of the results is clear. Studies that simply report tables of regression coefficients and/or p-values are not very useful for the clinician, who will want to have a more concrete idea of how much change might be associated with an intervention, in order to judge whether it is cost-effective.
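One concrete alternative to a bare p-value is to report the raw mean difference with a confidence interval alongside a standardized effect size. The sketch below illustrates this with simulated data; all numbers are invented for illustration and do not come from any study discussed in this book.

```python
# A minimal sketch of reporting an intervention result as an effect size with a
# confidence interval, rather than a bare p-value. All numbers are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
treated = rng.normal(loc=52, scale=10, size=40)   # hypothetical post-intervention scores
control = rng.normal(loc=48, scale=10, size=40)

diff = treated.mean() - control.mean()
pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
cohens_d = diff / pooled_sd                        # standardized mean difference

# 95% confidence interval for the raw mean difference (two-sample t interval)
se = np.sqrt(treated.var(ddof=1) / len(treated) + control.var(ddof=1) / len(control))
dof = len(treated) + len(control) - 2
ci = stats.t.interval(0.95, dof, loc=diff, scale=se)

print(f"Mean difference = {diff:.1f} points, 95% CI [{ci[0]:.1f}, {ci[1]:.1f}], d = {cohens_d:.2f}")
```

Reported this way, a clinician can judge whether a gain of a few points on the outcome measure is worth the cost and effort of the intervention, which a table of coefficients and p-values does not convey.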

24.9 Is the data openly available?

Open data, like preregistration, doesn’t guarantee high quality research, but it is a sign that the researchers are aware of the importance of open, reproducible practices, and it provides an opportunity for others to check results and/or incorporate them in a meta-analysis.

24.10 How do the results compare with others in the literature?

We have emphasized that a single study is never conclusive. One needs to combine information from a range of sources. It is surprisingly common, though, for intervention studies to be written up as if they stand alone, perhaps because many journals put such emphasis on novelty. Trustworthy work will situate results in the context of other studies, and discuss potential explanations for any discrepancies in findings. When comparing studies, we need to move away from a simple binary divide between ‘significant’ and ‘nonsignificant’ findings, to consider whether effect sizes are similar, and if not, why not.
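The sketch below illustrates this shift in perspective with three invented studies: once each result is expressed as an effect size with a confidence interval, a 'significant' and a 'nonsignificant' finding can turn out to be entirely compatible with the same underlying effect.

```python
# A minimal sketch of comparing studies on a common footing: effect sizes and
# confidence intervals rather than 'significant vs nonsignificant'.
# The three studies below are invented examples, not real findings.
studies = [
    # (label, Cohen's d, standard error of d)
    ("Study A (n=30)", 0.55, 0.27),
    ("Study B (n=120)", 0.20, 0.13),
    ("Study C (n=60)", 0.35, 0.19),
]

for label, d, se in studies:
    lo, hi = d - 1.96 * se, d + 1.96 * se   # approximate 95% confidence interval
    verdict = "significant" if lo > 0 else "nonsignificant"
    print(f"{label}: d = {d:.2f}, 95% CI [{lo:.2f}, {hi:.2f}] ({verdict})")
# Overlapping intervals suggest the studies may estimate much the same effect,
# even though their p-values fall on opposite sides of .05.
```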

24.11 Summing up

We hope that the material in this book will give readers confidence to scrutinize the intervention literature in this way, and appreciate that it is possible to evaluate important aspects of a study without requiring advanced skills in statistics. Evaluating what works is a key skill for anyone who delivers interventions, so that the most effective approaches can be identified, and future studies can be planned to be rigorous and informative.

References

Horner, R. H., & Odom, S. L. (2014). Constructing single-case research designs: Logic and options. In T. R. Kratochwill & J. R. Levin (Eds.), Single-case intervention research: Methodological and statistical advances (pp. 27–51). American Psychological Association.