9 Study design limitations

So far, you have learnt to ask a RQ, identify study designs and design studies. In this chapter, you will learn to identify and describe the limitations of a study. You will learn to:

  • identify when and how study results are internally valid.
  • identify when and how study results are externally valid.
  • identify when and how study results are ecologically valid.

9.1 Introduction

The type of study and how that study is designed can determine how the results of the study should be interpreted. Ideally, a study would be perfectly externally and internally valid; in practice this is very difficult to achieve. Practically every study has limitations. The results of a study should be interpreted in light of these limitations.

Limitations generally can be discussed through three components:

  • Internal validity (Sect. 3.7): Discuss what study design features may compromise the internal validity of the study (such as identifying possible confounding variables). This is related to the effectiveness of the study with the sample (Sect. 9.2).
  • External validity (Sect. 3.8): Discuss how well the sample represents the intended population. This is related to the generalisability of the study to the intended population (Sect. 9.3).
  • Ecological validity: Discuss how well the study methods, materials and context approximate the real situation being studied. This is related to the practicality of the results to real life (Sect. 9.4).

All these issues should be addressed when discussing the study limitations.

Almost every study has limitations. Identifying potential limitations, and discussing the likely impact they have on the interpretation of the study results, is important and ethical.

Example 9.1 Delarue et al. (2019) discuss studies where subjects rate the taste of new food products. They note that taste-testing studies should (p. 78):

... allow generalizing the conclusions obtained with a consumer sample in one particular study to the general targeted population (this ability is commonly referred to as external validity). In the same time, tests should be reliable in terms of accuracy and replicability (this is commonly referred to as internal validity).

Then they state that, even when studies have good internal and external validity, these studies often result in a 'high rate of failures of new launched products'. That is, the study results often do not translate into the real world, and so lack ecological validity.

9.2 Limitations: internal validity

Internal validity refers to how well the study design isolates the relationship of interest and eliminates all other possible explanations (Sect. 3.7). A discussion of the limitations of internal validity should cover, as appropriate: possible confounding variables; the impact of the carry-over, Hawthorne, observer and placebo effects; the impact of any other design decisions.

If any of these issues are likely to compromise internal validity, the implications for the interpretation of the results should be discussed. For example, if the study design implies that the Hawthorne effect may be an issue (since the participants were not blinded), this should be clearly stated, and the conclusion should indicate that the individuals in the study may have behaved differently than usual because (for example) they knew they were in a study.

Example 9.2 (Study limitations) A study (Axmann et al. 2020) randomly allocated Ugandan farmers to receive, or not receive, hybrid maize seeds. The random allocation is good for internal validity. However, the authors identified one threat to internal validity: farmers receiving the hybrid seeds could share their seeds with their neighbours.

To ensure this was not occurring, the researchers contacted the 75 farmers allocated to receive the hybrid seeds; none of the 30 farmers who could be contacted reported selling or giving seeds to other farmers. This extra step increased the internal validity of the study.

The internal validity of observational studies is often compromised because confounding can be less effectively managed than for experimental studies (e.g., random allocation is not possible). The internal validity of experimental studies involving people is often compromised because people must be informed that they are participating in a study.
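The effect of confounding on an observational comparison can be illustrated with a small simulation. The following is a minimal sketch only: the confounder (called `health` here), the assumed true treatment effect of 2.0, and all other numbers are invented for illustration and do not come from any study cited in this chapter.

```python
import random
from statistics import mean

random.seed(2)  # fixed seed so the sketch is reproducible

TRUE_EFFECT = 2.0  # assumed true effect of the treatment (invented value)


def simulate(n, randomised):
    """Return the difference in mean outcome (treated minus untreated)."""
    treated, untreated = [], []
    for _ in range(n):
        # Hypothetical confounder: influences the outcome in both designs.
        health = random.gauss(0, 1)
        if randomised:
            # Experimental study: treatment is allocated by chance alone.
            gets_treatment = random.random() < 0.5
        else:
            # Observational study: healthier individuals are more likely
            # to end up in the treated group.
            gets_treatment = health + random.gauss(0, 1) > 0
        outcome = 3.0 * health + TRUE_EFFECT * gets_treatment + random.gauss(0, 1)
        (treated if gets_treatment else untreated).append(outcome)
    return mean(treated) - mean(untreated)


experimental = simulate(5_000, randomised=True)
observational = simulate(5_000, randomised=False)

print(f"Assumed true effect:    {TRUE_EFFECT:.2f}")
print(f"Experimental estimate:  {experimental:.2f}")
print(f"Observational estimate: {observational:.2f}")
```

Under these assumptions, the experimental estimate should fall close to the assumed true effect, while the observational estimate is inflated because the treated group is also healthier on average. The sketch shows one way confounding can arise; it does not imply that every observational study is biased in this direction.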

Example 9.3 (Internal validity) In a study of the hand-hygiene practices of paramedics (Barr et al. 2017), self-reported hand-hygiene practices were very different than what was reported by peers.

When participants knew they were being studied, their responses made their own behaviours appear better than those of their colleagues. This is a study limitation that was necessary to discuss.

Example 9.4 (Internal validity) A study evaluated using a new therapy on elderly men, and listed some limitations of their study:

... the researcher was not blinded and had prior knowledge of the research aims, disease status, and intervention. As such, these could all have influenced data recording [...] The potential of reporting bias and observer bias could be reduced by implementing blinding in future studies.

--- Kabata-Piżuch et al. (2021), p. 10

A study (Botelho et al. 2019) examined the food choices made when subjects were asked to shop for ingredients to make a last-minute meal. Half were told to prepare a 'healthy meal', and the other half told just to prepare a 'meal'. Part of the Discussion stated:

Another limitation is that results report findings from a simulated purchase. As participants did not have to pay for their selection, actual choices could be different. Participants may also have not behaved in their usual manner since they were taking part in a research study, a situation known as the Hawthorne effect.

--- Botelho et al. (2019), p. 436

9.3 Limitations: external validity

External validity refers to the ability to generalise the results to the entire intended population, based on the sample studied (Sect. 3.8). For a study to be externally valid, it must first be internally valid: if the study is not effective in the sample studied (i.e., not internally valid), the results may not apply to the intended population either.

External validity refers to how well the sample is likely to represent the target population in the RQ. If the RQ is 'Among Alaskans, what proportion own a smart speaker?', then the study is externally valid if the sample is representative of Alaskans. The results do not have to apply to people in the rest of the United States (though this can be commented on, too). The intended population, in the RQ, is Alaskans.

External validity depends on how the sample was obtained. Results from random samples are likely to generalise to the population and be externally valid. (The analyses in this book assume all samples are simple random samples.) Furthermore, results from approximately representative samples may generalise to the population and be externally valid if those in the study are not obviously different than those not in the study.
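The difference between a simple random sample and a convenience sample can be sketched with a short simulation. Everything in the sketch is hypothetical: the population of 10 000 people, the urban and rural split, and the smart-speaker ownership rates are invented for illustration only.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10 000 people, 3 000 urban and 7 000 rural.
# Assumed ownership rates (invented): 60% for urban, 20% for rural.
population = []
for i in range(10_000):
    urban = i < 3_000
    owns = random.random() < (0.60 if urban else 0.20)
    population.append({"urban": urban, "owns": owns})


def proportion_owning(people):
    """Proportion of the given people who own a smart speaker."""
    return sum(p["owns"] for p in people) / len(people)


true_prop = proportion_owning(population)

# Simple random sample: every person has the same chance of selection.
srs_prop = proportion_owning(random.sample(population, 500))

# Convenience sample: only urban people, who are assumed easier to reach.
urban_people = [p for p in population if p["urban"]]
conv_prop = proportion_owning(random.sample(urban_people, 500))

print(f"Population proportion:       {true_prop:.3f}")
print(f"Simple random sample:        {srs_prop:.3f}")
print(f"Convenience sample (urban):  {conv_prop:.3f}")
```

With these invented rates, the population proportion is close to 0.32. The simple random sample should give an estimate near that value, while the urban-only convenience sample should give an estimate near 0.60, because those in the sample are obviously different from those not in the sample.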

Example 9.5 (External validity) A New Zealand study (Gammon et al. 2012) identified (for well-documented reasons) a particular group to study: 'women of South Asian origin living in New Zealand' (p. 21). The women in the sample were 'women of South Asian origin living in New Zealand [...] recruited using a convenience sample method throughout Auckland' (p. 21).

The results may not generalise to the intended population of 'women of South Asian origin living in New Zealand' because all the women in the sample came from only one city (Auckland), and the sample was not a random sample (so the study may not be externally valid).

The results will not generalise to all New Zealand women, but this is not a limitation: the target population was only 'women of South Asian origin living in New Zealand'. The researchers did not intend the results to apply to all New Zealand women.

Example 9.6 (Using biochar) A study of growing ginger using biochar (Farrar et al. 2018) used one farm at Mt Mellum, Australia. While the results may only generalise to growing ginger at Mt Mellum, the encouraging results suggest that a wider, more general, study of the impact of using biochar to grow ginger would be worthwhile. In addition, ginger is usually grown in similar types of climates and soils, so the results may apply to other ginger farms also.

9.4 Limitations: ecological validity

The likely practicality of the study results in the real world should also be discussed. This is called ecological validity.

Definition 9.1 (Ecological validity) A study is ecologically valid if the study methods, materials and context approximate the real situation of interest.

Studies don't need to be ecologically valid to be useful; much can be learnt under special conditions, as long as the potential limitations are understood when applying the results to the real world. Although ecological validity is not essential for a good study, ecological validity is useful if it is possible to achieve. The ecological validity of experimental studies may be compromised because the experimental conditions are sometimes artificially controlled (for good reason).

Example 9.7 (Ecological validity) Consider a study to determine the proportion of people that buy a coffee in a reusable cup. People could be asked about their behaviour. This may not be ecologically valid, as how people act may not align with what they say.

An alternative study could watch people buy coffees at various coffee shops, and record what people do in practice. This second study is more likely to be ecologically valid, as we are watching real-world behaviour.

A study observed the effect of using high-mounted rear brake lights (Kahane and Hertz 1998), which are now commonplace. The American study showed that such lights reduced rear-end collisions by about 50%. However, after making these lights mandatory, rear-end collisions reduced by only 5%. Why?

9.5 Study designs and limitations

Experimental studies, in general, have higher internal validity than observational studies, since more of the study design is under the control of the researchers; for example, random allocation of treatments is possible to minimise confounding.

However, experimental studies may suffer from poor ecological validity; for instance, laboratory experiments are often conducted under controlled temperature and humidity. Many experiments also require that people be told about being in a study (due to ethical requirements), and so internal validity may be compromised due to the Hawthorne effect.

Example 9.8 (Retrofitting) In a study of retro-fitting houses with energy-saving devices, Giandomenico, Papineau, and Rivers (2022) found large discrepancies between the savings estimated when observational (cross-sectional) studies were used (12.2%) and when experimental (randomised controlled trial) studies were used (6.2%). The authors say that 'this finding reinforces the importance of using study designs with high internal validity to evaluate program savings' (p. 692).

9.6 Summary

The limitations in a study need to be identified, and may be related to:

  • internal validity (effectiveness): how well the study is conducted with the sample, isolating the relationship of interest.
  • external validity (generalisability): how well the sample results are likely to apply to the intended population.
  • ecological validity (practicality): how well the results may apply to the real-world situation.

9.7 Quick review questions

Are the following statements true or false?

  1. When interpreting the results of studies, the steps taken to maximize internal validity should be considered.
  2. If studies are not externally valid, then they are not very useful.
  3. When interpreting the results of studies, the steps taken to maximize external validity do not need to be considered.
  4. When interpreting the results of studies, ecological validity is about the impact of the study on the environment.

9.8 Exercises

Selected answers are available in Sect. D.9.

Exercise 9.1 A research study examined how people can save energy through lighting choices (Gentile 2022). The study states (p. 9) that the results 'are limited to the specific study and cannot be easily projected to other similar settings'.

What type of validity is being discussed here?

Exercise 9.2 In a study evaluating the study of farm management practices, Kluger, Owen, and Lobell (2022) state (p. 2):

... conclusions may not apply to farms with different soil, climate, or management conditions than those of the experimental site.

Later, they say (p. 2) that

... the estimates for the causal effect based on the observational study are biased and do not properly estimate the causal effect in the study sample.

What types of validity are being discussed in these two extracts?

Exercise 9.3 When interpreting the results of studies, we consider the practicality (_____ validity), the generalizability (_____ validity) and the effectiveness (_____ validity).

Internal validity refers to issues such as _____ and the Hawthorne effect.

External validity refers to _____ methods.

Exercise 9.4 A student project at the university where I work asked the RQ:

Among university students on-campus, is the percentage of word retention higher in male students than female students?

When discussing external validity, the students stated:

We cannot say whether or not that the general public have better or worse word retention compared to the students that we will be studying.

Why is the statement not relevant?

Exercise 9.5 Despite their common use, no experimental scientific evidence shows that parachutes are effective (Smith and Pell 2003). To obtain evidence, researchers studied this scenario (Yeh et al. 2018). Part of the Abstract for the paper (slightly edited for clarity) says:

Objective To determine if using a parachute prevents death or major traumatic injury when jumping from an aircraft.

Design Randomized controlled trial.

Setting Private or commercial aircraft between September 2017 and August 2018.

Participants 92 aircraft passengers aged 18 and over were screened for participation. 23 agreed to be enrolled and were randomized [into the two groups].

Intervention Jumping from an aircraft (airplane or helicopter) with a parachute versus an empty backpack (unblinded).

Main outcome measures Composite of death or major traumatic injury (defined by an Injury Severity Score over 15) upon impact with the ground measured immediately after landing.

Results Parachute use did not significantly reduce death or major injury (0% for parachute v 0% for control).

Conclusions Parachute use did not reduce death or major traumatic injury when jumping from aircraft in the first randomized evaluation of this intervention. However, the trial was only able to enroll participants on small stationary aircraft on the ground, suggesting cautious extrapolation to high altitude jumps [...]

--- Yeh et al. (2018)

Based on this information:

  1. Carefully define POCI.
  2. What type of study is this: observational or experimental?
  3. What are the variables?
  4. Comment on the ecological validity of this study.
  5. Comment on the limitations of the study.
  6. What are the conclusions?

Exercise 9.6 A study of how well hospital patients sleep at night (Delaney et al. 2018) set out to 'investigate the perceived duration and quality of patient sleep' (p. 1). In discussing the study, the researchers state (p. 2):

Patients and nursing staff were recruited for this study. Non-probability convenience sampling was used to recruit patients to participate...

Later, while discussing the limitations, the researchers state (p. 7):

While the multiple methods of data collection and inclusion of 15 clinical areas are strengths of this study, the results may not be generalisable to all hospitals or all ward areas [...] while most healthy individuals sleep primarily or exclusively at night, it is important to consider that patients requiring hospitalization will likely require some daytime nap periods. This study looks at sleep only in the night-time period 22:00--07:00h, without the context of daytime sleep considered.

Discuss these issues using the language introduced in this chapter.