8 Internal validity and observational studies

So far, you have learnt to ask an RQ, identify different ways of obtaining data, and design the study.

In this chapter, you will learn how to ensure that the conclusions drawn from observational studies are logical and sound. You will learn to:

  • maximise the internal validity of observational studies.
  • manage confounding in observational studies.
  • explain, identify and manage the carry-over effect in observational studies.
  • explain, identify and manage the Hawthorne effect in observational studies.
  • explain, identify and manage the placebo effect in observational studies.
  • explain, identify and manage the observer effect in observational studies.

8.1 Introduction

In experimental studies, the researcher can typically control many aspects of the study design, so it is often easier to maximise internal validity in experimental studies. In contrast, observational studies have fewer design features that can be controlled by the researchers.

For example, treatments are not imposed in observational studies, so random allocation of treatments is impossible, and hence confounding is always a potential threat to internal validity in observational studies.

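Nonetheless, researchers should still consider aspects of research design when designing observational studies, and manage those aspects when possible to maximise the internal validity. Specific design strategies that we consider for maximising internal validity are:

  • managing confounding (Sect. 8.2);
  • managing the carry-over effect (Sect. 8.3);
  • managing the Hawthorne effect (Sect. 8.4);
  • managing the placebo effect (Sect. 8.5); and
  • managing the observer effect (Sect. 8.6).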

Not every design consideration will be relevant to every study.

Remember the goal of study design: To design a study to isolate the relationship of interest, by eliminating, as best as possible, all other possible explanations.

Example 8.1 (Low internal validity) The authors of a study190 teaching integrated pest management (IPM) to Ugandan small-scale farmers noted that

The neighboring farmers were supposed to represent the average farmer living in the intervention subdistrict after IPM training from FFS [i.e., Farmer field schools] farmers. They were, however, different from the control farmers according to some demographic and agricultural characteristics---e.g., more neighboring farmers lived in Pallisa, were women, had a higher education and were members of a farmers' group, which were all possible confounders in the study [...]

[...] Despite the low internal validity, this study showed some results supporting that IPM through FFSs can be used as a tool to reduce occupational health hazards and environmental pollution from pesticides in developing countries.

--- Clausen et al.191, p. 8, 9

Despite low internal validity, the study showed some promising signs of teaching IPM that need to be explored in future studies.

Richard Doll and A. Bradford Hill192 wrote to a large number of British doctors, and asked how much they smoked. Then they observed smokers and non-smokers for many years, and recorded who died of lung cancer.

Why is this an observational study?

Because the value of the explanatory variable ('whether or not the doctor smoked') is not determined, and cannot be manipulated, by the researchers: doing so would be unethical.

8.2 Managing confounding

In Sect. 7.2 different methods were listed for managing confounding in experimental studies. Some, but not all, of these are still possible in observational studies:

  • Restricting the study to a certain group (for example, only people under 30).
  • Blocking. Analyse the data separately for different groups (for example, analyse the data separately for people under 30, and 30 and over).
  • Analysing using special methods (after measuring the age of each subject).
  • Randomly allocating people to groups: Not possible in observational studies.

8.2.1 Restrictions

As with experimental studies, observational studies can be restricted to certain parts of the population. For example, in the smoking study of Doll and Hill193, participants were restricted to men aged 35 years and over since, at the time of the study:

... lung cancer is relatively uncommon in women and rare in men under 35 [and] useful figures are unlikely to be obtained in these groups for some years to come.

--- Doll and Hill194, p. 1452.

The reason for the restriction should be justified if possible (as shown in the quote above).

8.2.2 Blocking

Blocking can be used with observational studies; for example, those aged under 30 and those aged 30 or over could be analysed separately.

This, of course, requires the age of the participants to be available to the researchers.
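The idea of blocking can be sketched in a few lines of Python. This is a minimal illustration only: the records, the variable names (`age`, `smoker`, `outcome`) and the cut-off of 30 years are all hypothetical. The comparison of interest (smokers versus non-smokers) is made separately within each age block, so age cannot explain any difference found within a block.

```python
from statistics import mean

# Hypothetical records: each person's age, smoking status and a numeric outcome
records = [
    {"age": 24, "smoker": True, "outcome": 7.1},
    {"age": 27, "smoker": False, "outcome": 6.5},
    {"age": 22, "smoker": True, "outcome": 6.9},
    {"age": 29, "smoker": False, "outcome": 6.3},
    {"age": 45, "smoker": True, "outcome": 8.4},
    {"age": 52, "smoker": False, "outcome": 7.6},
    {"age": 38, "smoker": True, "outcome": 8.0},
    {"age": 61, "smoker": False, "outcome": 7.8},
]

def age_block(age):
    """Assign each person to a block (hypothetical cut-off at 30 years)."""
    return "under 30" if age < 30 else "30 and over"

def blocked_differences(records):
    """Mean outcome of smokers minus non-smokers, computed separately in each block."""
    differences = {}
    for block in ("under 30", "30 and over"):
        in_block = [r for r in records if age_block(r["age"]) == block]
        smokers = [r["outcome"] for r in in_block if r["smoker"]]
        non_smokers = [r["outcome"] for r in in_block if not r["smoker"]]
        differences[block] = mean(smokers) - mean(non_smokers)
    return differences

for block, diff in blocked_differences(records).items():
    print(f"{block}: difference in mean outcome = {diff:.2f}")
```

In practice each block needs enough individuals in both groups for the within-block comparison to be meaningful.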

8.2.3 Analysis

The best advice for observational studies is to measure, observe, assess or record all the information that is likely to be important for understanding the data. While this strategy is also useful for experimental studies, it is particularly important for observational studies, as managing confounding through analysis (Sect. 7.2.3) is often one of the few practical means available.

Measure, observe, assess or record all the information that is likely to be important for understanding the data. This may include information about

  • the individuals in the study; and
  • the circumstances of the study.

Example 8.2 (Analysis) In a different smoking study, Richard Doll and A. Bradford Hill195 recorded the social class and place of residence of each participant, as potential confounding variables.

Example 8.3 (Confounding) An observational study of 2599 kiwifruit orchards196 explored the relationship between the time since a bacterial canker was first detected (in weeks) and the orchard productivity (in tray equivalents per hectare).

The researchers also recorded information such as 'whether or not the farm was organic', 'elevation of the orchard' and 'whether or not general fungicides were used'.

They used these variables in their analysis to manage the potential effects of confounding. Their analysis showed that 'elevation of the orchard' and 'whether or not general fungicides were used' were important confounding variables, but 'whether or not the farm was organic' was not.
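To see how recording a potential confounder allows it to be managed in the analysis, consider the following sketch. It is not the analysis used in the kiwifruit study; the data are invented and deliberately extreme. The response `y` depends only on a binary confounder `z` (think of a hypothetical high/low elevation indicator), but the explanatory variable `x` also tends to be larger when `z = 1`. Ignoring `z` suggests a relationship between `x` and `y`; adjusting for `z` (here by fitting the slope of `x` within each level of `z`, which gives the same slope as a least-squares regression that includes `z` as a categorical term) shows there is none.

```python
from statistics import mean

# Invented data: z is a binary confounder, x the explanatory variable, y the response
z = [0, 0, 0, 1, 1, 1]
x = [1, 2, 3, 3, 4, 5]
y = [10, 10, 10, 15, 15, 15]  # y depends only on z, not on x

def slope(xs, ys):
    """Least-squares slope of ys on xs (with an intercept), ignoring any confounder."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

def adjusted_slope(xs, ys, groups):
    """Slope of ys on xs after removing each group's mean from both variables.

    This equals the coefficient of xs in a least-squares fit that also
    includes the groups as a categorical (dummy) variable.
    """
    rx, ry = [], []
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        mx = mean(xs[i] for i in idx)
        my = mean(ys[i] for i in idx)
        rx += [xs[i] - mx for i in idx]
        ry += [ys[i] - my for i in idx]
    return sum(a * b for a, b in zip(rx, ry)) / sum(a * a for a in rx)

print("Crude slope (ignoring z):", slope(x, y))
print("Slope adjusted for z:", adjusted_slope(x, y, z))
```

Adjustment of this kind only works for confounders that were actually recorded, which is why recording them matters so much in observational studies.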

8.2.4 Random allocation

In observational studies, the study conditions are not allocated by the researchers (at random or otherwise), so random allocation of treatments is not possible.

For this reason, confounding is often a major threat to internal validity in observational studies, as individuals who are in one group may be different to those who are in another group (Table 8.1). As a result, researchers often summarise the groups being compared on various potential confounding variables.

Example 8.4 (Comparing study groups) An observational study197 compared the iron levels of active and sedentary women aged 18 to 35. The authors also compared the active women (\(n=28\)) with the sedentary women (\(n=28\)) on a variety of other characteristics.

However, intrinsic physical differences between the women in the two groups might explain any differences found between the iron levels in the two groups.

To explore this possibility, the researchers examined many characteristics of the women; some are shown in Table 8.1.

They conclude that active women in their sample tended to be (in general) slightly younger, slightly heavier and taller, and slightly more likely to use hormonal contraceptives. Hence, any difference in iron levels between the two groups may be because of the active/sedentary nature of the groups, or because the active group was (in general) younger, for example.

TABLE 8.1: The demographic information for those in the study of iron levels in women

Characteristic                            Active women (n = 28)  Sedentary women (n = 28)
Average age (in years)                    20                     24
Average height (in cm)                    169                    166
Average weight (in kg)                    68                     62
Percentage using hormonal contraceptives  13                     11
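The comparison in Table 8.1 amounts to looking at the difference between the groups for each potential confounder. The small sketch below uses only the rounded values shown in the table. The study itself would have had individual-level data, and judging whether a difference is 'slight' also depends on the variation within each group, which the table does not show.

```python
# Rounded group summaries from Table 8.1: (active women, sedentary women)
table_8_1 = {
    "Average age (years)": (20, 24),
    "Average height (cm)": (169, 166),
    "Average weight (kg)": (68, 62),
    "Hormonal contraceptive use (%)": (13, 11),
}

# Difference (active minus sedentary) for each potential confounder
differences = {
    name: active - sedentary for name, (active, sedentary) in table_8_1.items()
}

for name, diff in differences.items():
    print(f"{name}: {diff:+d}")
```

Negative values mean the active group had the smaller average (for example, the active women were about 4 years younger on average).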

In the smoking study of Doll and Hill198, doctors who chose to smoke may also have been inclined to undertake other risky behaviours, whereas doctors who chose not to smoke may also have been less inclined to do so. It may be those other risky behaviours, and not the smoking itself, that lead to lung cancer.

In a different smoking study, Doll and Hill199 used a control group. The control group was chosen to be very similar to those in the lung-cancer group in terms of age and sex. (That is, the numbers of females and males within each age group were very similar for those with lung cancer and those without lung cancer.)
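Choosing a control group so that each combination of sex and age group contains the same number of people as in the case group is sometimes called frequency matching. The sketch below assumes a hypothetical pool of potential controls and hypothetical 10-year age bands; it illustrates the idea and is not how Doll and Hill actually selected their controls.

```python
import random
from collections import Counter

def stratum(person):
    """Hypothetical stratum: (sex, 10-year age band)."""
    return (person["sex"], person["age"] // 10 * 10)

def frequency_match(cases, pool, seed=1):
    """Randomly pick controls from pool so each stratum has as many controls as cases."""
    rng = random.Random(seed)
    needed = Counter(stratum(p) for p in cases)
    controls = []
    for s, n in needed.items():
        candidates = [p for p in pool if stratum(p) == s]
        if len(candidates) < n:
            raise ValueError(f"not enough potential controls in stratum {s}")
        controls += rng.sample(candidates, n)
    return controls

# Hypothetical cases (with the condition) and pool of potential controls
cases = [{"sex": "M", "age": 52}, {"sex": "M", "age": 58}, {"sex": "F", "age": 61}]
pool = (
    [{"sex": "M", "age": a} for a in (50, 53, 55, 57, 59, 64)]
    + [{"sex": "F", "age": a} for a in (60, 62, 66, 71)]
)

controls = frequency_match(cases, pool)
print("Cases:   ", Counter(stratum(p) for p in cases))
print("Controls:", Counter(stratum(p) for p in controls))
```

Matching only balances the variables used to form the strata (here, sex and age band); the groups may still differ on other variables.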

Observational studies can (and often do) have control groups. Indeed, one specific type of observational study is called a case-control study.

However, individuals are not allocated to the control group by the researchers in observational studies.

A study200 examined the difference between two types of helicopter transfer (physician-staffed; non-physician-staffed) of patients with a specific type of myocardial infarction (STEMI). The purpose of the study was:

...to evaluate the characteristics and outcomes of physician-staffed HEMS (Physician-HEMS) versus non-physician-staffed (Standard-HEMS) in patients with STEMI.

--- Gunnarsson et al.201, p. 1

The researchers

...studied 398 STEMI patients transferred by either Physician-HEMS (\(n=327\)) or Standard-HEMS (\(n=71\)) for [...] intervention at 2 hospitals between 2006 and 2014.

--- Gunnarsson et al.202, p. 1

Since the study is an observational study (patients were not allocated by the researchers to the type of helicopter transport), the researchers recorded information about the patients being transported. They compared the patients in both groups, and found (for example) that both groups had similar average ages, a similar percentage of females, a similar percentage of smokers, and so on. They also compared information about the transportation, and found (for example) that both groups had similar average flight times and flight distances.

One conclusion from the study was that 'Patients with STEMI transported by Standard-HEMS had longer transport times' (p. 1), but one limitation of the study was that:

The patient cohorts received treatment by 2 different care teams at two hospitals, which is a potential confounder despite similar baseline characteristics

--- Gunnarsson et al.203, p. 5

In other words, the hospital (and hence the care team) may have been a confounding variable.

8.3 Carry-over effect and washout periods

The carry-over effect is a possible threat to internal validity in observational studies. However, since treatments are not allocated in observational studies, carry-over effects may be difficult to prevent.

It may be possible, however, to observe individuals who are exposed to Condition A then Condition B, and other individuals who are exposed to Condition B and then Condition A.
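When some individuals are observed under Condition A then Condition B, and others under B then A, one rough check for a carry-over effect is to compare the A-minus-B difference between the two orders. If there were no carry-over (and no other order effect), the two average differences should be similar. The measurements below are invented; with real data, sampling variation would also need to be considered.

```python
from statistics import mean

# Invented responses: (response under A, response under B) for each individual
a_then_b = [(12.0, 10.0), (11.5, 9.5), (13.0, 11.0)]   # observed under A first
b_then_a = [(12.0, 11.5), (11.0, 10.5), (12.5, 12.0)]  # observed under B first

diff_ab = mean(a - b for a, b in a_then_b)
diff_ba = mean(a - b for a, b in b_then_a)

print("Mean A - B when A came first:", diff_ab)
print("Mean A - B when B came first:", diff_ba)
# A large gap between these two means suggests that the first condition
# 'carries over' into the second (or that some other order effect exists).
```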

Example 8.5 (Carry-over effects) A study of the carry-over effect in ecological observational studies gave many examples, including:

individuals occupying poor quality winter habitat may experience reduced reproductive success the following breeding season when compared to individuals occupying high quality winter habitat.

--- D. Ryan Norris204, p. 181

8.4 Hawthorne effect and blinding individuals

In observational studies, individuals may or may not know they are being observed. For example, in an observational study where subjects' blood pressure is measured,205 subjects will clearly know that they are being observed. This has the potential to alter how the subjects behave or respond (for example, some people become tense when their blood pressure is measured in a clinical setting, raising their readings: an effect called 'white-coat hypertension'; Thomas G. Pickering, William Gerin, and Amy R. Schwartz206).

As with experimental studies, efforts should be made to ensure that individuals do not know that they are being observed (that is, that the participants are blinded).

Example 8.6 (Hawthorne effect) One study207 examined hand hygiene (HH) in a tertiary teaching hospital, using:

  • covert observers (that is, the observers were not obviously watching the hand hygiene practices of staff); and
  • overt observers (that is, the observers were obvious about watching the hand hygiene practices of staff).

One conclusion was that

The overall HH compliance was higher with overt observation than with covert observation (78% vs. 55%)...

--- Wu et al.208, p. 369

In other words, staff behaviour changed markedly when staff knew they were being observed. A change like this could easily distort the observed relationship between the response and explanatory variables, and hence compromise internal validity.

8.5 Placebo effect and using controls

The placebo effect is concerned with treatments, so it is not directly relevant to observational studies.

However, observational studies can still have a control group, even though individuals are not allocated to the control group by the researchers (randomly or otherwise).

For example, in the Doll & Hill smoking study,209 two groups were being compared: non-smokers (the control group) and smokers.

Subjects were not allocated to the groups, however, so confounding remains a possibility. Again, the groups in the study can be compared (Example 8.4) to see if the groups are different in other ways.

8.6 Observer effect and blinding researchers

The observer effect can be an issue in observational as well as experimental studies. For example, consider a study where the blood pressure of smokers and non-smokers is recorded.210

This is an observational study (individuals cannot be allocated to be a smoker or non-smoker), but if the researchers know whether or not each individual is a smoker when they record the blood pressure, then the observer effect could still come into play (recall that the observer effect is unconscious). For example, researchers who expect smokers to have higher blood pressure may unconsciously record slightly higher readings for smokers.

In this example, the observer effect could be managed if the researchers first measured the blood pressure, and then asked whether the individual was a smoker. That is, the researchers may be blinded to whether or not the subject is a smoker when they take the blood pressure measurements.

This blinding may only be partially successful (the researcher may see the subject carrying a packet of cigarettes, or smell smoke on their breath, for example), but it is easy to implement.

Example 8.7 (Observer effect) In a study of animal moulting,211 researchers took photos of snowshoe hares, at various stages of moulting and in various environmental conditions. Eighteen independent observers were asked to rate the moult stage from the photographs.

The article states that:

All images were randomly named and sorted, with the dates and manufacturer's logos removed to minimize observer expectancy bias [i.e., the observer effect].

--- Zimova et al.212, p. 4
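The renaming step described by Zimova et al. (giving images random names and a random order so that observers cannot tell when or where each photo was taken) can be sketched as follows. The file names are hypothetical, and the use of Python's `secrets` module for the codes is an assumption for illustration; removing dates and logos from the images themselves would be a separate step. The key linking codes to the original files would be held by someone other than the observers until all ratings are complete.

```python
import random
import secrets

def blind_filenames(filenames, seed=None):
    """Give each file a random code and shuffle the presentation order.

    Returns (order, key): observers see only the codes in `order`;
    `key` maps each code back to the original file name and is kept
    away from the observers.
    """
    key = {}
    for name in filenames:
        code = secrets.token_hex(4)   # 8 random hexadecimal characters
        while code in key:            # guard against an (unlikely) repeated code
            code = secrets.token_hex(4)
        key[code] = name
    order = list(key)
    random.Random(seed).shuffle(order)
    return order, key

# Hypothetical original names that reveal the date and site of each photo
photos = [
    "hare_2016-10-03_siteA.jpg",
    "hare_2016-11-14_siteB.jpg",
    "hare_2017-04-22_siteA.jpg",
]
order, key = blind_filenames(photos)
for code in order:
    print(code + ".jpg")
```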

Example 8.8 (Blinding in ecology) A study of research articles in ecology found that:

Across all 492 EEB articles surveyed, we judged 50.4% (\(n = 248\)) to have potential for observer bias, but only 13.3% (\(n = 33\) of \(248\)) of these articles stated use of blind observation.

Some articles explicitly stated the use of blind observation in the methods (\(n = 24\)), while others indicated indirectly that experiments had been done blind (\(n = 9\); e.g., use of a naive experimenter...).

--- Melissa R. Kardish et al.213; line breaks added
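The percentages in this quote can be checked from the counts it gives: 492 articles in total, 248 judged to have potential for observer bias, and 33 of those stating blind observation (24 explicitly and 9 indirectly). Note that the 13.3% is a percentage of the 248 articles, not of all 492; the last line of the sketch shows the much smaller share of all articles.

```python
total_articles = 492
potential_bias = 248
explicit_blind, indirect_blind = 24, 9
stated_blind = explicit_blind + indirect_blind  # 33 articles

pct_potential = 100 * potential_bias / total_articles       # share of all 492 articles
pct_blind = 100 * stated_blind / potential_bias              # share of the 248 articles
pct_blind_overall = 100 * stated_blind / total_articles      # share of all 492 articles

print(f"{pct_potential:.1f}% had potential for observer bias")
print(f"{pct_blind:.1f}% of those stated use of blind observation")
print(f"{pct_blind_overall:.1f}% of all articles stated use of blind observation")
```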

Blinding the observer is not always possible, but should be used when possible to improve the internal validity of the study.

Example 8.9 (Observer effect) A study of the scats of gray wolves noted that:

The most widely used method to determine diets of carnivores is scat analysis...

--- Rick Spaulding, Paul R. Krausman, and Warren B. Ballard214, p. 947

In a scat analysis, people examine the scats of carnivores to determine what prey the animals have eaten. However, the accuracy of the results was questioned, due to

... improper training, intentional incomplete analysis, [...] and perpetuation of the assumption that wolf scats contain only 1 prey item/scat...

--- Spaulding, Krausman, and Ballard215, p. 949

That is, the observers might be seeing what they expect to see: that "wolf scats contain only 1 prey item/scat".

Example 8.10 (Observer effect) A study by Maiju Lehtiniemi, Okko Outinen, and Riikka Puntila-Dodd216 on using "citizen science" for monitoring coastal non-indigenous species stated that:

Other limitations of citizen produced data arise from observer bias; the search effort differs between observers, observers may be geographically unevenly distributed and can fail to detect the species present [...] Furthermore, the observers might have tendencies to report only species of interest, instead of reporting all species detected...

--- Lehtiniemi, Outinen, and Puntila-Dodd217, p. 110608

Example 8.12 (Blinding) A study219 found that bicycle riders who wear helmets are more likely to take risks compared to bicycle riders who do not wear helmets.

The paper states that the bicycle riders were blinded to the purpose of the study (reducing the impact of the Hawthorne effect), though clearly the participants knew they were involved in a study (so the impact was not completely eliminated).

However, the study was criticised,220 since it was possible that

... the experimenters unconsciously conveyed their expectations to participants and thereby affected their responses [...] it is clear that the double-blind procedure has been developed for a reason and should have been used in this study.

--- Radun and Lajunen221, p. 1020

The lack of blinding, when it was possible to incorporate blinding, compromised the study's internal validity.

8.7 Comments on blinding

Many of the comments about blinding in Sect. 7.7 also apply to observational studies.

In observational studies, blinding individuals may be (but is not always) easier than in experimental studies (Sect. 8.4).

Blinding the researchers may be difficult, since the researchers need to record the value of the explanatory variable. To blind the researchers, sometimes two different researchers can be used: One to record the value of the response variable and one to record the value of the explanatory variable.
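One way to organise this is for each researcher to record only their own variable against a participant ID, with the two records combined only after all measurements are complete. The sketch below uses the blood-pressure and smoking example from Sect. 8.6; the IDs and values are hypothetical.

```python
# Researcher 1 measures blood pressure (response) without knowing smoking status
blood_pressure = {"P01": 128, "P02": 141, "P03": 119, "P04": 135}

# Researcher 2 separately records smoking status (explanatory variable)
smoker = {"P01": False, "P02": True, "P03": False, "P04": True}

def combine(response, explanatory):
    """Join the two records by participant ID once data collection is finished."""
    unmatched = set(response) ^ set(explanatory)
    if unmatched:
        raise ValueError(f"IDs recorded by only one researcher: {sorted(unmatched)}")
    return {pid: (explanatory[pid], response[pid]) for pid in sorted(response)}

combined = combine(blood_pressure, smoker)
for pid, (is_smoker, bp) in combined.items():
    print(pid, "smoker" if is_smoker else "non-smoker", bp)
```

The check for unmatched IDs catches participants who were measured by one researcher but missed by the other.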

Example 8.13 (Blinding in observational studies) In a study of dogs with chronic pancreatitis (CP), the researchers acquired abdominal ultrasounds and pathology results for each dog. The authors report that the researchers conducting the ultrasounds were not blinded:

The ultrasonographers in this study were not blinded and may have been biased by clinical assumptions...

--- P. J. Watson et al.222, p. 969

The review of the tissue samples, however, was partly blinded:

Tissue samples [...] were obtained from all dogs and re-cut sections were reviewed by one of the authors (PJW) with the help of a veterinary pathologist (AJR or TJS), all blinded to the clinical details of the case, but aware that CP was suspected in each dog...

--- Watson et al.223

This study clearly explains which parts of the study were blinded, and which were not.

Example 8.14 (Blinding in observational studies) A study of Achilles tendinopathy in gymnasts224 compared 40 elite gymnasts with 41 similar non-gymnasts (the controls).

Although the primary investigator was blind to the clinical status of the subjects, there was no blinding to whether each subject was in the gymnast or control group during image collection [...] However the examiner was blinded to both the clinical state and group of each subject when the images were reviewed.

--- Emerson et al.225, p. 38

8.8 Summary

Designing effective observational studies requires researchers to maximise internal validity. This can be achieved by managing confounding where possible, as confounding is often a major threat to the internal validity of observational studies.

Confounding can be managed by:

  • restricting the study to certain groups;
  • blocking; or
  • using special analysis methods.

Random allocation is not possible in observational studies. For this reason, it is important to observe, measure, assess or record all the information that is likely to be important for understanding the data, usually so that it can be used in the analysis.

Well-designed observational studies also try to manage

  • the carry-over effect;
  • the Hawthorne effect;
  • the placebo effect; and
  • the observer effect

though the means of doing so are often not under the control of the researchers.

8.9 Quick review questions

Formwork is used in reinforced-concrete construction. It is complicated and labour-intensive.

An observational study226 examined the relationship between the floor area of the building (in m² per storey) and the amount of labour needed for the construction (in person-minutes per storey).

The researchers also recorded, among other things:

  • the average age of the workers (in years);
  • the average years of experience of the workers (in years); and
  • the storey height (in metres)

for each of \(n=15\) multi-storey buildings in the study. Some data was obtained from

... interviews [...] conducted with the relevant person-in-charge of scheduling for each project.

--- Mine et al.227, p. 2

To record the amount of labour:

Two observers made the site rounds observation of 2 to 40 workers. Observations were carried out continuously from the start to the end of work, excluding lunch time.

--- Mine et al.228, p. 2

  1. The explanatory variable is
  2. The response variable is
  3. What is the best description for the variable 'Average age of the workers'?
  4. What is the most likely way in which confounding would be managed in this study?
  5. True or false: The carry-over effect is likely to be a big problem in this study.
  6. True or false: The Hawthorne effect is likely to be a big problem in this study.
  7. True or false: The placebo effect is likely to be a big problem in this study.
  8. True or false: Observer bias is likely to be a big problem in this study.

  9. Which of the following statements are true?

    • Observational studies cannot have a control group.
    • Only experimental studies can use random allocation to avoid confounding
    • Only observational studies can manage the observer effect
    • Only experimental studies can use random sampling

A study compared the average amount of pollen returned to the hive per bee, by two types of native Australian bees: yellow & black carpenter bees, and green carpenter bees. In the study, the researchers also recorded the size of the hive, among other things.
Why did they do this?


8.10 Exercises

Selected answers are available in Sect. D.8.

Exercise 8.1 Consider this RQ (based on Teillet et al.229):

Among university students, is the taste of tap water different than the taste of bottled water?

You want to answer this question using an observational study. Describe what these might look like for this study:

  1. Random allocation.
  2. Blinding.
  3. Double blinding.
  4. Control.
  5. Finding a random sample.

Exercise 8.2 Is it possible to have a control group in an observational study? Explain.

Exercise 8.3 Is the Hawthorne effect only a (potential) issue in experimental studies? Explain.

Exercise 8.4 A study of how well hospital patients sleep at night230 had the stated aim 'to investigate the perceived duration and quality of patient sleep [...] in hospital'. In discussing the limitations of the study, the researchers state:

The researchers made no attempt to deceive clinical staff regarding the nature of the study so the influence of the Hawthorne Effect should be considered. The presence of the observer and environmental monitoring equipment in the clinical environment could have altered behaviour among patients and nursing staff seeking to conform to the presumed research objectives. As a result, the findings reported may be an underestimation of the magnitude of the issues that affect sleep.

--- Delaney et al.231, p. 7

Discuss these limitations in terms of the language used in this chapter.