7 Internal validity

So far, you have learnt to ask a RQ, select a study type, and select a sample.

In this chapter, you will learn about internal validity for experimental studies. You will learn to:

  • maximise the internal validity of studies.
  • manage confounding in studies.
  • explain, identify and manage the Hawthorne, observer, placebo and carry-over effects in studies.
  • explain different types of blinding.

7.1 Introduction

A well-designed study is needed to draw solid conclusions: a study with high external validity (Sect. 3.1) and high internal validity (Sect. 3.2). Some research design decisions to maximise internal validity are discussed in this chapter.

Example 7.1 (Importance of internal validity) Beaman et al. (2013) describe an experiment where free fertilizer was provided to a sample of female farmers in Mali (at the recommended rate, or at half the recommended rate).

The farmers knew they were part of a study, so changed their farm management: they employed more hired labour and used more herbicide. Consequently, the yields for all farmers improved. Knowing if changes in yield were the result of applying the fertilizer is difficult, as the study had poor internal validity.

Specific design strategies for maximising internal validity include:

  • managing confounding (Sect. 7.2).
  • managing the Hawthorne effect by blinding individuals (Sect. 7.3).
  • managing the observer effect by blinding the researchers (Sect. 7.4).
  • managing the placebo effect by using controls, objective measures and blinding (Sect. 7.5).
  • managing the carry-over effect by using washouts (Sect. 7.6).

Not all of these strategies will be relevant to every study.

7.2 Managing confounding

Example 7.2 (Himalaya study) Consider this relational RQ (based on Bird et al. (2008)) with an intervention:

Among Australians, is the average faecal weight the same for people eating provided food made from wholegrain Himalaya 292 compared to eating provided food made from refined cereal?

Suppose that the researchers created two groups of individuals for this experimental study:

  • Group A: women recruited from a female-only gym.
  • Group B: men recruited from a local nursing home.

The researchers gave Himalaya 292 to Group A, and the refined cereal to Group B. If a difference in faecal weight was detected between the two groups, the difference may be due to:

  • the different diets (the explanatory variable) for each group;
  • the different sexes in each group (Group A was all women; Group B was all men);
  • the different ages in each group (Group A is likely to be younger on average than those in Group B); or
  • the different overall health in each group (Group A would generally be healthier than those in Group B).

Any difference in faecal weight detected between the two groups may not be due to the diets (Table 7.1): the study has very poor internal validity, due to poor research design.

Sex, age and overall health are confounding variables (Def. 3.6), as they are associated with the type of diet (the explanatory variable) and faecal weight (the response variable). For example, the age of the subject may be associated with faecal weight (older people tend to eat less, and eat differently, than younger people), and the research design means that older people are more likely to be consuming the refined cereal. This is an extreme case of confounding (Fig. 7.1); usually, confounding is more subtle (and more difficult to detect) than in this example.

TABLE 7.1: Comparing Groups A and B: an extreme example of confounding.


| Characteristic | Group A | Group B |
|---|---|---|
| Sex | Women | Men |
| Age | Younger (in general) | Older (in general) |
| Cereal | Himalaya 292 | Refined |
| Fitness | Very fit (in general) | Less fit (in general) |


FIGURE 7.1: An extreme example of confounding.

The groups being compared should be as similar as possible, apart from the difference being studied.

Since the groups being compared should be as similar as possible, apart from what is being studied, researchers often compare the comparison groups on potential confounding variables.

In experimental studies, an excellent way to manage confounding is:

  1. Randomly allocating individuals to the comparison groups.

Random allocation should ensure that the values of potential confounding variables are approximately evenly spread between the comparison groups. This is true for identified potential confounders (such as age), and also for variables that were not even considered as confounders, or that are hard to measure or observe (such as genetic conditions).
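Random allocation can be sketched in a few lines of Python. This is only an illustration: the participant IDs, the ages and the function name `randomly_allocate` are all invented for this sketch, and do not come from any study discussed in this chapter.

```python
import random
from statistics import mean


def randomly_allocate(individuals, seed=None):
    """Shuffle the individuals, then split them into two comparison groups."""
    rng = random.Random(seed)
    shuffled = list(individuals)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical participants: (ID, age in years). The ages are invented.
age_rng = random.Random(1)
participants = [(i, age_rng.randint(18, 80)) for i in range(200)]

group_a, group_b = randomly_allocate(participants, seed=42)

# Compare the groups on a potential confounder (age).
print("Mean age, Group A:", round(mean(age for _, age in group_a), 1))
print("Mean age, Group B:", round(mean(age for _, age in group_b), 1))
```

With a reasonably large number of individuals, the two mean ages would usually be close, even though age was never used when forming the groups. The same would be expected for variables that were not recorded at all.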

Randomly allocating individuals to comparison groups is not possible in observational or quasi-experimental studies. For this reason, confounding is often a major threat to internal validity in these studies, as individuals who are in one comparison group may be different, in general, to those who are in another group. Fortunately, other (less effective) means for managing confounding also exist:

  1. Restricting the study to a certain group, by keeping some variables approximately constant. These variables are a type of control variable (Def. 3.5). If possible, a reason for this restriction should be given.
  2. Blocking, when units of analysis are arranged into different groups containing individuals that are similar to each other (see Sect. 33.1 for an example).

Definition 7.1 (Blocking) Blocking occurs when units of analysis are arranged or analysed as separate groups of similar units (called blocks).

  3. Analysing using special methods (beyond this book), after recording the values of potential confounding variables. Because of this, recording all potential extraneous variables is important. Most studies involving people record the participants' age and sex if possible, as these two variables are common confounders. Once a sample is obtained, recording this extra information usually requires little extra effort.

Restricting and blocking are useful if one or two variables are known, or thought likely, to cause confounding. Multiple approaches can be used, such as randomly allocating individuals to groups, and recording other variables that can be managed through analysis.

Randomly allocating is superior when possible, because confounding is reduced for variables not even suspected as being confounders. Hence, experimental studies should use random allocation whenever possible.

For any study (but especially for observational and quasi-experimental studies), recording the values of any potential confounding variables is useful, so special analysis methods can be used to manage confounding.

Record all the extraneous variables likely to be important for understanding the data (Sect. 7.8). This may include information about the individuals in the study, and the circumstances of the individuals in the study (that is, the circumstances the individuals find themselves in; these may not be measured on the individuals themselves).

Common to many of these methods is to ensure that any potential confounding variables are recorded (Sect. 7.8), to ensure no lurking variables exist that may compromise the internal validity of the results.

Example 7.3 (Managing confounding: experimental study) For the Himalaya study, different methods can be used to manage confounding due to age.

The study could be restricted to people under \(30\). Age would be a control variable.

Blocking could be used by finding similar pairs of subjects (e.g., pairs of subjects of the same sex, with similar age and weight). One of each pair is given the refined cereal diet, and one given the Himalaya 292 diet. The differences in faecal weight for each pair can be analysed using special methods (see Chap. 33 for example).
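The pairing idea can be sketched in Python as follows. The participant IDs and the function name `allocate_within_pairs` are hypothetical, and choosing at random which member of each pair receives which diet is an assumption made for this sketch, not something stated above.

```python
import random


def allocate_within_pairs(pairs, seed=None):
    """For each matched pair (block), give one member each diet."""
    rng = random.Random(seed)
    allocation = {}
    for first, second in pairs:
        diets = ["Himalaya 292", "refined cereal"]
        rng.shuffle(diets)  # assumption: the diet order within a pair is random
        allocation[first] = diets[0]
        allocation[second] = diets[1]
    return allocation


# Hypothetical pairs, already matched on sex, age and weight.
pairs = [("P01", "P02"), ("P03", "P04"), ("P05", "P06")]

allocation = allocate_within_pairs(pairs, seed=7)
for person, diet in sorted(allocation.items()):
    print(person, "->", diet)
```

Every pair contributes one individual to each diet, so the two diet groups are balanced on the variables used to form the pairs.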

Information about the individuals could be recorded, such as age and pre-study weight. Information about the circumstances of the individuals could also be recorded, such as where they live. Then, special methods of analysis could be used to analyse the data.

Since the study is experimental, participants could be randomly allocated into one of two groups, so both groups would have a similar spread of ages (and other potential confounders). Then groups could be randomly allocated to receive one of the diets (Fig. 7.2).

In the Himalaya 292 study, individuals were randomly allocated to the diets (p. 1033), which manages confounding due to age and to other potential confounding variables.

FIGURE 7.2: Random allocation can occur in two places for the Himalaya study.

An experiment to study the effect of using ginkgo to enhance memory (Solomon et al. 2002) compared two groups: one using ginkgo (\(n = 111\)), and one using a fake, non-active supplement (\(n = 108\)). The authors randomly allocated participants to each group, then compared the two groups to ensure that no obvious differences initially existed between the groups that might explain differences in the response variable (Table 7.2).

The two groups are similar in terms of age, education and gender distribution. Any difference in outcome between the groups is probably due to the treatment.

TABLE 7.2: Comparing the two groups in the ginkgo-memory study.

| Characteristic | Group A (Ginkgo) | Group B (Fake) |
|---|---|---|
| Average age (in years) | 68.7 | 69.9 |
| Men (number; percentage) | 46 (41%) | 45 (42%) |
| Average years of education | 14.4 | 14.0 |
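The percentages of men in Table 7.2 can be checked from the counts and the group sizes. A minimal Python sketch, using only the numbers reported above (the variable names are chosen for illustration):

```python
# Counts taken from Table 7.2 and the group sizes given in the text.
groups = {
    "Ginkgo": {"n": 111, "men": 46},
    "Fake": {"n": 108, "men": 45},
}

# Percentage of men in each group, rounded to the nearest whole percent.
percent_men = {
    name: round(100 * g["men"] / g["n"]) for name, g in groups.items()
}

print(percent_men)  # {'Ginkgo': 41, 'Fake': 42}
```

Comparing percentages rather than counts is sensible here because the two groups are not the same size.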

Researchers explored the use of dominant and non-dominant hands for chest compression in student paramedics using an experimental study (Cross et al. 2019). Students were randomly divided into two groups: DHOC (dominant hand on chest) and NDHOC (non-dominant hand on chest). The two groups were then compared:

| Demographic | All participants (\(n = 75\)) | DHOC (\(n = 37\)) | NDHOC (\(n = 38\)) |
|---|---|---|---|
| Average age (years) | \(23.4\) | \(22.5\) | \(24.3\) |
| Gender: percentage female | \(51\)% | \(53\)% | \(47\)% |

The two groups appear to be very similar in terms of the average age of participants, and the percentage of female participants. If differences are observed in the study between the DHOC and NDHOC groups, they are probably due to the treatment. The study should have reasonable internal validity.

Example 7.4 (Managing confounding: observational study) Froud, Beresford, and Cogger (2018) studied \(2\ 599\) kiwifruit orchards using an observational study, exploring the relationship between the time since a bacterial canker was first detected (in weeks) as the explanatory variable, and the orchard productivity (in tray equivalents per hectare) as the response variable.

The researchers also recorded extraneous variables such as 'whether or not the farm was organic', 'elevation of the orchard' and 'whether or not general fungicides were used'. These variables were used in their analysis to manage the potential effects of confounding.

Example 7.5 (Comparing study groups: observational study) An observational study compared the iron levels of active and sedentary women aged \(18\) to \(35\) (Woolf et al. 2009). The active women (\(n = 28\)) and sedentary women (\(n = 28\)) were compared on a variety of characteristics (Table 7.3). The active women were similar to the sedentary women on these characteristics, but were (in general) slightly younger, slightly heavier, and slightly more likely to use hormonal contraceptives.

TABLE 7.3: The demographic information for those in the study of iron levels in women.

| Characteristic | Active women | Sedentary women |
|---|---|---|
| Average age (in years) | \(20\) | \(24\) |
| Average weight (in kg) | \(68\) | \(62\) |
| Percentage using hormonal contraceptives | \(13\) | \(11\) |

A study (Gunnarsson et al. 2017) examined the difference between two types of helicopter transfer (physician-staffed; non-physician-staffed) of patients with a specific type of myocardial infarction (STEMI). The purpose of the study was:

...to evaluate the characteristics and outcomes of physician-staffed HEMS (Physician-HEMS) versus non-physician-staffed (Standard-HEMS) in patients with STEMI.

--- Gunnarsson et al. (2017), p. 1

The researchers

...studied \(398\) STEMI patients transferred by either Physician-HEMS (\(n = 327\)) or Standard-HEMS (\(n = 71\)) for [...] intervention at \(2\) hospitals between 2006 and 2014.

--- Gunnarsson et al. (2017), p. 1

Since the study is an observational study (patients were not allocated by the researchers to the type of helicopter transport), the researchers recorded information about the patients being transported. They compared the patients in both groups, and found (for example) that both groups had similar average ages, and similar percentages of females and smokers, and so on. They also compared information about the transportation, and found (for example) that both groups had similar average flight times and flight distances.

One conclusion from the study was that 'Patients with STEMI transported by Standard-HEMS had longer transport times' (p. 1), but one limitation of the study was that:

The patient cohorts received treatment by \(2\) different care teams at two hospitals, which is a potential confounder despite similar baseline characteristics

--- Gunnarsson et al. (2017), p. 5

In other words, the difference between hospitals and the staff may have been a confounding variable.

Observational studies can (and often do) have control groups. Indeed, one specific type of observational study is called a case-control study (Sect. 4.6.2). However, individuals are not allocated to the control group by the researchers in observational studies, so the control and study groups may be very different, which may explain any differences in the outcome.

Random sampling and random allocation are different concepts (Fig. 7.3), with different purposes, but are often confused:

  • Random sampling impacts external validity. Its purpose is finding individuals to study, and is possible in both observational and experimental studies.
  • Random allocation impacts internal validity. Its purpose is allocating treatments to individuals, which helps manage confounding by distributing possible confounders across the treatment groups. It is only possible in experimental studies, as treatments are not allocated in observational studies.

FIGURE 7.3: Comparing random allocation and random sampling.
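The distinction can also be seen in a short Python sketch, where the two steps use randomness for different purposes. The population, the sample size and the group sizes are all hypothetical.

```python
import random

rng = random.Random(2024)

# Hypothetical population of 1000 individuals.
population = [f"person_{i}" for i in range(1000)]

# Random sampling: decides WHO is studied (relevant to external validity).
sample = rng.sample(population, k=40)

# Random allocation: decides WHICH TREATMENT each sampled individual
# receives (relevant to internal validity; experimental studies only).
shuffled = sample[:]
rng.shuffle(shuffled)
treatment_group = shuffled[:20]
control_group = shuffled[20:]

print(len(sample), len(treatment_group), len(control_group))  # 40 20 20
```

An observational study could use the first step but not the second, since the researchers do not decide which group each individual is in.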

7.3 Hawthorne effect and blinding individuals

People, and perhaps animals, may behave differently if they know (or think) they are being watched, which could compromise the internal validity of the study. This is called the Hawthorne effect.

Definition 7.2 (Hawthorne effect) The Hawthorne effect is the tendency of individuals to change their behaviour if they know (or think) they are being observed.

Example 7.6 (Hawthorne effect: observational study) Wu et al. (2018) examined hand hygiene (HH) of staff in a tertiary teaching hospital, using covert observers (observers not obviously watching the HH practices) and overt observers (observers obviously watching the HH practices). HH compliance was higher with overt observation (\(78\)%) than with covert observation (\(55\)%).

The impact of the Hawthorne effect can be minimized by blinding the individuals in the experiment, so that:

  • the individuals do not know that they are participating in a study; and/or
  • the individuals do not know the aims of the study; and/or
  • the individuals do not know which comparison group they are in.

In experimental studies, people often know they are in a study, due to ethics requirements (Sect. 5.2); they may not, however, know which treatment they have received. In observational studies, individuals may or may not know they are being observed. For instance, in an observational study where subjects' blood pressure is measured, subjects clearly know they are being observed, which has the potential to alter the subjects' behaviour (for example, people may become tense, which is called 'white-coat hypertension'; Pickering, Gerin, and Schwartz (2002)). As far as possible, efforts should be made to ensure that individuals do not know that they are being observed (the participants are blinded).

Example 7.7 (Hawthorne effect: experimental study) For the Himalaya study (Example 7.2), the article reports that (p. 1033):

The study was explained fully to the subjects, both verbally and in writing, and each gave their written, informed consent...

That is, the subjects knew they were in a study, and knew the aims of the study, so the Hawthorne effect may influence the results in this study. However, the subjects did not know which diet they were given.

Example 7.8 (Hawthorne effect: experimental study) People are more health-conscious if they know they will be examined regularly. For example, a study aiming to increase fruit and vegetable intake in young adults (Clark et al. 2019) noted that the observed increases in intake 'could be explained by the Hawthorne effect' as they 'know they are being observed...' (p. 96).

Example 7.9 (Hawthorne effect: observational study) During the COVID-19 lockdowns in Denmark, Olesen and Feldthaus (2021) observed adults entering a large mall in Copenhagen 'while pretending to do something else' (p. 1). They noticed that (p. 1)

Almost all subjects [\(340/345\) (\(99\)%)] wore a personal protective face mask, but only \(141\) (\(41\)%) made use of the hand sanitizer.

Both masks and use of hand sanitizer were recommended by the Danish Health Authority, but the adherence to the two safety measures was markedly different. The authors surmised that (p. 1):

As no subjects were aware that they were being observed [...] wearing a face mask corresponded to being observed continuously by other customers and staff during a visit to the mall. In contrast, hand hygiene takes moments to perform, and no one can see whether or not it has been done. Thus, the Hawthorne effect may explain why almost all subjects wore a face mask, which is very visible, whereas only \(41\)% performed hand hygiene.

7.4 Observer effect and blinding researchers

Perhaps surprisingly, researchers' expectations or hopes may unconsciously influence how the researchers interact with the individuals and record observations. In addition, this may (unconsciously) influence the behaviour of the individuals in the study. This is called the observer effect. (In experiments, it is sometimes called the experimenter effect.) This could compromise the internal validity of the study.

Definition 7.3 (Observer effect) The observer effect occurs when the researchers unconsciously change their behaviour to conform to expectations because they know what values of the explanatory variable apply to the individuals. This may then cause the individuals to change their behaviour or reporting also.

The impact of the observer effect can be minimized by blinding the researchers, so that they do not know which treatments the individuals are receiving. The researchers giving the treatment and the researchers evaluating the treatment can both be blinded, by using a third party. For example, the researchers may give an assistant two drugs, labelled A and B. The assistant administers the drug and evaluates the participants' response to the treatments. Later, the assistant tells the researchers whether Drug A or Drug B performed better, but only the researchers know which drugs the labels A and B refer to (Fig. 7.4).

FIGURE 7.4: Using a third party to avoid the observer effect.
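The third-party arrangement amounts to keeping a key that links the coded labels to the actual treatments, and keeping that key away from the person administering and evaluating the treatments. A minimal Python sketch follows. The drug names, the function name `make_blinding_key` and the assistant's reported result are all invented for illustration.

```python
import random


def make_blinding_key(treatments, seed=None):
    """Randomly attach the coded labels A and B to the two treatments."""
    rng = random.Random(seed)
    shuffled = list(treatments)
    rng.shuffle(shuffled)
    return dict(zip(["A", "B"], shuffled))


# The key is created, and kept, by the researchers only.
key = make_blinding_key(["new drug", "standard drug"], seed=3)

# The assistant only ever sees the coded labels.
labels_seen_by_assistant = sorted(key)
print("Assistant sees:", labels_seen_by_assistant)

# Hypothetical result reported back by the assistant.
better_label = "A"

# Only now do the researchers use the key to unblind the result.
print("Better-performing treatment:", key[better_label])
```

Because the assistant never has access to `key`, the assistant's expectations about either drug cannot be matched to the labels during the evaluation.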

Example 7.10 (Observer effect: experimental study) Seo et al. (2020) examined the impact of an injection to alleviate post-operative umbilical pain, and stated (p. 392):

...the postoperative pain scores were gathered by a nurse practitioner who was blinded to the usage of bupivacaine to avoid observer-expectancy bias [i.e., the observer effect].

The observer effect does not just apply to situations with people as individuals.

Example 7.11 (Observer effect) 'Clever Hans' was a horse that seemed to perform simple mental arithmetic. By using an experiment where the people interacting with the horse were blinded, Carl Stumpf realised that the horse was responding to involuntary (and unconscious) cues from the trainer.

The same effect has been observed in narcotic sniffer dogs (Bambauer 2012), who may respond to their handlers' unconscious cues.

The observer effect occurs when the researchers unconsciously influence the individuals, and are not aware it is occurring. Intentionally influencing the individuals is fraud.

The observer effect can impact observational as well as experimental studies. For example, consider a study measuring the blood pressure of smokers and non-smokers (Verdecchia et al. 1995). This study is observational (individuals cannot be allocated to be a smoker or non-smoker), but if the researchers know if an individual is a smoker when they measure blood pressure, then the observer effect could still impact the results (recalling that the observer effect is an unconscious effect). For example, the researchers may expect smokers to have a high blood pressure.

The observer effect could be managed by first measuring the blood pressure, and then asking if the individual was a smoker or not. That is, the researchers may be blinded to whether or not the subject is a smoker when they measure blood pressure. This may only be partially successful; the researcher may see the subject carrying cigarettes, or may smell smoke on their breath, for example. Nonetheless, since it may prove at least partially successful and is easy to implement, this strategy should form part of the research design.

Example 7.12 (Observer effect: observational study) Zimova et al. (2020) took photos of snowshoe hares, at various stages of moulting and in various environmental conditions. Eighteen independent observers were asked to rate the moult stage from the photographs (p. 4):

... images were randomly named and sorted, with the dates [...] removed to minimize observer expectancy bias [i.e., the observer effect].

Blinding the observer is not always possible, but should be used when possible to improve the internal validity of the study.

A study of the scats of gray wolves was used to study their diet (Spaulding, Krausman, and Ballard 2000). A scat analysis is where humans examine the scat of carnivores to determine the prey. However, the accuracy of the results was questioned, due to 'perpetuation of the assumption that wolf scats contain only \(1\) prey item/scat' (p. 949).

The observers might be seeing what they expect to see: that "wolf scats contain only \(1\) prey item/scat".

7.5 Placebo effect, controls, objective data, and blinding

Perhaps surprisingly, individuals in a study may report effects of a treatment, even if they have not received an active treatment. This could compromise the internal validity of the study. This is called the placebo effect, which generally only impacts people as individuals.

Definition 7.4 (Placebo effect) The placebo effect occurs when individuals report perceived or actual effects, despite not receiving an active treatment.

For example, people who attend therapy expect a positive outcome; this expectation may result in temporary or subjective (or sometimes even real) improvements in their condition. This is the placebo effect.

To manage the placebo effect, researchers should record objective data rather than patient-reported outcomes when possible (Enck et al. 2013). In addition, blinding the individuals and the researchers may help manage the placebo effect, as then the individuals cannot know which group they are in.

Example 7.13 (Placebo effect) Three active pain relievers were compared to different-coloured placebos (Huskisson 1974) in \(22\) patients. The most pain relief was experienced by those taking red placebos (Fig. 7.5), who experienced even more pain relief than those given true pain relievers. Note that the outcome is subjective: a patient-reported outcome.

FIGURE 7.5: Pain relief, for various pain relief medicine and placebos.

Since the placebo effect is concerned with individuals' responses to allocated treatments, it is not directly relevant to observational studies.

Example 7.14 (Placebo effect) In the Himalaya study, the individuals 'were not told the identity of the test cereal in the foods provided' (Bird et al. (2008), p. 1033). The subjects were blinded to the diet they were exposed to. However, some may think they are on the refined cereal or Himalaya diet, and respond accordingly (perhaps unconsciously). The use of the refined cereal was acting as a control (Def. 2.15). Researchers measured faecal weight, an objective outcome, to minimise the placebo effect.

A study of placebos (Waber et al. 2008) gave half the subjects a placebo, but told them the pill was an expensive (implying 'effective') pain killer. The other half were also given a placebo, but were told the pill was a discount (implying 'less effective') pain killer. About \(85\)% of participants in the first group reported a pain reduction, yet only \(61\)% in the second group reported a pain reduction. Remember: both groups actually received a placebo! Again, 'pain relief' is subjective.

7.6 Carry-over effect and washouts

In the Himalaya study (Example 7.2), the diet is a between-individuals comparison: one group of patients was given the refined cereal diet (the control), and a different group of people was given Himalaya 292. The study also used a within-individuals comparison: each person in the study was actually placed on both diets at different times.

Suppose all patients spent four weeks on the Himalaya 292 diet, then the next four weeks on the refined cereal diet. Potentially, the first diet could still be impacting the subjects' faecal weight for a little while after stopping the first diet. This could compromise the internal validity of the study. This is an example of the carry-over effect: when the influence of one treatment or condition on the response variable carries over to influence the value of the response variable for the next treatment or condition. The carry-over effect is only a concern for within-individuals comparisons.

Definition 7.5 (Carry-over effect) The carry-over effect occurs when the influence of one treatment or condition on the response variable influences the response variable for subsequent treatments or conditions.

The impact of the carry-over effect may be minimized by using a washout or similar between treatments or conditions. For example, after tasting a food sample, participants may rinse their mouth with water before tasting another food sample. For the Himalaya study, the participants could spend two weeks on their usual (before-study) diet, before starting each of the diets in the study. This is called a washout period.

Example 7.15 (Carry-over effect: experimental study) In the Himalaya study, 'there was no washout period' (Bird et al. (2008), p. 1033) since the response variable was only recorded after individuals spent four weeks on each diet. Since faecal weight was not measured until the end of the four-week periods, the carry-over effect is essentially irrelevant.

In Jaskiewicz et al. (2020), student paramedics performed chest compression in real-life (RL), and also using virtual reality (VR). Researchers were assessing the relaxation percentage of the students while undertaking the compressions (a relaxation percentage of about \(50\)% is ideal).

When used by itself, the VR method produced an average relaxation percentage of \(45.5\)%. However, when the RL method was used first, and then followed by the VR method, the average VR relaxation percentage was \(74.7\)%.

The response of the individuals was different depending on whether the RL method was used first. This is an example of the carry-over effect.

Sometimes, in experimental studies, researchers can randomly allocate the order in which the treatments are used (a cross-over study). That is, some participants start by spending four weeks on the Himalaya 292 diet, then four weeks on the refined cereal diet; meanwhile, other participants start by spending four weeks on the refined cereal diet, then four weeks on the Himalaya 292 diet.
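Randomly allocating the order of the treatments can be sketched in Python as follows. The participant IDs and the function name `allocate_orders` are hypothetical, and using two equal-sized groups is an assumption made for this sketch.

```python
import random


def allocate_orders(participants, seed=None):
    """Randomly allocate each participant to one of two diet orders."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2  # assumption: equal numbers in each order
    orders = {}
    for person in shuffled[:half]:
        orders[person] = ("Himalaya 292", "refined cereal")
    for person in shuffled[half:]:
        orders[person] = ("refined cereal", "Himalaya 292")
    return orders


# Ten hypothetical participants, P01 to P10.
participants = [f"P{i:02d}" for i in range(1, 11)]
orders = allocate_orders(participants, seed=11)

for person in participants:
    first, second = orders[person]
    print(f"{person}: {first} for four weeks, then {second}")
```

Each diet is used first by half of the participants, so any carry-over from the first diet to the second is not tied to one particular diet.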

Example 7.16 (Carry-over effect) In the Himalaya study (Example 7.2), the subjects were allocated randomly to whether they began the study on the Himalaya 292 diet or the refined cereal diet.

Example 7.17 (Washout periods: experimental study) R. D. MacDonald et al. (2006) required paramedics to conduct eight different tasks (such as electrical defibrillation and intravenous cannulation). Each of the \(16\) paramedics began the series of tasks at a random task, to mitigate the carry-over effect. A washout period between tasks (i.e., a rest time) was also used.

The carry-over effect is a possible compromise to internal validity in observational studies involving a within-individuals comparison. However, since treatments are not allocated in observational studies, carry-over effects may be difficult to prevent, as washouts cannot be imposed, and the order of the conditions cannot be imposed. However, observing individuals exposed to Condition A then Condition B, and other individuals exposed to Condition B then Condition A, may be possible.

Example 7.18 (Carry-over effects: observational study) Norris (2005) studied the carry-over effect in ecological observational studies of animals (p. 181):

...individuals occupying poor quality winter habitat may experience reduced reproductive success the following breeding season when compared to individuals occupying high quality winter habitat.

7.7 Describing blinding

Blinding occurs when those involved in the study do not know information about the study. Individuals in the study may be blinded (to help manage the Hawthorne effect) to

  • whether they are involved in a study;
  • the aims of the study in which they are participants; and/or
  • which comparison group they are in.

The researchers and the analysts can be blinded to which comparison groups apply to the individuals (to help manage the observer effect).

When blinding is used in as many ways as possible, the internal validity of the study is increased and bias reduced. However, when people are the individuals, ethics requirements may mean that they need to know they are in a study (especially if the study is experimental), and the purpose of the study.

If only the individuals are blinded to the comparison groups, the study is called single blind. If both the researchers and participants are blinded to the comparison groups, the study is called double blind. If the researchers, participants and the analyst are blinded to the comparison groups, the study is sometimes called triple blind. Rather than using these terms, explicitly stating who or what is blinded is clearer.

Blinding should be considered in all studies when possible (it is not always possible). Blinding participants does not just apply to people; it also may apply to animals (Example 7.11).

Example 7.19 (Double-blinding) Bulte et al. (2014) compared yields from modern and traditional cowpea crops in Tanzania. The two seed types ('traditional' and 'modern') were made similar in appearance so the farmers were blinded to which group they were in (control or treatment). The seed type would eventually become obvious as the crop grew, but 'key inputs were already provided' by then (p. 817).

In observational studies, blinding individuals may be (but is not always) easier than in experimental studies (Sect. 7.3). Blinding the researchers may be difficult, since the researchers need to record the value of the explanatory variable.

Example 7.20 (Blinding: observational studies) Emerson et al. (2010) studied Achilles tendinopathy in gymnasts, by comparing \(40\) elite gymnasts with \(41\) similar controls who were non-gymnasts. The authors state (p. 38):

Although the primary investigator was blind to the clinical status of the subjects, there was no blinding to whether each subject was in the gymnast or control group during image collection [...] However the examiner was blinded to both the clinical state and group of each subject when the images were reviewed.

The paper clearly explains who was blinded and to what parts of the study they were blinded.

7.8 Recording extraneous variables

One way to design a quality study is to record information about many (potential) extraneous variables. Various reasons for doing this have been given:

  • To evaluate external validity to determine if the sample is representative of the population (Sect. 6.6), by comparing the sample and population.

  • To improve internal validity, by helping to manage confounding:

    • by avoiding lurking variables (Sect. 3.4).
    • by determining if the comparison groups are similar (Sect. 7.2).
    • by using the information in analysis (Sect. 7.2).

Record the values of all extraneous variables that may be important in the study!
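Once extraneous variables have been recorded, one simple check is to compare their average values across the comparison groups. The following Python code is a minimal sketch of such a check; the records, the variable names (`age`, `initial_weight`) and the function `group_means` are hypothetical and used for illustration only.

```python
from statistics import mean

# Hypothetical records, for illustration only: each individual has a
# comparison group and two recorded extraneous variables.
records = [
    {"group": "A", "age": 34, "initial_weight": 82.0},
    {"group": "A", "age": 41, "initial_weight": 90.5},
    {"group": "A", "age": 29, "initial_weight": 77.0},
    {"group": "B", "age": 36, "initial_weight": 85.0},
    {"group": "B", "age": 39, "initial_weight": 88.5},
    {"group": "B", "age": 31, "initial_weight": 79.0},
]

def group_means(records, variables, group_key="group"):
    """Return {group: {variable: mean value}} for the listed variables."""
    groups = {}
    for row in records:
        groups.setdefault(row[group_key], []).append(row)
    return {
        g: {v: mean(r[v] for r in rows) for v in variables}
        for g, rows in groups.items()
    }

summary = group_means(records, ["age", "initial_weight"])
for g, means in sorted(summary.items()):
    print(g, {v: round(m, 1) for v, m in means.items()})
```

If the group averages differ substantially for an extraneous variable, that variable is a potential confounder and may need to be managed in the analysis (Sect. 7.2).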

Example 7.21 (Poor internal validity) In the 1800s, Semmelweis recorded mortality rates of women after childbirth over many years (P. M. Dunn 2005) at two clinics:

  • In Clinic 1, with male doctors delivering babies: \(9.9\)%.
  • In Clinic 2, with female midwives delivering babies: \(3.4\)%.

Was the difference in mortality rate (the outcome) due to the sex of the person delivering the babies (the comparison)?

One possible confounder was the clinic; however, the clinic was eliminated as an explanation. For example, Clinic 2 was actually more overcrowded than Clinic 1, and the climate was similar for both clinics.

However, an important lurking variable was present. In the 1800s, the benefits of hand washing were not understood, and hand washing was not commonplace. Many (male) doctors performed autopsies immediately before delivering babies, without washing their hands between procedures. In contrast, autopsies were not performed by the (female) midwives.

The lurking variable was 'whether the baby was delivered by someone with clean hands', which was related to the mortality rate and to the sex of the person delivering the baby. The female midwives had clean hands, and hence the mortality rate was (relatively) low. The male doctors did not have clean hands, and hence the mortality rate was high.

After instituting hand washing for doctors, the mortality rate in Clinic 1 reduced to a rate similar to that in Clinic 2.

7.9 Chapter summary

Designing effective studies (Fig. 7.6) requires researchers to manage or minimise confounding where possible, by restricting the study to certain groups; blocking individuals into similar groups; using special analysis methods; and/or randomly allocating the units of analysis. Random allocation is only possible for experimental studies.

Well-designed studies also try to manage the Hawthorne effect (e.g., by blinding participants); the observer effect (e.g., by blinding the researchers); the placebo effect (experimental studies only; e.g., by using controls, objective outcomes and blinding subjects); and the carry-over effect (e.g., by using a washout, or randomly allocating the treatment order). Using these measures, when possible, ensures that the results and conclusions from our studies are correctly interpreted.

Often, however, not all of these strategies can be used. For instance, people often know they are involved in an experimental study, so the Hawthorne effect may impact conclusions. In these cases, the possible impacts should be minimised as far as possible, and then the likely impact on the conclusions discussed. The impacts of these issues are often reported as limitations in a journal article (Chap. 8).

FIGURE 7.6: Design considerations for designing studies. Note: lurking variables become confounding variables when recorded in the study, and so can be managed as a confounding variable. The arrows indicate the main design strategies to (perhaps partially) manage the indicated potential bias. Not all strategies are possible for every study.

Example 7.22 (Research design) Cross et al. (2019) (p. 3) compared chest compressions by student paramedics using dominant and non-dominant hands, and stated:

...participants were allocated randomly to one of two groups: 'dominant hand on chest' or 'non-dominant hand on chest'. Group allocation was determined by a computer-generated randomisation schedule...

The participants were blinded to the purpose of the study, but not to which group they were allocated. The analyst was also blinded to the group allocations. This study used many good design features.
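A computer-generated randomisation schedule can be produced in many ways. The following Python code is a minimal sketch of one way, and is not necessarily the method used by Cross et al. (2019); the function name `randomisation_schedule`, the participant IDs, the number of participants and the seed are all hypothetical.

```python
import random

def randomisation_schedule(participant_ids, groups, seed=None):
    """Randomly allocate participants to groups of (nearly) equal size."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    # Repeat the group labels until there is one label per participant,
    # then shuffle the labels so that the allocation is random.
    labels = [groups[i % len(groups)] for i in range(len(ids))]
    rng.shuffle(labels)
    return dict(zip(ids, labels))

groups = ["dominant hand on chest", "non-dominant hand on chest"]
ids = [f"P{i:02d}" for i in range(1, 11)]  # hypothetical participant IDs
schedule = randomisation_schedule(ids, groups, seed=2019)
for pid, group in schedule.items():
    print(pid, group)
```

Fixing the seed makes the schedule reproducible, so the allocation can be audited later. Because the computer (not the researchers or the participants) decides the allocation, the groups are likely to be similar on both recorded and unrecorded extraneous variables.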

7.10 Quick review questions

A study (Doosti-Irani et al. 2016) wanted to determine the relationship between the depth of bruising on apples and the size of the impact force. The researchers purposefully hit apples with three different forces (\(200\), \(700\) and \(1200\)) to inflict bruises on the apples. The researchers then recorded the depth of the bruising. The study was conducted separately for three different regions of the apple (lower; middle; upper), and each apple was only used once.

  1. What is the response variable?
  2. What is the explanatory variable?
  3. How would the variable 'location of the bruising' be classified?
  4. True or false: The researchers could minimise the effects of confounding by using potential confounding variables in the analysis.
  5. True or false: The researchers could use random allocation of the treatments to the apples to minimise confounding.
  6. True or false: The carry-over effect is likely to be a big problem in this study.
  7. True or false: The Hawthorne effect is likely to be a big problem in this study.
  8. True or false: The placebo effect is likely to be a big problem in this study.
  9. True or false: The observer effect is likely to be a big problem in this study.

7.11 Exercises

Answers to odd-numbered exercises are available in App. E.

Exercise 7.1 Are the following statements true or false?

  1. Experimental studies must use random samples.
  2. An experimental study must blind the researchers.
  3. Only observational studies can manage the observer effect.
  4. Experimental studies must use a control group.
  5. Using random samples is important in observational studies as a way to manage confounding.

Exercise 7.2 Which of the following statements are true?

  1. Observational studies cannot have a control group.
  2. Only experimental studies can use random allocation to avoid confounding.
  3. An experimental study must blind the participants.
  4. Only experimental studies can use random sampling.
  5. In experimental studies, the treatments must be allocated by the researchers.

Exercise 7.3 Which of the following can be used to improve internal validity in experimental studies?

  • Blinding the individuals.
  • Using a control group.
  • Using special methods of analysis.
  • Randomly allocating treatments to groups.
  • Blinding the researchers.
  • Using random samples.

Exercise 7.4 Which of the following can be used to improve internal validity in observational studies?

  • Blinding the individuals.
  • Using a control group.
  • Using special methods of analysis.
  • Randomly allocating treatments to groups.
  • Blinding the researchers.
  • Using random samples.

Exercise 7.5 Is the Hawthorne effect only a (potential) issue for experiments? Explain.

Exercise 7.6 Lorenz et al. (2019) compared the efficacy of a new type of toothpaste. Participants were given either a new or an existing toothpaste formulation, and evaluations of plaque remaining on the teeth were taken. All participants knew they were being assessed after brushing.

Would the Hawthorne effect likely impact the internal validity of this study? Explain.

Exercise 7.7 A study compared the average amount of pollen returned to the hive per bee, for two types of native Australian bees: yellow and black carpenter bees, and green carpenter bees. In the study, the researchers also recorded the size of the hive, among other things. Why did they do this?

Exercise 7.8 In a study to treat septic shock, Hwang et al. (2020) used two study groups of size \(n = 58\) each: one group received the treatment of interest (intravenous infusion of vitamin C and thiamine) and the other group received intravenous saline.

Explain why the researchers gave saline to \(58\) subjects, when it has no chance of successfully treating septic shock. Is this ethical?

Exercise 7.9 Consider a study comparing the average weight loss for patients who are instructed to do about \(30\) minutes of exercise a day (Group A), to patients who are instructed to do about \(60\) minutes of exercise a day (Group B). Which of the following statements are true?

  1. This is an experimental study.

  2. The extraneous variable is the amount of exercise per day (in hours).

  3. The response variable is the weight loss for each person.

  4. The explanatory variable is whether the patient performs about \(30\) or \(60\) minutes of exercise per day.

  5. The response variable is the average weight loss.

  6. The explanatory variable is the amount of exercise the patient does per day (in hours).

  7. Age is likely to be a lurking variable.

  8. Age is an extraneous variable.

  9. Age is likely to be a confounding variable.

  10. Which (if any) of the following are possible confounding variables?

    • The sex of the patients.
    • The initial weight of the patients.
    • The names of the patients.

Exercise 7.10 Stafford, Daube, and Franklin (2010) studied smoking in alfresco restaurants in two cities in Western Australia. The concentration (per cubic metre of air) of particulate matter with a diameter smaller than or equal to \(2.5\) micrometres (PM\(2.5\)) was recorded from \(12\) cafes and \(16\) pubs. The researchers were interested in the relationship between PM\(2.5\) and the number of smokers. They also recorded the wind strength (calm; light breeze; windy) and the amount of cover (fully open; overhead cover only; overhead cover and enclosed sides).

  1. Is this an experimental or observational study?
  2. What are the response and explanatory variables?
  3. What are the extraneous variables, if any?
  4. Is blinding the individuals possible?
  5. Is random allocation possible?

Exercise 7.11 In a study of time spent applying sunscreen (Heerfordt et al. 2018), the aim was to 'determine whether time spent on sunscreen application is related to the amount of sunscreen used' (p. 117). The study is described as follows (p. 118):

The volunteers were asked to apply the provided sunscreen [...] the way they would normally do on a sunny day at the beach in Denmark [...] The volunteers wore swimwear during the whole session. No other information was given. Participants applied sunscreen behind a curtain and were not observed during application. Measurements of time and sunscreen weight were made without the subjects' being aware of this.

  1. Is this an experimental or observational study?
  2. What are the response and explanatory variables?
  3. The researchers also recorded age, height, weight and body surface area of each participant. Why would they have done this?
  4. The researchers also compared the average values of the response variable for males and females, and the average values of the explanatory variable for males and females. Why would they have done this?
  5. What design features are evident in the quote?

Exercise 7.12 Paramedics were involved in a study to compare two treatments (Treatment A; Treatment B) for Post Traumatic Stress Disorder (PTSD), which were randomly allocated to two groups of patients.

  1. Is this an experimental or observational study?
  2. What would be the control group?
  3. The patients did not know which treatment they received. What is this called?
  4. What is the purpose of blinding the participants?

Exercise 7.13 A scientist is testing whether tap water tastes the same as bottled water in a taste test (based on Teillet et al. (2010)). She provides people with a plastic cup of either bottled or tap water, and she asks them to give a rating of the taste on a scale of \(1\) (terrible) to \(5\) (fantastic). The RQ is:

For university students, is the taste of tap water better than the taste of bottled water?

This RQ needs some clarification, but you decide to answer this question using an experiment. Describe what these might look like for this study, and which are feasible: random allocation; blinding; double blinding; control; carry-over effect; finding a random sample.

What potential problems can you identify with the research design?

Exercise 7.14 Consider this RQ (based on Teillet et al. (2010)): 'Among university students, is the taste of tap water different than the taste of bottled water?'

You want to answer this question using an observational study. Describe what these might look like for this study, and which are feasible: random allocation; blinding; double blinding; control; carry-over effect; finding a random sample.

Exercise 7.15 A scientist compares the effects of two types of fertiliser on the yield of tomatoes (based on Klanian et al. (2018)). He plants tomato seedlings, fertilises with Fertiliser I, and later records the yield of tomatoes. He then immediately plants more tomato seedlings in the same field, fertilises with Fertiliser II, and measures the yield of tomatoes.

What potential problems can you identify with the research design?

Exercise 7.16 Skulberg et al. (2004) compared two office-cleaning methods (p. 72):

The participants were randomly allocated to an intervention group or a control group using group level matching by sex, level of irritation symptom index, and allergy status [...] The participants and the field researchers were blinded to the group status of the participants [...] the cleaning was done in the evening after the employees had left the building.

The researchers then compared the change in nasal congestion for the two groups (intervention: 'a comprehensive cleaning of all surfaces'; control: 'only a superficial cleaning'), finding only small differences between the two groups. In the analysis, the researchers incorporated age and sex of the office workers.

  1. How did the researchers manage confounding?
  2. What other design features are evident from the quote?
  3. What is the response variable?
  4. What is the explanatory variable?
  5. What are the extraneous variables?

Exercise 7.17 Formwork is used in construction with reinforced concrete, and can be labour intensive. Mine et al. (2015) examined the relationship between the floor area of the building (in m\(^2\) per storey) and the amount of labour needed for constructing the formwork (in person-minutes per storey). The researchers also recorded the average age of the workers (in years); the average experience of the workers (in years); and the storey height (in metres) for each of \(n = 15\) multi-storey buildings in the study.

Two observers recorded the labour time by observing workers from the start to the end of the work.

  1. What is the explanatory variable?
  2. What is the response variable?
  3. What type of description is appropriate for the variable 'Average age of the workers'?
  4. What is the most likely way to manage confounding in this study?
  5. True or false: The carry-over effect is likely to be a big problem in this study.
  6. True or false: The Hawthorne effect is likely to be a big problem in this study.
  7. True or false: The placebo effect is likely to be a big problem in this study.
  8. True or false: The observer effect is likely to be a big problem in this study.