Chapter 4 Randomized controlled trials

4.1 Study design

A randomized controlled trial (RCT) is an experiment performed on human subjects (usually patients) in order to assess the efficacy of a treatment (or intervention) for some condition. It has two key features. First, the new treatment is given to a group of patients, the treatment group, and another treatment (standard of care or placebo treatment) is given to another group of patients, the control group, at the same time. Second, patients are allocated to one group or another by randomization. A more accurate name for RCT is randomized concurrently controlled clinical trial, because patients in the intervention and in the control group receive treatment at the same time.

There are often more than two treatments being compared in a single trial, such as a treatment in different doses. However, the fundamental principles of RCTs become clear already in the case of two groups. Moreover, the treatment is not necessarily a drug, but may also be a new surgical procedure, a certain therapy (e.g. acupuncture, psychotherapy), etc. Participants of an RCT may not get the “optimal” treatment, so there are important ethical aspects when conducting an RCT.

4.1.1 Historical context

Ronald Aylmer Fisher (1890-1962) and Austin Bradford Hill (1897–1991) were pioneers of randomization. In the 1920s, R.A. Fisher presented randomization as a crucial element of the design and analysis of experiment (R. A. Fisher 1925). He also said about the importance of design:

“To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of”

(Presidential Address to the First Indian Statistical Congress, 1938).

A.B. Hill also advocated for randomization (Hill 1937) and was the trial statistician in one of the very first RCTs among humans: the MRC Strepromycin trial. In 1835, more than 100 years before the MRC Streptomycin Trial, the `Nuremberg salt test’ was performed, using randomization and double blinding to evaluate the efficacy of homeopathy. Stolberg (2006) summarized and commented on the study.

The MRC Streptomycin Trial

The Streptomycin trial was conducted by the Medical Research Council (MRC) in the U.K. in 1946-1947 and the results were published in 1948 (Streptomycin in Tuberculosis Trials Committee 1948). Patients with pulmonary tuberculosis were randomly assigned to receive Streptomycin (a bactericidal antibiotic drug) and bed-rest, or bed-rest alone (the standard of care). The randomization series were not known to the investigator and the patient did not know whether they were in the treatment or control group. Results showed efficacy of Streptomycin against tuberculosis, albeit with minor toxicity and acquired bacterial resistance to the drug. Today, Streptomycin, in combination with penicillin, is used in a standard antibiotic cocktail to prevent bacterial infection in cell culture.

This trial is considered the first RCT, double-blind and placebo-controlled. However, this trial is not the first study using randomization and double blinding.

The Salk Polio Vaccine Trial

Polio was a major public health problem worldwide in the late 19th and early 20th centuries. Over 500,000 cases of polio were reported every year by the mid-20th century (source: (https://www.who.int/news-room/spotlight/history-of-vaccination/history-of-polio-vaccination)).

By 1953, several potential vaccines had been developed. One of the more promising was an inactivated virus vaccine developed by Jonas Salk at the University of Pittsburgh. Approximately 750’000 children were selected to participate in the study, about 340’000 declined to participate and another 8500 dropped out in the middle of the experiment (Francis Jr et al. 1955). The remaining children were randomized into a treatment and a control group with approximately 200’000 children in each group, as summarized in Table 4.1.

Table 4.1: Vaccine effectiveness in the Salk Polio Vaccine Trial.
Vaccine
Polio
Yes No Infection risk
Yes 82 200663 0.041%
No 162 201067 0.081%

The difference in the number of polio cases between groups was significant and could rightfully be interpreted as a clear indication that the vaccine was indeed effective.

4.1.2 Ethical issues of RCTs

There are important ethical aspects to consider when planning and conducting an RCTs. First, a patient must never be given a treatment that is known to be inferior. Moreover, patients must be fully informed about possible adverse reactions, side-effects etc. They must give their consent, preferably in writing. Additionally, patients who have entered a trial may withdraw at any time and they must receive the most appropriate treatment available outside the trial.

For these reasons, large trials usually have a data and safety monitoring committee, which periodically reviews the evidence currently available from the trial and gives recommendations whether the trial should continue or terminate.

4.1.3 Structure of RCTs

RCTs have five key characteristics:

  1. A population of eligible patients
  2. A group of patients recruited from this population
  3. Existence of (at least) two treatment groups
  4. Allocation to treatment by randomization
  5. Outcome measures in the treatment groups are compared at the end of the trial.

Example 4.1 Polack et al. (2020) conducted a multinational, placebo-controlled, observer-blinded, pivotal efficacy trial to assess the efficacy of the BNT162b2 vaccine candidate against laboratory-confirmed Covid-19.

4.1.4 Eligibility criteria

A strict set of eligibility criteria allows to generalize the results of the trial to the population of eligible patients. We can distinguish inclusion and exclusion criteria.

Specification of eligibility criteria is particularly important for trials with long enrollment period (months, years) and multicenter trials. Otherwise the type of patient recruited may change over the duration of the trial or may vary from center to center.

Example 4.1 (continued) The inclusion criteria in Polack et al. (2020) were: adults 16 years of age or older, healthy or with a stable chronic medical condition. Some of the key exclusion criteria included: a medical history of Covid-19, treatment with immunosupressive therapy, or diagnosis with an immunocompromising condition. Out of the 44820 participants which were screened for the study, 1150 were excluded because they did not meet the eligibility criteria.

4.1.5 The need for a control group

At first sight it seems odd to include a control group (larger sample size, additional costs, possibly inferior treatment given to control group). But there is always patient variability in response to treatment, so without a control group we do not know to what extent patient response is due to treatment. For example, without a control group, regression to the mean may indicate an apparent, non-existing treatment effect.

Regression to the mean

Example 4.2 Regression to the mean in the absence of a control group is illustrated with an example from Stephen Senn (2011). Imagine a single-arm trial where all patients receive treatment. Suppose that the mean diastolic blood pressure (DBP) is the same at baseline (before treatment is given) and at follow-up (“at outcome”). So, there is no treatment effect, as illustrated in Figure 4.1 (left plot). The inclusion criteria of a hypothetical trial are hypertension at baseline, which is defined as DBP \(>\) 95 mmHg. The resulting distribution is shown in Figure 4.1 (right plot). In this hypothetical trial, mean DBP at follow-up is 100.6 mmHg, more than 2 mmHg lower than at baseline (98.0 mmHg). This suggests a treatment effect which is not present. Now suppose that this hypothetical trial was in fact the Placebo arm in a two-arm trial. An apparent Placebo effect in the Placebo arm of this trial will then be just an artefact due to regression to the mean (Dirnagl, Ulrich 1996)!

Regression to the mean in an example from @senn2011. The left plot shows the joint distribution of diastolic blood pressure measurements at Baseline and at Outcome. The mean of both measurements is 90 mmHg. The right plot shows the truncated distribution of the patients that have been included in the hypothetical trial. The mean DBP at outcome is now more than 2 mmHg lower than at baseline.Regression to the mean in an example from @senn2011. The left plot shows the joint distribution of diastolic blood pressure measurements at Baseline and at Outcome. The mean of both measurements is 90 mmHg. The right plot shows the truncated distribution of the patients that have been included in the hypothetical trial. The mean DBP at outcome is now more than 2 mmHg lower than at baseline.

Figure 4.1: Regression to the mean in an example from Stephen Senn (2011). The left plot shows the joint distribution of diastolic blood pressure measurements at Baseline and at Outcome. The mean of both measurements is 90 mmHg. The right plot shows the truncated distribution of the patients that have been included in the hypothetical trial. The mean DBP at outcome is now more than 2 mmHg lower than at baseline.

Historical controls

Historical controls are an alternative to concurrent controls in situations where randomizing patients to a control treatment or placebo is not feasible, for practical or ethical reasons.

The main issue with historical controls is that there may be systematic differences between patients in the historical control group and patients in the treatment group, which might cause biased results. There might, for example, be some heterogeneity in the selection of patients and possible time trends such as a change in disease severity over time (e.g. different variants of COVID-19). In addition, there might be differences in the conduct and analysis of the historical and the new trial (Marion and Althouse 2023).

Example 4.3 Coelho et al. (2023) conducted an open-label, single-group, phase 3 trial to evaluate the use of transthyretin gene silencing as a treatment for hereditary transthyretin (ATTRv) amyloidosis. ATTRv amyloidosis is a life threatening genetic disorder for which several medications have been approved by the FDA. In order to not “unnecessarily expose participants to sequelae from a rapidly progressive and potentially fatal neurologic disease”, a historical placebo from a previous trial conducted between 2013 and 2017 was chosen.

4.1.6 The importance of randomization

Allocation at random aims to ensure that all prognostic factors (both known and unknown) are similarly represented in the two groups. That means that there is no systematic difference between the groups which may have caused treatment difference.

Example 4.4 Table 4.2 summarizes the results of an observational study comparing two different treatments for renal calculi (Charig et al. 1986). The overall odds ratio (comparing treatment A to treatment B) is

\[\mbox{OR} = \frac{273 \cdot 61}{77 \cdot 289} = 0.75.\]

Therapy B is hence apparently more successful than Therapy A.

Table 4.2: Results from an observational study about two different treatments for renal calculi (Charig et al. 1986), overall and in two sub-groups.
Success
Yes No Total
Therapy A 273 (78%) 77 350
Therapy B 289 (83%) 61 350

However, this study is non-randomized, and the distribution of the two therapies differs among patients with small resp. large kidney stones, see Table 4.3.

Table 4.3: Success of therapies for small and large kidney stones
Small kidney stones
Large kidney stones
Success
Success
yes no total yes no total
Therapy A 81 (93%) 6 87 192 (73%) 71 263
Therapy B 234 (87%) 36 270 55 (69%) 25 80

The odds ratio in the subgroups of patients with small kidney stones is

\[ \mbox{OR} = \frac{81 \cdot 36}{6 \cdot 234} = {2.10} \]

and the odd ratio in the subgroup of patient with large kidney stones is \[\mbox{OR} = \frac{192 \cdot 25}{71 \cdot 55} = {1.23}.\]

This sub-group analysis hence reveals that Therapy A is more successful than Therapy B in both sub-groups. This is an example of Simpson’s paradox. The success rate was on average higher for small kidney stones (seem to be easier to cure) and because patients receiving therapy B included a much larger proportion of patients with small kidney stones, this therapy seemed better when the size of kidney stones was ignored. If the size of kidney stones is adjusted for, we can see that A is better for both sizes, but because A was given mostly to patients with large kidney stones, with lower average success rate, we could not see this before.

4.2 Bias in RCTs

We will now discuss in detail the five types of biases that may occur in RCTs as described in J. N. S. Matthews (2006). There are many more types of biases, see for example .

4.2.1 Selection bias

Selection bias occurs if the study sample is systematically different from the population it was intended to represent.

Two types of sampling methods can be distinguished: random sampling (also called “probability sampling”), where population members have an equal probability of being selected into the study, and non-random sampling (“non-probability sampling”). An example of non-random sampling is convenience sampling, where patients are selected if it is convenient and they are easily accessible. This type of sampling is prone to selection bias.

4.2.2 Allocation bias

Randomization ensures - in principle - that the groups to be compared are balanced with respect to all relevant prognostic factors. Allocation bias occurs if there is a systematic difference between participants in how they were allocated to treatment groups. We do not expect this to happen if randomization is properly done. However, it is sometimes forgotten that proper randomization rests on adequate allocation concealment. Allocation concealment means that the allocation sequence is not known to those selecting the participants and assigning them to the intervention groups until the moment of assignment. If the allocation sequence is not concealed, it may be possible to subvert the recruitment of participants into the study and their allocation to treatment.

Trials with small sample size may also result in imbalance with respect to prognostic factors, simply due to the stochastic process of allocation. There are methods to address the latter problem in the design phase, e.g. stratification.

4.2.3 Assessment bias

At the end of the trial, and during its course, observations are made on a variety of outcome variables. Some of these variables may have a subjective component, depending on the observer (e.g. blood pressure). If the observer knows the treatment being given to the patient, then the value of an observation might be influenced by the knowledge of the treatment. This will lead to assessment bias. This problem can be addressed by blinded outcome assessment.

4.2.4 Stopping bias

It is tempting to assess the outcomes of a trial repeatedly and to stop the trial once a promising result has been obtained. This can be done as part of a prespecified method for assessing the data, so-called group sequential methods (see Section 11.1.2). However, naive repeated checking of “significance” will introduce stopping bias. Stopping bias is illustrated in Example 4.5.

Example 4.5 Data with 250 patients per group are generated without any difference in the underlying means. A \(t\)-test is performed every 10 patients (5 per group). The particular trajectory shown in Figure 4.2 flags a significant difference early in the trial on the basis of 60-90 patients at the standard 5% significance level (1.96 threshold for the absolute value of test statistic). However, at the end of the trial no significant difference can be observed. An extensive simulation reveals that such a significant difference will occur at some point in time in 36% of the cases. If we would only assess significance at the end of the trial based on 500 patients, then this proportion would be equal to the nominal significance level of 5%.

Stopping bias illustrated in accumulated data generated without any treatment effect.

Figure 4.2: Stopping bias illustrated in accumulated data generated without any treatment effect.

4.2.5 Publication bias

Successful publication of the results of a clinical trial will depend on the referees and editors of suitable journals. Papers reporting positive (typically “significant”) findings are more likely to be published than those that do not. This will lead to publication bias. Also, papers are more likely to be written and submitted by the investigators if the outcome of the trial was positive.

Some attempts have been made to address this problem, e.g. mandatory registration of clinical trials before the study is conducted: (http://clinicaltrials.gov/). Today, trial results are required to be published on this website within one year after the trial has finished.

4.3 Target trials

Randomized controlled trials are the gold standard for assessing the effectiveness and safety of new drugs or interventions. However, they might be restricted to a specific study population, and are not always feasible due to ethical or financial considerations. In that case, observational data can be used to emulate a target trial. Details on the challenges and conduct of target trials are explained in Hernán, Wang, and Leaf (2022) and A. A. Matthews et al. (2022), for example.

4.4 Additional references

An introduction to randomized controlled clinical trials is given in J. N. S. Matthews (2006). The design of experiments is discussed in . Regression to the mean is discussed in the Statistics Notes J. Martin Bland and Altman (1994a), J. Martin Bland and Altman (1994b). A practical example for allocation concealment is Adamsen et al. (2009), and Hijazi, Taylor, and Richardson (2009) for convenience sampling.

References

Adamsen, Lis, Morten Quist, Christina Andersen, Tom Møller, Jørn Herrstedt, Dorte Kronborg, Marie T Baadsgaard, et al. 2009. Effect of a multimodal high intensity exercise intervention in cancer patients undergoing chemotherapy: randomised controlled trial.” BMJ 339: 895–99.
———. 1994a. Statistics Notes: Regression towards the mean.” BMJ 308: 1499.
———. 1994b. Statistics Notes: Some examples of regression towards the mean.” BMJ 309: 780.
Charig, C R, D R Webb, S R Payne, and J E Wickham. 1986. Comparison of treatment of renal calculi by open surgery, percutaneous nephrolithotomy, and extracorporeal shockwave lithotripsy.” BMJ 292: 879–82.
Coelho, Teresa, Wilson Marques, Noel R. Dasgupta, Chi-Chao Chao, Yeşim Parman, Marcondes Cavalcante França, Yuh-Cherng Guo, et al. 2023. “Eplontersen for Hereditary Transthyretin Amyloidosis with Polyneuropathy.” JAMA 330 (15): 1448. https://doi.org/10.1001/jama.2023.18688.
Dirnagl, Ulrich. 1996. “Believe It or Not!” https://dirnagl.com/2018/02/09/believe-it-or-not/.
Fisher, Ronald Aylmer. 1925. Statistical Methods for Research Workers. Edinburgh: Oliver; Boyd.
Francis Jr, Thomas, Robert F Korns, Robert B Voight, Morton Boisen, Fay M Hemphill, John A Napier, and Eva Tolchinsky. 1955. “An Evaluation of the 1954 Poliomyelitis Vaccine Trials.” American Journal of Public Health and the Nation’s Health 45 (5 Pt 2): 1–63.
Hernán, Miguel A., Wei Wang, and David E. Leaf. 2022. “Target Trial Emulation: A Framework for Causal Inference from Observational Data.” JAMA 328 (24): 2446. https://doi.org/10.1001/jama.2022.21383.
Hijazi, Ramzi, David Taylor, and Joanna Richardson. 2009. Effect of topical alkane vapocoolant spray on pain with intravenous cannulation in patients in emergency departments: randomised double blind placebo controlled trial.” BMJ 338: 1–5.
Hill, Austin Bradford. 1937. Principles of Medical Statistics. London: The Lancet.
Marion, Joe D., and Andrew D. Althouse. 2023. “The Use of Historical Controls in Clinical Trials.” JAMA 330 (15): 1484. https://doi.org/10.1001/jama.2023.16182.
Matthews, Anthony A, Goodarz Danaei, Nazrul Islam, and Tobias Kurth. 2022. “Target Trial Emulation: Applying Principles of Randomised Trials to Observational Studies.” BMJ, August, e071108. https://doi.org/10.1136/bmj-2022-071108.
Matthews, John N. S. 2006. Introduction to Randomized Controlled Clinical Trials. Second. Chapman & Hall/CRC.
Polack, Fernando P., Stephen J. Thomas, Nicholas Kitchin, Judith Absalon, Alejandra Gurtman, Stephen Lockhart, John L. Perez, et al. 2020. “Safety and Efficacy of the BNT162b2 mRNA Covid-19 Vaccine.” New England Journal of Medicine 383 (27): 2603–15. https://doi.org/10.1056/nejmoa2034577.
Senn, Stephen. 2011. Francis Galton and regression to the mean.” Significance 8 (3): 124–26.
Stolberg, M. 2006. “Inventing the Randomized Double-Blind Trial: The Nuremberg Salt Test Of1835.” Journal of the Royal Society of Medicine 99 (12): 642–43. https://doi.org/10.1258/jrsm.99.12.642.
Streptomycin in Tuberculosis Trials Committee. 1948. “Stretomycin Treatment of Pulmonary Tuberculosis.” British Medical Journal 2 (October): 769–82.