3 Lecture 1

3.1 Introduction

3.1.1 What is Statistics?

Statistics is a collection of procedures and principles for gathering data and analyzing information in order to help people make decisions when faced with uncertainty
- Utts & Heckard in ‘Statistical Ideas & Methods’

3.1.2 A Motivating Example


Pancreatic cancer has a very poor survival rate because it is often detected too late
A new app promises to detect early symptoms of jaundice that would typically go unnoticed
Should this “test” be adopted into routine practice?

3.1.3 What was the evidence behind this optimistic headline?

In an initial study the app detected cases of “concern” correctly 89.7% of the time, and classified “negative” cases correctly 96.8% of the time
The reference test was based on the total serum bilirubin level

3.1.4 What would a data detective ask?

Are the statistical methods appropriate?
Is the study design appropriate?
Is there information external to the study that affects its interpretation?

3.1.5 Results reported in the study

	Borderline or Elevated Bilirubin	Normal Bilirubin
BiliScreen Positive	35 (89.7%)	1
BiliScreen Negative	4	30 (96.8%)
Total	39	31

The statistics of interest when evaluating a diagnostic test are
- Sensitivity = Probability(Positive result | Reference test positive) = 89.7%
- Specificity = Probability(Negative result | Reference test negative) = 96.8%
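
As a small illustration, both statistics can be recomputed from the four counts in the table. This is only a sketch in Python; the variable and function names are hypothetical and not part of the study.

```python
# Counts from the 2x2 table above (BiliScreen vs. total serum bilirubin).
true_pos = 35   # BiliScreen positive, bilirubin borderline or elevated
false_neg = 4   # BiliScreen negative, bilirubin borderline or elevated
false_pos = 1   # BiliScreen positive, bilirubin normal
true_neg = 30   # BiliScreen negative, bilirubin normal


def sensitivity(tp, fn):
    """Probability(positive result | reference test positive)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Probability(negative result | reference test negative)."""
    return tn / (tn + fp)


sens = sensitivity(true_pos, false_neg)
spec = specificity(true_neg, false_pos)
print(f"Sensitivity = {sens:.1%}")  # 35/39 -> 89.7%
print(f"Specificity = {spec:.1%}")  # 30/31 -> 96.8%
```

Note that each statistic is computed within a column of the table: sensitivity uses only the 39 reference-positive subjects, and specificity only the 31 reference-negative subjects.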
Do these data provide good estimates of BiliScreen accuracy?

3.1.6 Evaluating the quality of the statistical methods

Is the study large enough?
What is the uncertainty around the reported results?
Were relevant statistics recorded?
Do the statistics provided help make a decision about the next step?

3.1.7 What if the sample size were smaller?

	Borderline or Elevated Bilirubin	Normal Bilirubin
BiliScreen Positive	9 (90%)	0
BiliScreen Negative	1	10 (100%)
Total	10	10

3.1.8 What if the sample size were larger?

	Borderline or Elevated Bilirubin	Normal Bilirubin
BiliScreen Positive	180 (90%)	0
BiliScreen Negative	20	150 (100%)
Total	200	150

3.1.9 Sample Size and Precision

3.1.10 Evaluating the quality of the statistical methods

Notice that the certainty we have in our conclusions depends on the sample size. The extreme results were less convincing when the sample size was reduced.
What sample size is needed to draw a definitive conclusion? That needs to be determined using appropriate statistical methods to obtain the desired precision. We will study this in Lectures 3 and 6
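
To make the dependence on sample size concrete, the sketch below computes an approximate 95% confidence interval for the sensitivity in the three tables (9/10, 35/39 and 180/200). The Wilson score interval is used here purely as an assumption for illustration; the notes do not say which interval method the course or the study uses, and the function name is hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (assumed method)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Sensitivity counts taken from the three tables in the notes.
scenarios = [("smaller", 9, 10), ("reported", 35, 39), ("larger", 180, 200)]

widths = {}
for label, successes, n in scenarios:
    low, high = wilson_interval(successes, n)
    widths[label] = high - low
    print(f"{label:>8}: {successes}/{n}  interval {low:.2f} to {high:.2f}  "
          f"width {high - low:.2f}")
```

The point estimate is about 90% in all three scenarios, but the interval narrows as the sample size grows, which is what "precision" refers to here.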

3.1.11 Evaluating the quality of the study design

Are the subjects in the study representative?
- Healthy volunteers and patients from a medical centre were used
- If the test accuracy is systematically better or worse in these patients than in patients on whom the test will be used, then the results are biased
Is the reference standard relevant?
- Bilirubin level is a measure of jaundice, but not all patients with jaundice have pancreatic cancer
- If the accuracy of the test with respect to bilirubin level is systematically different from the accuracy with respect to pancreatic cancer, then our results may be biased

3.1.12 The role of external (or prior) information

Besides the sample size and study design, our conclusions may also be affected by information external to the observed results, for example from a previous study
Statistical analyses should take into account the impact of this prior information. We will study how to do so in Lecture 6

3.2 Reducing Bias in Research Studies

3.2.1 Bias vs. Precision

Imprecision is a random departure from the true value
Bias is a systematic departure from the true value
A large sample size can improve precision but cannot remove bias. Careful study design and analysis can reduce bias
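
The distinction can be illustrated with a small simulation. Everything in this sketch is hypothetical: the true value, the measurement variability and the size of the systematic error are made-up numbers chosen only to show the pattern.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

TRUE_VALUE = 120.0  # hypothetical true mean (e.g. systolic BP in mmHg)
NOISE_SD = 15.0     # hypothetical random variability between measurements
BIAS = 5.0          # hypothetical systematic error (e.g. a miscalibrated device)


def sample_mean(n, bias):
    """Mean of n simulated measurements that share the same systematic error."""
    values = [random.gauss(TRUE_VALUE + bias, NOISE_SD) for _ in range(n)]
    return statistics.fmean(values)


def summarize(n, bias, replicates=500):
    """Average error and spread of the sample mean over many repeated studies."""
    means = [sample_mean(n, bias) for _ in range(replicates)]
    return statistics.fmean(means) - TRUE_VALUE, statistics.stdev(means)


for n in (10, 100, 1000):
    average_error, spread = summarize(n, BIAS)
    print(f"n={n:>4}: average error {average_error:+.2f}, "
          f"spread of estimates {spread:.2f}")
```

As n increases, the spread of the estimates shrinks (better precision), while the average error stays close to the built-in systematic error (the bias is unchanged).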

3.2.2 Common study designs used in clinical research

An analytical or experimental study can assess the relation between an intervention and an outcome
A descriptive study, with no control group, cannot

3.2.3 Randomized Controlled Trial

Advantages:
- unbiased distribution of confounders;
- blinding more likely;
- randomisation facilitates statistical analysis.
Disadvantages:
- expensive: time and money;
- study subjects may not be representative;
- ethically problematic at times.

3.2.4 Reducing bias in research studies

Different types of bias common in research studies have been enumerated

Type of bias	Description	Possible remedial measures
Selection bias	Sampling method results in a sample that is not representative of the population	Random sampling; statistical modeling
Measurement bias	Measurement method records the outcome with systematic error	Statistical modeling
Detection bias	Measurement method differs between the groups being compared	Blinding
Confounding	Risk factors are distributed unequally in the groups being compared	Randomization; statistical modeling

Statistical methods are often used to reduce bias, either at the planning stage of a study or at the analysis stage
In this lecture, we will look at random sampling and randomization. In Lecture 12 we will look at adjustment via regression

3.2.5 A second motivating example: Renal Denervation


A surgical procedure called “renal denervation” was developed to help people with hypertension who do not respond to medication.

3.2.6 Example 4a: Results from a cohort study of renal denervation\(^*\)

Baseline		3-month follow-up
Number of patients	Blood pressure	Number of Patients	Change in blood pressure
153	176/98 [systolic/diastolic (mmHg) Mean]	135	-25/-11 [systolic/diastolic (mmHg) Mean]

\(^*\)Symplicity HTN-1 Investigators. Catheter-based renal sympathetic denervation for resistant hypertension: durability of blood pressure reduction out to 24 months. Hypertension 2011;57(5):911-917.

Can the large observed change be interpreted as being caused by renal denervation?
This is an example of a before-after design that reports on change over a period of time, typically the change after an intervention.
The primary drawback of this design is the lack of a control group.
The observed change may simply be attributable to participation in the study (‘Hawthorne effect’). If so, a change of the same magnitude in blood pressure would have been observed in a control group, had one been included. This would mean that the change was not due to renal denervation at all.
Therefore this study cannot provide proof that renal denervation causes a decline in blood pressure.
Another issue in the data presented here is that the variability around the mean change is not available. So we don’t know if all patients experienced this benefit.

3.2.7 Example 4b: Results compared to a control group\(^*\)

	Baseline		3-month follow-up
	Number of patients	Blood pressure	Number of Patients	Change in blood pressure
Renal Denervation	45	176/98 [systolic/diastolic (mmHg) Mean]	39	-21/-10 [systolic/diastolic (mmHg) Mean]
Control group\(^\dagger\)	5	173/98 [systolic/diastolic (mmHg) Mean]	3	+2/+3 [systolic/diastolic (mmHg) Mean]

\(^\dagger\)Patients excluded from renal denervation arm for anatomical reasons

\(^*\)Catheter-based renal sympathetic denervation for resistant hypertension: a multicentre safety and proof-of-principle cohort study

The control group consisted of patients who were excluded for anatomical reasons.
It is possible that the control group did not have the same risk of resistant hypertension as the treatment group, i.e. the ‘anatomical reasons’ were a confounding factor. This may explain why the control group had a worse mean change in blood pressure than the renal denervation group.
Therefore, once again, we don’t have a conclusive result.
Of course, the small size of the control group also does not help. Other concerns in this study include loss to follow-up. Only 18 patients completed the follow-up of 24 months.

3.2.8 Example 4c: Results from a randomized controlled trial (RCT) of renal denervation\(^*\)

	Baseline		6-month follow-up
	Number of patients	Blood pressure	Number of Patients	Change in blood pressure
Renal Denervation	49	178/96 [systolic/diastolic (mmHg) Mean]	49	-32/-12 [systolic/diastolic (mmHg) Mean]
Control*	51	178/97 [systolic/diastolic (mmHg) Mean]	51	+1/0 [systolic/diastolic (mmHg) Mean]

\(^*\)Esler MD, Krum H, Sobotka PA, Schlaich MP, Schmieder RE, Bohm M. Renal sympathetic denervation in patients with treatment-resistant hypertension (The Symplicity HTN-2 Trial): a randomised controlled trial. Lancet 2010;376(9756):1903-1909

The study concluded there was a statistically significant (p<0.001) difference between the intervention and control groups
The randomization procedure gives us greater confidence in these results because, on average, patients in the two groups had the same risk of a change in BP at the time of randomization
However, the study was not perfect. Importantly, it was not blinded and the main outcome was office BP rather than ambulatory BP. Therefore, it is possible that the patients in the renal denervation arm reacted differently owing to the greater attention they received.
Also, the follow-up of 6 months is very short, and it is unknown whether the observed drop in BP is sustained in the long term.

3.2.9 Example 4d: Results from a second randomized controlled trial of renal denervation\(^*\)

“A significant change from baseline to 6 months in office systolic blood pressure was observed in both study groups.

The between-group difference (the primary efficacy end point) did not meet a test of superiority with a margin of 5 mm Hg.

The bars indicate standard deviations.”

The second RCT improved on the first one by using a sham procedure in the control group. This removed the concern about blinding.
They found that there was no significant difference between the renal denervation and control groups.

\(^*\)Bhatt et al. A controlled trial of renal denervation for resistant hypertension. N Engl J Med 2014;370:1393-401. DOI: 10.1056/NEJMoa1402670

3.2.10 Example 4: Renal Denervation as a treatment for resistant hypertension

An early study suggested that renal denervation (which uses radiofrequency energy to destroy some nerves in arteries feeding the kidney) reduces blood pressure. In that experiment, patients who received surgery had an average improvement in systolic blood pressure of 33 mmHg more than did control patients who received no surgery.
Later an experiment was conducted in which patients were randomly assigned to one of two groups. Patients in the treatment group received the renal denervation surgery. Patients in the control group received a sham operation in which a catheter was inserted, as in the real operation, but 20 minutes later the catheter was removed without radiofrequency energy being applied. These patients had no way of knowing that their operation was a sham. The rates of improvement in the two groups of patients were nearly identical. (Samuels 10-11)

3.2.11 Lessons learnt from renal denervation example

A control group is necessary to draw conclusions about the effect of a variable
However, a randomized design is necessary to make a cause-effect conclusion
A randomized controlled trial is not automatically unbiased. Blinding is also necessary

3.2.12 Health Technology Assessment of Renal Denervation

The MUHC’s Technology Assessment Unit evaluated renal denervation in 2013 (full report available online)
We concluded:
“… There is evidence, based mainly on observational data that this procedure results in a clinically significant reduction in blood pressure at 6 months. Weaker evidence suggests that the effect is sustained up to 2 years of follow-up. Some side-effects, none unmanageable or permanent, are reported.

It is recommended that this technology receive temporary (two-year) and conditional approval for use only in the context of a formal research study to be supported by the manufacturer as specified.”

3.3 Random sampling and Randomization

3.3.1 Sample surveys

A sample survey is a type of observational study
In a sample survey a subgroup of a larger population is studied. Ideally, we wish to use methods to draw a representative sample to avoid bias
Surveys are preferred because they are less expensive and less time-consuming than a census (or complete enumeration of a population)

3.3.2 Simple random sample

A simple random sample is a sample of n items in which
- every member of the population has an equal chance of being included,
- members are chosen independently of each other
The word random does not mean haphazard. Rather, it refers to a well-defined process whose outcomes are not fixed but are determined by a probability distribution

3.3.3 Sample surveys

Interestingly, if you use commonly accepted methods, a sample of size 1500 would be adequate to gauge the percentage of a population who have a certain trait or opinion to within ±3%

Further, this result does not depend on the size of the population. A sample size of 1500 is adequate whether the population size is 10 million or 4 billion, as long as a proper sampling technique has been used
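
The claim can be checked with a short calculation. The sketch below rests on several assumptions that are not stated in the notes: a 95% confidence level (multiplier 1.96), the worst-case proportion p = 0.5, simple random sampling without replacement, and the standard finite population correction. The function name is hypothetical.

```python
import math


def margin_of_error(n, p=0.5, population=None, z=1.96):
    """Approximate margin of error for a sample proportion.

    Assumes simple random sampling; if a population size is given, the
    finite population correction sqrt((N - n) / (N - 1)) is applied.
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    if population is not None:
        standard_error *= math.sqrt((population - n) / (population - 1))
    return z * standard_error


for population in (10_000_000, 4_000_000_000, None):
    moe = margin_of_error(1500, population=population)
    label = "infinite" if population is None else f"{population:,}"
    print(f"population {label:>13}: margin of error = {moe:.2%}")
```

Under these assumptions the margin of error is about 2.5 percentage points in all three cases, comfortably within ±3%, because the correction factor is essentially 1 whenever the population is much larger than the sample.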

3.3.4 Margin of error

An obvious question is: how close is a sample estimate to the true value?
From the central limit theorem (which we will study in Lecture 3), we know that the margin of error around the sample mean is proportional to \(\frac{\sigma}{\sqrt n}\), where \(\sigma\) is the standard deviation and n is the sample size
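
As a rough worked illustration, assume the usual 95% multiplier of about 1.96 (the notes only state proportionality) and a hypothetical standard deviation of \(\sigma = 15\) mmHg:

\[
\text{margin of error} \approx 1.96\,\frac{\sigma}{\sqrt n}: \qquad
n = 25 \;\Rightarrow\; 1.96 \times \frac{15}{5} = 5.88, \qquad
n = 100 \;\Rightarrow\; 1.96 \times \frac{15}{10} = 2.94
\]

Under these assumptions, quadrupling the sample size only halves the margin of error, because n enters through its square root.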

3.3.5 How to choose a simple random sample

Create a sampling frame by listing all members of the population
Find a method to randomly select from among these
- a physical method, e.g. placing the names of members of the population in an opaque bowl and drawing the required number
- a virtual method with a computer, e.g. using the sample() function in R
The chosen members constitute the sample
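
The steps above can be sketched on a computer. The notes name R's sample() function; the example below uses Python's random.sample as an analogous tool, which is a substitution made for this sketch. The sampling frame of 500 ID numbers and the sample size of 25 are hypothetical.

```python
import random

rng = random.Random(2024)  # seeded only so that the example is reproducible

# Step 1: hypothetical sampling frame listing every member of the population by ID.
sampling_frame = list(range(1, 501))

# Step 2: draw 25 members without replacement, each equally likely to be chosen.
chosen = sorted(rng.sample(sampling_frame, k=25))

# Step 3: the chosen members constitute the sample.
print(chosen)
```

Drawing without replacement guarantees that no member appears twice; removing the seed gives a different sample on every run.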

3.3.6 Example: Drawing a random sample

A respiratory researcher wants to estimate the amount of inflammation in the parenchyma of a mouse lung.
She takes an image of a histological slide of the lungs of the mouse with staining of the inflammatory cells of interest.
She divides the image into a grid of 100 rectangular areas, but excludes 10 areas because they include airways.
She then counts the number of inflammatory cells in 40 areas randomly selected out of the remaining 90 areas.
What was the sampling frame in this study, and how did it differ from the population of interest?
Explain why “using the wrong sampling frame” might lead to a biased estimate.
Use R to propose to the researcher which rectangular areas she needs to study.

3.3.7 Practical concerns when random sampling

For practical reasons, it may not be possible to obtain a simple random sample because it may not be possible to enumerate the entire population
- e.g. how would we enumerate the population of people who need to be screened by BiliScreen?
In that case, it is important to identify the population and to scrutinize the method of selection, to judge whether the resulting sample satisfies the definition of a simple random sample
Other sampling techniques such as cluster sampling or stratified random sampling may be easier to implement

3.3.8 Some typical biases that can arise during a survey

Selection bias: Arises from selecting a non-representative sample
Non-response (or missingness) bias: Arises when a representative sample was chosen but a subset could not or did not provide responses, e.g. a survey conducted during the evening would miss individuals who were working at that time
Response bias: Occurs when participants respond differently from how they feel, e.g. in response to sensitive questions such as those about smoking habits

3.3.9 Randomization

The same random mechanism can be used in an experiment to allocate subjects to treatments, such that each subject has the same probability of receiving each of the treatments under study
Randomization ensures that, on average, any observed or unobserved confounding variables have a similar distribution in each treatment group

3.3.10 Simple randomization

As with random sampling, there are different techniques we can use to carry out randomization to a treatment group
In simple randomization, subjects are assigned to groups based on a single sequence of random assignments
- e.g. If there are two treatments, we can toss a coin to determine how to assign each patient recruited into the study (Heads – Treatment, Tails – Control)
- Instead of a coin you can use a computer to generate the random sequence
This method is suitable when the planned sample size is relatively large and the subjects to be sampled are relatively homogeneous
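
A computer-generated version of the coin toss might look like the following sketch. The function name, the seed and the sample sizes are hypothetical choices for illustration only.

```python
import random
from collections import Counter


def simple_randomization(n_subjects, rng):
    """Assign each recruited subject by an independent fair 'coin toss'."""
    return [rng.choice(["Treatment", "Control"]) for _ in range(n_subjects)]


rng = random.Random(7)  # seeded only so that the example is reproducible

for n in (10, 1000):
    assignments = simple_randomization(n, rng)
    counts = Counter(assignments)
    print(f"n={n:>4}: Treatment={counts['Treatment']}, Control={counts['Control']}")
```

Because every assignment is independent, the two groups are not guaranteed to be the same size. The relative imbalance tends to be larger in small studies, which is one reason this method is better suited to relatively large samples.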

3.3.11 Relevance of statistical methods to researchers in the life sciences

Nandini Dendukuri, McGill University

Medical research is increasingly quantitative. Simultaneously, there is a move towards evidence-based medicine
Statistical methods are necessary for designing and analyzing research studies that can answer relevant questions
Knowledge of statistics is necessary for interpreting research publications

3.3.12 Organizations supporting transparent reporting of biomedical research & evidence-based decision making

3.3.13 Biomedical journals are insisting on appropriate statistical methods

3.3.14 FEV Example: Dataset

First few rows of FEV dataset
id	age	fev	ht	sex	smoke
1	9	1.708	57.0	0	0
2	8	1.724	67.5	0	0
3	7	1.720	54.5	0	0
4	9	1.558	53.0	1	0
5	9	1.895	57.0	1	0
6	8	2.336	61.0	0	0

The variables in the dataset include the following:
- id (subject identifier)
- age (in years)
- fev (in liters)
- ht: height (in inches)
- sex: gender (M/F, coded 0/1)
- smoke (Y/N, coded 0/1)