Chapter 3 Probability & Statistical Inference

Under development - current text copied from another book in development for DPT students called “Clinical Inquiry”

3.1 Probability introduction

3.1.1 State space (sample space)

An event space is often denoted by sigma (\(\Sigma\)) and contains all sets of outcomes.

A sample space of an experiment is denoted by omega (\(\Omega\)) and contains all possible outcomes.

We will usually be dealing with \(\Omega\), but for something like a 6 sided dice we can use \(\Sigma\).

If \(E\) defines an event, (say rolling a 6 on a dice), then {1, 2, 3, 4, 5, 6} is the event space (all sets of outcomes)

\(P(E) = \frac{E}{\Sigma}\)

\(P(rolling \space a\space 6) = \frac{rolling \space a\space 6}{6} = \frac{1}{6}\)

You expect a 6 about \(\frac{1}{6}\). What is an unexpected outcome?

3.1.2 Probability is bound between 0 and 1

\(0 \le P(E) \le 1\)

3.1.3 Certainty and uncertainty

Certainty includes \(P(E) = 0\) and \(P(E) = 1\)

Therefore the probability of an uncertain event is \(0 < P(E) < 1\) (particular)

Therefore the probability of all uncertain events are \(0 < P(E) < 1\) (universal)

3.1.4 Connection to deductive logic

If \(E\) is \(TRUE\), then \(P(E) = 1\)

If \(E\) is \(FALSE\), then \(P(E) = 0\)

3.1.5 Using Sets as “Probability Spaces”


  • \(E \subseteq F\) is that E is a “subset” of F
  • \(E \cap F\) is set intersection and is similar to logical “and” (\(\land\))
  • \(E \cup F\) is set union and is similar to logical “or” (\(\lor\))

If \(E \subseteq F\), then \(P(E) \le P(F)\)

If \(E \cap F\), then \(P(E \lor F) = P(E)+P(F)\)

3.1.6 Conditional probability

\(P(E \mid F)\) is the \(P(E)\) “given that” \(P(F) = 1\) or you could say \(F\) is \(TRUE\)


If \(E \subseteq F\), then \(P(F \mid E) = 1\)

If \(E \cap F = 0\), then \(P(E \mid F) = 0\)

3.1.7 Causation connections

If \(E\) causes \(F\) we can use the notation \(E \rightarrow F\);

Then it is true that \(P(F \mid E) \ge P(F \mid \neg E)\) (note that \(\neg\) means “not”)

This is the notation for Hill’s criteria, “strength of association” and is consistent with the statement: “causation implies correlation,” but notice that it is inconsistent with the statement “correlation implies causation” (as you have learned, correlation does not imply causation).

3.2 Notes and thoughts on Chapter 3: Probability in the clinical encounter

Notes and thoughts on Chapter 3 from the book: (Anjum, R. L., Copeland, S., & Rocca, E. (2020). Rethinking causality, complexity and evidence for the unique patient: a CauseHealth Resource for healthcare professionals and the clinical encounter (p. 241). Springer Nature.)[]

3.2.1 3.1 Uncertainty and Probability in the Single Case

A goal of this book - and indeed of all clinical inquiry - is to consider the probability that intervention \(X\) will work this time for this patient

Note that the probability that \(X\) “will work” this time for this patient is a conditional probability (let’s take \(P(X)\) to be the probability that intervention \(X\) works).

Therefore, \(P(X \mid p_i)\), is the probability that \(X\) will occur (and \(X\) is that intervention \(X\) will work) given \(p_i\) (note that I’m using \(p_i\) as notation for “this time for this patient”).

For one moment - consider what it means to say an intervention works? It means that an outcome has been achieved. If you have a headache, and you take an Tylenol capsule, you consider that it works if your headache goes away. That is an outcome indicating the intervention worked.

We can also break \(X\) into a conditional probability, since \(P(X)\) is the probability that intervention \(I\) is effective; and since \(I\) being effective depends on \(O\), the outcome:

\(P(X) = P(O \mid I)\) (the probability of \(X\) being effective is the probability of \(O\) given \(I\))

Notice that this conditional probability \(P(O \mid I)\) is equivalent to the causal claim: \(I \rightarrow O\) Where does this probability come from? What does it represent? What do we mean by it?

3.2.2 3.2 Probability from Statistics: Frequentism

Frequentism is the probability that you’re familiar with from statistics. It is considered “objective” in that it is based on measurements of observations. These probabilities are often estimated from samples in an attempt to understand something about the population. This is the process of statistical inference we’ll talk about soon, and is “induction” as we’ve already discussed.

The assumption is that as the sample size (\(n\)) gets larger we are more confident in the estimate:

As \(n_{sample}\) approaches \(n_{population}\) we are more confident in the estimate of the population (and thus maybe “universal”) probability.

We’ll talk a lot more next week about confidence in these estimates when we discuss statistical inference using “confidence intervals.”

Note that estimates are both variable and uncertain. Variability is part of their behavior. In fact, variability says something about the events. 3.2.1 Evidence based approaches:

Usually rely on estimates described from samples. The issue, and challenge, is applying these estimates to apatient that was not part of that sample. 3.2.2 Randomization, Inclusion and Exclusion Criteria in population (sample) trials:

Used to reduce bias.

Bias associated with unknown confounders (randomization)

Bias associated with known confounders (inclusion and exclusion criteria) Internal and external validity of causal claims from RCTs:

Causal claims from RCTs are statements of whether some intervention is effective (\(X\)).

Internal Validity

If \(X\) is determined to be \(TRUE\) for a sample, we can consider \(X \subseteq S\) which means \(X\) as a characteristic is one of the characteristics of the sample \(S\).

Threats to internal validity impact whether the study actually demonstrated what they claim to have demonstrated.

External Validity

Then we need to consider whether the sample is a subset of the population (whether the characteristics of the sample are true for the population: \(S \subseteq P\)

Threats to external validity impact whether the findings of the study in this sample, apply to the population (related to statistical inference and induction)

Overall: \((X \subseteq S) \land (S \subseteq P) \Rightarrow (X \subseteq P)\)

Which means that if X is true for the sample, and the sample reflects the population, then X is true for the population (transitivity).

The mission of this book, and the project from which is comes (CauseHealth), and my work for the past 17 years or so, has been dealing with the fact that the patient you’re working with (\(p_i\)) is not in the sample, and not necessarily in the population.

Just because: \((X \subseteq S) \land (S \subseteq P) \Rightarrow (X \subseteq P)\) is true (and that is usually uncertain)

It does not mean that:

\((X \subseteq P) \land (p_i \subseteq P) \Rightarrow (X \subseteq p_i)\)

In fact:

\((X \subseteq P) \land (p_i \subseteq P) \nRightarrow (X \subseteq p_i)\)

3.2.3 3.3 Probability as Degree of Belief - Subjective Credence

Three things: First, don’t conflate conditional probabilities with “subjective” Second, don’t conflate Bayesian probabilities or inference with “subjective” Third, don’t make as big a deal of “objective” and “subjective” probabilities as they do in this chapter.

As you will come to appreciate, all objective probabilities are obtained with some subjectivity; and all subjective probabilities are founded in some sort of objectivity.

You’ll also come to appreciate that all causal claims can be considered as conditional probability; and since all clinical practice is based on causal claims, then all clinical practice is based on conditional probabilities. Every question you ask, every action you take is because whatever you know “right now” is adjusted based on “what you have just learned.” Meaning, you’re always adjusting your belief / understanding / knowledge based on new information - - which is the same thing as saying you’re always using conditional probabilities. 3.3.1 Updating belief - as stated above, absolutely unavoidable if you want to be a good clinician 3.3.2 Understanding the basic Bayesian formula - to be covered in Chapter 7 3.3.3 Uncertainty as a lack of knowledge

Here I believe they are overstating the difference between ontology and epistemology.

There is variability in an event \(E\), we try to figure out why, but whether it is ontological or epistemological may not be reconciled; this does not change the value of knowing what we can find out.

3.2.4 3.4 Probability as Dispositional and Intrinsic Properties

Provides some language - I’m honestly not yet sure how it benefits the overall goal.

For example, I think we’ve always thought of probabilities are reflection the “propensity” of an event. Some of that is “intrinsic” to the characteristics of the event itself, some of it is due to variability of other factors.

Once again we see that the purpose of the book includes a conditional probability: “The individual propensities of a single patient, treatment or context are given by it’s unique combination of dispositions and the dispositions’ degree of tendency (propensity) toward particular outcomes in that individual case.”

\(P((O \mid I) \mid p_i)\)

The probability of an outcome given the intervention, given this time for this patient.

3.2.5 3.5 Propensities and the Clinic

Useful with consideration to the notes above. How usual is yet to be determined (at least in my opinion).

3.3 Statistical Inference (Induction)

This chapter describes the use Confidence Intervals as a tool for making statistical inferences. Statistical inferences use probability to consider how effective inductive inferences from a sample are at providing an an estimate of a characteristic (often referred to as a parameter) in a population. Samples are collections of individual (particular) observations and a population represents a universal (or general) characteristic. Estimates from a sample are called statistics, whereas estimates from a population are parameters.

Two common parameters that we attempt to estimate are proportions (percentage of some characteristic) and means (mathematical average). The actual numerical value of a population proportion or mean would rarely be known.

In order to estimate the population proportion or mean we calculate a sample proportion or mean from a sample. The value of the sample mean is then used to estimate an unknown population mean.

Since we know that there is error in our estimate, the question becomes how much error? The amount of error is related to the size of the sample.

3.4 Confidence Intervals (CI)

A confidence interval provides an interval around a sample estimate that contains the population parameter with a given probability. For example, a 95% CI is an interval that is expected to contain the true population parameter 95% of the time.

If the mean height of a sample of students is 69 inches, and the 95% CI = [66, 72], then the population parameter (mean height) has a probability of 0.95 to between 66 and 72 inches.

Another interpretation is that if we were to randomly sample 100 samples of the same size from the same population, 95% of the sample means would fall within the ranage of the 95% CI = [66, 72].

3.4.1 General equations

\(sample \space estimate = population \space parameter + random \space error\)

\(95\%CI=sample \space estimate \pm z \times \space standard \space error\)

Where \(z\) is a multiplier that comes from the normal curve. For a 95% CI \(z\) is often taken to be 1.96 (or 2 more conservatively). Another option is to use the \(t\) distribution (flatter and wider than the normal curve) which is recommended for samples of less than 30.

3.4.2 CI for a Proportion

\(\sqrt{\frac{}{}}\) \(95\%CI=sample \space proportion \pm z \times \sqrt{\frac{sample \space proportion(1-sample \space proportion)}{n}}\)

Note the Standard Error of a sample proportion is:

\(Standard \space Error \space of \space sample \space proportion=\sqrt{\frac{sample \space proportion(1-sample \space proportion)}{n}}\)

3.4.3 CI for a Mean

\(95\%CI=sample \space mean \pm z \times \frac{standard \space deviation}{\sqrt{n}}\)

Note the Standard Error of the Mean is:

\(Standard \space Error \space of \space the \space Mean=\frac{standard \space deviation}{\sqrt{n}}\)

3.4.4 CI to evaluate differences

Standard Error (SE) of a Difference:

\(SE \space for \space a \space Difference=\sqrt{(SE_1)^2 +(SE_2)^2}\)

Note: \(SE_1\) is the SE for group 1; and \(SE_2\) is the SE for group 2.

\(95\%CI=difference \space between \space samples \pm z \times (SE \space for \space a \space Difference)\)