2 Research questions
In this chapter, you will learn to:
- create operational and conceptual definitions.
- ask quantitative research questions.
- list and explain the various types of quantitative research questions.
- identify estimation and decision-making research questions.
- identify the variables implied by a quantitative research question.
- identify observational or experimental studies.
- describe and identify the units of analysis and units of observations in a study.
- communicate in the language of research and statistics.

2.1 Introduction
Asking clear and answerable research questions (RQs) is important, as the RQ impacts all other components of the research. Since quantitative research summarises and analyses the data using numerical methods (such as averages or percentages), the RQ must be appropriate for analysis using quantitative methods.
Studies often have an overall, broad research goal with many sub-questions (which may be quantitative or qualitative).
Example 2.1 (Research questions) Consider this broad research goal:
How well are permeable pavements (PPs) working in urban areas?
This goal has many component RQs (Fig. 2.1), and each can be answered separately.

FIGURE 2.1: A study of permeable pavements (PPs) may have many sub-questions
2.2 Definitions
Research studies usually include terms that must be carefully and precisely defined, so that others know exactly what has been done, without ambiguity. Two types of definitions can be given when necessary:
- A conceptual definition explains what is being studied (i.e., what a word or a term means in the study).
- An operational definition defines how something will be studied or measured.
Definition 2.1 (Conceptual definition) A conceptual definition articulates precisely what words or phrases mean; that is, what is being identified, measured, observed or assessed in a study.
Definition 2.2 (Operational definition) An operational definition articulates exactly how something will be identified, measured, observed or assessed.
In many cases, a clear operational definition will be needed to describe how data will be collected to ensure repeatability and consistent data collection, by removing any ambiguity about how data are obtained.

Example 2.2 (Operational and conceptual definitions) Consider a study examining stress in students. A conceptual definition would describe what is meant by 'stress' (in contrast to, say, 'anxiety'). An operational definition would describe how 'stress' would be measured, since stress cannot be measured directly (unlike height, for example).
'Stress' could be measured using a questionnaire or measuring physical characteristics, for instance. Other ways of measuring stress are also possible, and all have advantages and disadvantages.
Sometimes the definitions themselves are not important; a clear definition is simply needed. However, to avoid confusion, commonly-accepted definitions should be used unless good reasons exist for using a different definition. When a commonly-accepted definition does not exist, the definition being used should be very clearly articulated.
Example 2.3 (Operational and conceptual definitions) A research article (Gillet et al. 2018) entitled "Shoulder range of motion and strength in young competitive tennis players with and without history of shoulder problems" provided these necessary conceptual definitions (among others):
- Young: 8--15 years;
- Competitive tennis players: Some of the best players in their age category in France, and members of a French tennis centre of excellence.
An operational definition was provided for 'Shoulder strength': as measured using a hand-held dynamometer.
Players, administrators and fans are wary of concussions and head injuries in sport. A conference on concussion in sport developed this conceptual definition (McCrory et al. 2013):
... a complex pathophysiological process affecting the brain, induced by biomechanical forces...

However, an operational definition is needed to explain how to identify a player with concussion during a game. Rugby decided on this operational definition (Raftery et al. 2016):
... a concussion applies with any of the following:
The presence, pitch side, of any Criteria Set 1 signs or symptoms (table 1)... [Table 1 includes symptoms such as 'convulsion', 'clearly dazed', etc.];
An abnormal post game, same day assessment...;
An abnormal 36--48 h assessment...;
The presence of clinical suspicion by the treating doctor at any time...
Example 2.4 (Operational and conceptual definitions) Consider a study requiring water temperature to be measured.
An operational definition would explain how the temperature is measured: the thermometer type, how the thermometer was positioned, how long was it left in the water, and so on.
In contrast, a conceptual definition might describe the scientific definition of temperature (and would not be needed, as 'temperature' is a well-understood term).
A study of snacking in Australia (Fayet-Moore et al. 2017) used this conceptual definition of an 'eating occasion':
...one or more food or beverage items consumed at the same time of day...
and a 'snacking occasion' as
...one or more food or beverage items consumed at the same time of day within a snacking time period...
Finally then, 'snacking' was defined as:
Eating occasions that occurred during breakfast, midday and evening meals were meals and all eating occasions that occurred between these meals were classified as snacking.
These are all conceptual definitions, explaining what the terms mean.
An operational definition would explain how the data were obtained from the participants (e.g., using a food diary).
Meline (2006) discusses five studies about stuttering, each using a different operational definition:
- Study 1: As diagnosed by speech-language pathologist.
- Study 2: Within-word disfluencies greater than 5 per 150 words.
- Study 3: Unnatural hesitation, interjections, restarted or incomplete phrases, etc.
- Study 4: More than 3 stuttered words per minute.
- Study 5: State guidelines for fluency disorders.
People may be classified as stutterers by some definitions but not others, so it is important to know which definition is used.
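To see how the choice of operational definition changes who is classified as a stutterer, the sketch below applies two of the rules above (those of Studies 2 and 4) to one hypothetical speaker. The function names and the speaker's counts are invented for illustration, and the sketch assumes each rule can be reduced to a simple rate; the original studies may have applied their definitions differently.

```python
def stutters_study2(within_word_disfluencies: int, words_spoken: int) -> bool:
    """Study 2 style rule: more than 5 within-word disfluencies per 150 words."""
    rate_per_150_words = within_word_disfluencies / words_spoken * 150
    return rate_per_150_words > 5


def stutters_study4(stuttered_words: int, minutes_spoken: float) -> bool:
    """Study 4 style rule: more than 3 stuttered words per minute."""
    return stuttered_words / minutes_spoken > 3


# Hypothetical speaker: 300 words in 2 minutes, with 8 stuttered words,
# each containing one within-word disfluency (an assumption, not real data).
by_study2 = stutters_study2(within_word_disfluencies=8, words_spoken=300)
by_study4 = stutters_study4(stuttered_words=8, minutes_spoken=2)

print("Study 2 rule:", by_study2)  # 4 per 150 words, which is not more than 5
print("Study 4 rule:", by_study4)  # 4 per minute, which is more than 3
```

The same speaker is a stutterer under one rule but not the other, which is why a study must state the definition it uses.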
A study examined the possible relationship between the 'pace of life' and the incidence of heart disease (Levine 1990) in 36 US cities.
The researchers used four different operational definitions for 'pace of life' (remember the article was published in 1990!):
- The walking speed of randomly chosen pedestrians.
- The speed with which bank clerks gave 'change for two $20 bills or [gave] two $20 bills for change'.
- The talking speed of postal clerks.
- The proportion of men and women wearing a wristwatch.
None of these perfectly measure 'pace of life', of course. Nonetheless, the researchers found that, compared to people on the West Coast,
... people in the Northeast walk faster, make change faster, talk faster and are more likely to wear a watch...
--- Levine (1990) (p. 455)
Define a 'smoker'.
This is very difficult!
Some studies use the categories Never smoked, Past smoker, and Current smoker... or ask people to self-identify as a smoker or not.
2.3 Elements of RQs
RQs must be written carefully so that they can be answered effectively. This section introduces the four potential components of a quantitative RQ:
- The Population (Sect. 2.3.1);
- The Outcome (Sect. 2.3.3);
- The Comparison or Connection (Sect. 2.3.4);
- The Intervention (Sect. 2.3.5).
These form the POCI acronym (sometimes seen as the PICO acronym).
2.3.1 The population
All quantitative RQs study a population: a (usually large) group of interest in the study. Populations comprise individuals, sometimes called cases. If the individuals are people, they are sometimes called subjects.

Definition 2.3 (Population) The population is the group of individuals from which the total set of observations of interest could be made, and to which the results will (hopefully) generalise.
To fully understand individuals, you should also read about units of analysis and units of observation (Sect. 2.3.2). The individuals are the units of analysis.
The population is any group of individuals of interest; for example:
- all Australian males between 18 and 35 years of age.
- all bamboo flooring materials manufactured in China.
- all elderly females with glaucoma in Canada.
- all Pinguicula grandiflora growing in Europe.
The words population, individuals and cases do not just refer to people, though the words may be commonly used that way in general conversation.
The population is rarely the individuals from which the data are actually obtained. Indeed, all elements of the population are rarely accessible in practice. For example, testing if a new drug is effective cannot possibly study all people (especially people not yet born who might use the drug). The population is 'all people', not just those studied.
The population in a RQ is not just those studied; it is the whole group to which our results would generalise.
In contrast, a sample is the subset of the population from which data are obtained (Chap. 5).

Definition 2.4 (Sample) A sample is a subset of the population from which data are collected.
Example 2.5 (Samples) Consider a study of American college women, which aimed to:
...assess iron status [...] in highly active (>12 hr purposeful physical activity per week) and sedentary (<2 hr purposeful physical activity per week) women...
--- Woolf et al. (2009), p. 521.
The sample comprises 28 'active' and 28 'sedentary' American college women, from which data are collected. The population is all 'active and sedentary' American college women, not just the 56 in the study. The group of 56 subjects is the sample.
Completely and precisely defining the population sometimes requires refining or clarifying the population, using exclusion and/or inclusion criteria. Exclusion and inclusion criteria clarify which individuals may be explicitly included or excluded from the population.
Exclusion and inclusion criteria should be explained when their purpose is not obvious. A study need not use both exclusion and inclusion criteria; neither, one or both may be used.

Definition 2.5 (Inclusion and exclusion criteria) Inclusion criteria are characteristics that individuals must meet explicitly to be included in the study.
Exclusion criteria are characteristics that explicitly disqualify potential individuals from being included in the study.
Example 2.6 (Inclusion criteria) A study of a certain bird species may only include sites with a confirmed sighting within the last two years.

Example 2.7 (Exclusion criteria) Concrete test cylinders with fissure cracks may be excluded from tests of concrete strength.
People with severe asthma may be excluded from exercise studies.
A study on the influenza vaccine (Kheok et al. 2008) listed the Population as 'health-care workers' (Kheok et al. 2008, 466), and the sample comprised healthcare workers at two specific hospitals. The population was refined using exclusion criteria: those
...declining to give consent, a history of egg protein allergy, and neurological or immunological conditions that are contraindications to the influenza vaccine.
--- Kheok et al. (2008), p. 466
Example 2.8 (Population and exclusion criteria) A study (Guirao et al. 2017) of the walking abilities of amputees used inclusion and exclusion criteria (Guirao et al. (2017), p. 27). Inclusion criteria included:
... length of the femur of the amputated limb of at least 15cm measured from the greater trochanter; use of the prosthesis for at least 12 months prior to enrollment and more than 6 h/day...
Exclusion criteria included:
... the presence of cognitive impairment hindering the ability to follow instructions and/or perform the tests; body weight over 100kg...
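Inclusion and exclusion criteria act as a filter on potential individuals. The sketch below encodes only the criteria quoted above (the quotations are abridged, so the study used further criteria not shown here). The `Candidate` class, its field names and the three candidates are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    femur_length_cm: float
    months_using_prosthesis: int
    hours_per_day: float
    cognitive_impairment: bool
    body_weight_kg: float


def is_eligible(c: Candidate) -> bool:
    """Apply the quoted inclusion criteria, then the quoted exclusion criteria."""
    meets_inclusion = (
        c.femur_length_cm >= 15
        and c.months_using_prosthesis >= 12
        and c.hours_per_day > 6
    )
    meets_exclusion = c.cognitive_impairment or c.body_weight_kg > 100
    return meets_inclusion and not meets_exclusion


# Hypothetical candidates (invented values).
candidates = [
    Candidate(18.0, 24, 8.0, False, 82.0),   # meets all quoted criteria
    Candidate(12.5, 30, 10.0, False, 75.0),  # femur shorter than 15 cm
    Candidate(20.0, 18, 9.0, False, 104.0),  # body weight over 100 kg
]
eligible = [c for c in candidates if is_eligible(c)]
print(len(eligible), "of", len(candidates), "candidates are eligible")
```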
2.3.2 Units of observation and analysis
Units of observation and units of analysis are important, but similar, concepts that need to be distinguished to properly identify a population.
The individuals are the units of analysis.
Consider this RQ (based on Vaughn et al. (2009)):
In Australian 20-something men, is the average thickness of head hair strands the same for blond-haired men and brunet-haired men?
Comparing 100 hair strands from one blond-haired man, to 100 hair strands from one brunet-haired man, is problematic since only one man of each hair colour is represented. While there are 200 observations, only two people are compared; little is learnt about 20-something men in general. Instead, a lot is learnt about two specific men. The population is represented by just two men.

In this study, each individual hair is a unit of observation: the hair strands are what must be measured to obtain 'thickness of head hair strands'.
Definition 2.6 (Unit of observation) Unit of observation: The 'who' or 'what' which are observed, from which measurements are taken and data collected.
Since each blond hair comes from the same man, each of those hairs has essentially 'lived its life together' with the others: They are washed at the same time, with the same shampoo, exposed to the same amount of sunlight and exercise, share genetics, etc. However, different men would potentially use different shampoo, exercise differently, have different genetics, and so on.
The RQ aims to compare blond men with brunet men; men are being compared. Each man is a collection of units of observation (hair strands). This leads to a similar, but different, concept: the unit of analysis. In the example above, each man is a unit of analysis, where each unit of analysis gives 100 observations.
Definition 2.7 (Unit of analysis) Unit of analysis: The smallest collection of units of observations (and perhaps the units of observations themselves) about which generalizations and conclusions are made; the smallest independent 'who' or 'what' for which information is analysed.
In the hair-thickness study, each person is a unit of analysis. The sample size is just two. Each unit of analysis (man) has 100 units of observation (hair strands). Importantly, the sample size for the study is the number of units of analysis; so here, only two examples of the population of men are in the study.
The size of the sample in a study is the number of units of analysis.
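As a minimal sketch of this idea, the code below counts units of observation and units of analysis for the hair-strand study. The thickness values are invented placeholders; only the counts matter here.

```python
# Each record is one unit of observation: (man, hair thickness in micrometres).
# The thickness values are invented placeholders.
observations = (
    [("blond man", 70.0 + 0.1 * i) for i in range(100)]
    + [("brunet man", 80.0 + 0.1 * i) for i in range(100)]
)

# The units of analysis are the distinct men the hairs came from.
units_of_analysis = {man for man, _ in observations}

n_observations = len(observations)
sample_size = len(units_of_analysis)

print("Units of observation:", n_observations)
print("Units of analysis (sample size):", sample_size)
```

Although 200 measurements are recorded, the sample size is two.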
Sometimes, the units of analysis and units of observation are the same.
Example 2.9 (Units of analysis) In the hair-strand study, each hair strand is a unit of observation: measurements of hair strand thickness are taken from individual hair strands. The unit of analysis is the person: the hair strands from each man share much in common. 'Men' operate separately, but the hairs on each man are not separate entities.
Example 2.10 (Units of analysis) A study compares the wear on two brands of car tyres. Four tyres of Brand A are allocated to each of Cars 1--5, and four tyres of Brand B are allocated to each of Cars 6--10.
After 12 months, the amount of wear is recorded on each tyre. The unit of observation is the tyre: the amount of wear is measured on each tyre.
The tyres on any one car do not operate independently; the four tyres on a single car 'live their life together'. They all are exposed to the same day-to-day use, the same drivers, have driven almost identical distances, under the same conditions, etc.
The unit of analysis is the car: the brand of tyre is allocated to the car, and all wheels on the car get the same brand of tyre. Each unit of analysis (car) produces four units of observations.
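One simple way to respect the unit of analysis, sketched below, is to collapse the four tyre measurements on each car into a single value per car before comparing brands. This is an illustrative assumption rather than a prescribed method, and the wear values are generated from an invented formula, not real data.

```python
from collections import defaultdict
from statistics import mean

# One row per tyre (unit of observation): (car, brand, wear in mm).
tyre_rows = []
for car in range(1, 11):
    brand = "A" if car <= 5 else "B"
    base_wear = 3.0 if brand == "A" else 3.5
    for tyre in range(4):
        tyre_rows.append((car, brand, base_wear + 0.1 * (car % 5) + 0.02 * tyre))

# Collapse the four tyres on each car into one value per car (unit of analysis).
wear_by_car = defaultdict(list)
brand_of_car = {}
for car, brand, wear in tyre_rows:
    wear_by_car[car].append(wear)
    brand_of_car[car] = brand
car_means = {car: mean(values) for car, values in wear_by_car.items()}

# Compare brands using the per-car means: five cars per brand, not 20 tyres.
cars_per_brand = defaultdict(list)
for car, car_mean in car_means.items():
    cars_per_brand[brand_of_car[car]].append(car_mean)
brand_means = {brand: mean(values) for brand, values in cars_per_brand.items()}

print({brand: len(values) for brand, values in cars_per_brand.items()})
print(brand_means)
```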

Example 2.11 (Units of analysis) Consider comparing the percentage of females and males wearing sunglasses at a specific beach.
People in a group at the beach will probably not be operating 'independently': people with similarities tend to group together. For example, a couple will often both be wearing or both not wearing sunglasses; families will often all be wearing sunglasses or not wearing sunglasses.
The researchers have two options; either
- Use the people groups as the unit of analysis (some will be groups of one), and record data from just one person in any group. Ideally, the researchers would specify beforehand from which group member to take data (e.g., the person closest to the researchers when the group is noticed).
- Alternatively, the researchers may decide not to use data from groups at all, and only gather data from individuals.

A report on the Spectrum website reported:
Seven years ago, Peter Kind [...] was reading a study about fragile X syndrome, a developmental condition characterized by severe intellectual disability and, often, autism [...] Kind was surprised when he noticed a potentially serious statistical flaw.
The research team had looked at 10 neurons from each of the 16 mice in the experiment [...] the researchers had analyzed each neuron as if it were an independent [individual observation]. That gave them 160 data points to work with, 10 times the number of mice in the experiment.
'The question is, are two neurons in the brain of the same animal truly independent data points? The answer is no,' Kind says.
--- Spectrum report, accessed 18 Nov 2022
The study used 16 units of analysis (mice), but the authors treated the \(16\times 10 = 160\) neurons as the units of analysis. The 10 neurons from each mouse share the same genetic information.
A total of 160 neurons from 16 mice is very different to a study of 160 neurons from 160 genetically-different mice.
The units of observation and units of analysis may be the same, and often are. However, they are sometimes different, and identifying these situations is crucial. Importantly, studies compare units of analysis, not units of observation.
Example 2.12 (Units of analysis) A study compared two school physical activity (PA) programs. Each of 44 children (with parental agreement) was allocated to one of two PA programs. The improvement in children's fitness was measured for every student in the study over six months.
The units of observation are the individual students, as the fitness measurements are taken from the students. The units of analysis are also the individual students, as the PA program was allocated to each student individually, and each student has their own sport, routines, etc. Each unit of analysis (student) has one unit of observation.
If, instead, the PA programs had been allocated to whole schools rather than to individual students, the units would differ:
Units of observation: the individual students, as the fitness measurements are taken from the students individually.
Units of analysis: the schools, as the PA program was allocated to each school. All students at School A are exposed to Program 1, but all students at School A are also likely to be exposed to similar weather, fitness opportunities, physical conditions, teachers and school-based philosophies, and so on.
The improvement in the children's fitness levels and the program are both variables.
2.3.3 The outcome
All RQs study something about the population, called the outcome. Because the RQ concerns a population, the outcome describes a population (not individuals). Hence, the outcome is usually an average, percentage, or general numerical quantity summarising the population.

Definition 2.8 (Outcome) The outcome in a RQ is the result, output, consequence or effect of interest in a study, numerically summarising the population.
The outcome of interest in a population may be (for example):
- average increase in heart rates after exercise.
- average amount of wear after 1000 hours of use.
- proportion of people whose pupils dilate.
- average weight loss after three weeks.
- percentage of seedlings that die.
The outcome in a RQ summarises a population; it does not describe the individuals in the population.
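The sketch below illustrates the distinction using two of the outcomes listed above: measurements are recorded on individuals, but the outcome is a single numerical summary across them. All values are invented; in practice, such summaries would be computed from a sample to learn about the population.

```python
from statistics import mean

# Hypothetical measurements on individuals (invented values).
heart_rate_increase_bpm = [32, 41, 28, 36, 45, 40]
seedling_died = [True, False, False, True, False, False, False, False]

# The outcomes summarise the group; they do not describe any one individual.
average_increase = mean(heart_rate_increase_bpm)
percentage_died = 100 * sum(seedling_died) / len(seedling_died)

print("Average increase in heart rate (bpm):", average_increase)
print("Percentage of seedlings that died:", percentage_died)
```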
2.3.4 The comparison or connection
Some RQs may seek to establish a relationship in the Population between the Outcome and another attribute of the individuals. This other attribute is called a Comparison or Connection. The implication is that a change in the value of the comparison or connection may be associated with a change in the value of the outcome (which may or may not be a cause-and-effect relationship).
A comparison refers to an attribute recorded in a small number of distinct groups for which the outcome is compared. A connection refers to an attribute that can take many different values for which a connection with the outcome is explored. The values of the comparison or connection may be imposed on the individuals by the researchers, or may already exist in the individuals.
Definition 2.9 (Comparisons) The comparison in the RQ identifies the small number of different, distinct subsets of the population between which the outcome is compared.
Definition 2.10 (Connections) The connection in the RQ identifies another attribute of the individuals that can take many different values, and may be related to the outcome.
Example 2.13 (Comparisons and connections) A study (Stern et al. 2021) examined the mean daily sodium excretion (the Outcome) in Israeli adults (the Population).
The daily sodium excretion was compared for those diagnosed with diabetes, and those not diagnosed with diabetes.
A possible connection was explored between the daily sodium excretion and the systolic blood pressure.

The distinction between between-individuals comparisons (being discussed here) and within-individuals comparisons is important.
Definition 2.11 (Between-individuals comparisons) Between-individuals comparisons mean that the comparison is between different groups of individuals.
In contrast, within-individuals comparisons make comparisons within the same individuals.

Example 2.14 (Between- and within-individual comparisons/connections) Consider studying the strength of left and right legs of football players.
A between-individuals comparison would compare the left and right leg strengths between different groups: one group would have their left-leg strength measured, and the other their right-leg strength measured.
In contrast, a within-individuals comparison would measure both the right and left strength within the same individuals: the comparison is within each individual.
The C in POCI refers to between-individuals comparisons or connections.
The outcome may be compared between two or more separate subsets of the population; for example:
- Comparing the average amount of wear in floor boards (O) between two groups: standard wooden flooring materials, and bamboo flooring.
- Comparing the average heart rates (O) across three subsets: those who received no dose of a drug, those who received a daily dose of the drug, and those who received a twice-daily dose of the drug.
Explicitly, the comparison here is between individuals, not within individuals.
Be careful!
The definition of a comparison refers to a between-individuals comparison, that may be imposed (for example, one group is given one dose of fertilizer per day, and another given two doses of fertilizer per day) or existing (for example, one group of people aged under 30, and another aged 30 or over).
If all individuals are treated in the same way, or do not have existing differences that allow them to be divided into groups to be compared, no comparison exists according to this definition.
Example 2.15 (Comparison) Consider comparing the average blood pressure (the Outcome) in the right and left arms of Australians (the Population). The blood pressure is measured on both arms of every studied individual.
There is no (between-individuals) comparison: the individuals are not divided into separate groups to compare average blood pressure; every person is treated the same way. This is a 'within-individuals' comparison.
The outcome might be best described as 'the average difference between right- and left-arm blood pressure'.
A study comparing the average blood pressure between (a) people aged under 40, and (b) people aged 40 or over does have a (between-individuals) comparison: two subsets (under 40; 40 and over) of the population (Australians) are compared.
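The two situations in this example can be sketched with a small set of hypothetical blood pressure readings (all values are invented). The within-individuals case computes one difference per person and then summarises those differences; the between-individuals case splits the people into two separate groups and compares the group averages.

```python
from statistics import mean

# Hypothetical systolic blood pressures (mm Hg): (age, right arm, left arm).
people = [
    (25, 118, 116),
    (34, 122, 121),
    (47, 131, 128),
    (58, 136, 135),
]

# Within-individuals: one difference per person, then summarise the differences.
differences = [right - left for _, right, left in people]
mean_difference = mean(differences)

# Between-individuals: split people into two separate groups, then compare.
under_40 = [right for age, right, _ in people if age < 40]
aged_40_plus = [right for age, right, _ in people if age >= 40]
group_gap = mean(aged_40_plus) - mean(under_40)

print("Mean right-minus-left difference:", mean_difference)
print("Difference between age-group means (right arm):", group_gap)
```

In the first calculation every person contributes to both arms; in the second, each person belongs to exactly one group.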
As the value of the connection changes, the value of the outcome (potentially) changes; for example:
- The connection between average heart rate (O) and exposure to various caffeine doses (C).
- The connection between percentage germination (O) and hours of sunlight per day (C).
2.3.5 The intervention
RQs with a connection or comparison (C) sometimes also have an intervention.

Definition 2.12 (Intervention) An intervention is a comparison or connection whose value can be manipulated by the researchers. That is, the researchers impose the connection or comparison upon the individuals in the study.
The intervention may be:
- explicitly giving doses of a new drug to patients.
- explicitly applying wear testing loads to two different flooring materials.
- explicitly exposing people to different stimuli.
- explicitly applying a different dose of fertiliser.

Example 2.16 (Intervention) A study by Bird et al. (2008) gave participants a diet using refined flour, or a diet using a new flour variety (Himalaya 292). The type of diet is the comparison. Since the researchers can manipulate which subject ate which flour, this study has an intervention.
Example 2.17 (Interventions) A study comparing the average blood pressure (O) in female and male (C) Australians (P) measured blood pressure using a blood pressure machine (a sphygmomanometer).
The research team needs to interact with the participants and use the machine to measure blood pressure, but there is no intervention. Using the sphygmomanometer is just a way to measure blood pressure, to obtain the data.
There is no intervention: the comparison is between females and males, which cannot be manipulated or imposed on the individuals by the researchers.
A study of American college women (Woolf et al. 2009) measured 'iron status' in highly active women (>12 hr purposeful physical activity per week) and sedentary women (<2 hr purposeful physical activity per week). In this study, what are the Outcome, the Comparison or Connection (if any), and the Intervention (if any)?
Outcome: 'average iron status' (which would need an operational definition).
Comparison: between two groups of individuals: highly active and sedentary women (i.e., between individuals). These terms would also need operational definitions!
Intervention: Probably none; an intervention would mean the researchers tell each individual woman to be highly active or sedentary, which seems unlikely.
Researchers examined numerous studies of chest compressions by paramedics. They examined research papers in which the Population was patients who had experienced a cardiac arrest, and where manual chest compressions were compared with another method.
The table below shows the interventions and outcomes of interest:
Interventions | Outcomes |
---|---|
Mechanical chest compression | Mean survival time to hospital discharge |
Mechanical CPR | Percentage with a return of spontaneous circulation (ROSC) |
Powered chest compressions | |
Powered CPR | |
The research concluded that:
Overall, the evidence analysed suggests that mechanical chest compression devices are statistically superior to manual chest compressions of a high quality, when up-to-date protocols and guidelines are followed.
--- P. Williams, Goring, and Franklin (2021), Table 1
2.4 Types of RQs
All RQs have a population (P) and an outcome (O). Different types of RQ emerge depending on whether the RQ has a comparison/connection (C) and whether this comparison or connection can be manipulated (an intervention (I)). This section studies different types of research questions:
- descriptive RQs (Sect. 2.4.1);
- relational RQs (Sect. 2.4.2);
- interventional RQs (Sect. 2.4.3).
These are compared in Sect. 2.4.5. RQs can also be written with one of two purposes in mind:
- Estimation: These RQs ask how precisely a value in the population is estimated by using the sample, and are answered using confidence intervals.
- Making decisions: These RQs are concerned with making a decision about a population, and are answered using hypothesis testing.
Examples for both forms are given for the different types of RQs below.
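The way the type of RQ follows from the presence of a C and an I can be summarised in a small sketch. The function name `rq_type` and its arguments are hypothetical; the logic simply restates the rules in this section, assuming the RQ already has a population and an outcome.

```python
def rq_type(has_comparison_or_connection: bool, has_intervention: bool) -> str:
    """Classify an RQ that already has a population (P) and an outcome (O)."""
    if has_intervention and not has_comparison_or_connection:
        # An intervention is a manipulated C, so it cannot exist without a C.
        raise ValueError("an intervention requires a comparison or connection")
    if not has_comparison_or_connection:
        return "descriptive (PO)"
    if has_intervention:
        return "interventional (POCI)"
    return "relational (POC)"


# Blood pressure compared between females and males: C exists, not manipulated.
print(rq_type(has_comparison_or_connection=True, has_intervention=False))

# Diets allocated by the researchers (refined flour or a new flour variety).
print(rq_type(has_comparison_or_connection=True, has_intervention=True))
```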
2.4.1 Descriptive RQs (PO)
Descriptive RQs are the most basic RQs, giving the Population to be studied, and the Outcome of interest about this population. Typically, descriptive RQs have one of these forms:
- Estimation: Among {the population}, what is {the outcome}?
- Decision-making: Among {the population}, is {the outcome} equal to {some value}?
These are not 'recipes', but guidelines.

Example 2.18 A study examined the 'body temperature of 148 healthy men and women' (Mackowiak, Wasserman, and Levine 1992) aged between 18 and 40 (the P). One descriptive RQ was:
What is the mean body temperature?
This RQ is an estimation RQ. A decision-making RQ they also studied was whether the average body temperature was the value that had been commonly accepted:
Is the mean body temperature really 98.6°F?
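As a rough preview of how the two purposes differ in practice, the sketch below uses ten invented temperatures (not the study's data). For estimation it forms an approximate interval around the sample mean, using a multiplier of 2 as a rule of thumb; for decision-making it asks how far the sample mean lies from 98.6°F, measured in standard errors. This is an informal illustration only, not the formal confidence-interval or hypothesis-testing procedure.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical body temperatures in °F (invented; not the study's data).
temps = [98.2, 98.4, 97.9, 98.6, 98.0, 98.3, 98.1, 98.5, 98.2, 97.8]

sample_mean = mean(temps)
standard_error = stdev(temps) / sqrt(len(temps))

# Estimation: a rough interval, using a multiplier of 2 as a rule of thumb.
lower = sample_mean - 2 * standard_error
upper = sample_mean + 2 * standard_error

# Decision-making: how many standard errors is the sample mean from 98.6?
accepted_value = 98.6
standardised_gap = (sample_mean - accepted_value) / standard_error

print(f"Estimate: {sample_mean:.2f}, rough interval {lower:.2f} to {upper:.2f}")
print(f"Sample mean is {standardised_gap:.1f} standard errors from {accepted_value}")
```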
Consider this RQ: 'Among Indonesian adults, what proportion are coeliacs?' For this RQ, identify the Population and the Outcome.
P: Indonesian adults; O: The proportion that are coeliacs. This is an estimation-type descriptive RQ.
2.4.2 Relational RQs (POC)
Usually, studying relationships is more interesting than simply describing a population. Relational RQs explore existing relationships, and state the Population, the Outcome, and the Comparison or Connection. Relational RQs have no intervention; the connection or comparison is not manipulated by, nor imposed by, the researchers.
Typically, relational RQs with a comparison have one of these forms:
- Estimation: Among {the population}, what is the difference in {the outcome} for {the groups being compared}?
- Decision-making: Among {the population}, is {the outcome} the same for {the groups being compared}?
Typically, relational RQs based on a connection have the form:
- Estimation: Among {the population}, how strong is the relationship between {the outcome} and {something else}?
- Decision-making: Among {the population}, is {the outcome} related to {something else}?
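The comparison forms above can be treated as fill-in templates. The sketch below fills them using the P, O and C from the sodium-excretion study in Example 2.13; the function name is hypothetical, and, as with the descriptive forms, the output is a guideline whose wording may still need polishing.

```python
def relational_comparison_rqs(population: str, outcome: str, groups: str) -> dict:
    """Fill the two comparison templates with the P, O and C of a study."""
    return {
        "estimation": (
            f"Among {population}, what is the difference in {outcome} for {groups}?"
        ),
        "decision-making": (
            f"Among {population}, is {outcome} the same for {groups}?"
        ),
    }


rqs = relational_comparison_rqs(
    population="Israeli adults",
    outcome="the mean daily sodium excretion",
    groups="those diagnosed with diabetes and those not diagnosed with diabetes",
)
for purpose, question in rqs.items():
    print(f"{purpose}: {question}")
```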

Example 2.19 (Relational RQ) Consider this RQ (based on Estévez-Báez et al. (2019)):
Among Cubans between 13 and 20 years of age, is the average heart rate the same for females and males?
The population is 'Cubans between 13 and 20 years of age', the outcome is 'average heart rate', and the (between-individuals) comparison is 'between females and males'.
This is a relational RQ since the sex of the individual (the C) is not manipulated by, or imposed by, the researchers. This RQ is a decision-making RQ, since it asks if the average heart rate is the same for females and males. An estimation-type relational RQ would ask about the size of difference in the average heart rate between females and males.
The same study could also have asked:
Among Cubans between 13 and 20 years of age, is the average heart rate related to age?
The connection is with 'age', which cannot be manipulated by the researchers, so this is a relational RQ. This RQ is a decision-making RQ, since it asks if the average heart rate is related to age. An estimation-type relational RQ might be:
Among Cubans between 13 and 20 years of age, how strong is the relationship between average heart rate and age?
Consider this RQ (based on Brown et al. (2000)):
In the Queensland Ambulance Service last year, what was the difference between the average response time to emergency calls between weekdays and weekends?
Identify the population, outcome, and comparison.
Consider this RQ (based on Maron (2007)):
In Queensland state forests, is there a relationship between the average number of noisy miners and the number of eucalypts, in general?
(A noisy miner is a type of bird.) Identify the population, outcome, and comparison or connection.

2.4.3 Interventional RQs (POCI)
Interventional RQs explore relationships where the comparison/connection can be manipulated by, or imposed by, the researchers. Interventional RQs state the Population, the Outcome, the Comparison or Connection, and use an Intervention.
Interventional RQs are like relational RQs, except that the comparison or connection is manipulated by the researchers (i.e., has an intervention). Sometimes, the use of an intervention is unclear from the RQ. When writing an interventional RQ, clarify if an intervention is used if possible.

Example 2.20 (Interventional RQ) A study (Khair et al. 2015) compared the time needed for organic waste to turn into compost, when earthworms were either added or not added to the waste. Since the researchers manipulated which waste samples had earthworms added, the study uses an intervention, and the research question is interventional.
An estimation-type RQ could be used to estimate the composting time. A decision-making-type RQ could be used to compare the composting times for waste with and without earthworms added.
Consider this RQ (McLinn et al. 1994):
In children with acute otitis media, what is the difference in the average duration of symptoms when treated with cefuroxime compared to amoxicillin?
The population is 'children with acute otitis media', the outcome is 'average duration of symptoms', and the comparison is between two groups (taking 'cefuroxime' or 'amoxicillin').
If the drugs are given to the children by the researchers, the RQ has an intervention. If the researchers find children who are already taking the two drugs and measure the outcome ('average duration of symptoms'), the RQ has no intervention.
2.4.4 Two purposes of RQs
As noted earlier, RQs can also be written with one of two purposes in mind:
- Estimation: These RQs ask how precisely a value in the population is estimated by using the sample, and are answered using confidence intervals. Answering estimation RQs is discussed for:
  - descriptive RQs (Chaps. 20 to 23);
  - relational or interventional RQs with a comparison (Chaps. 24 and 25), where the value being estimated measures the difference between groups; and
  - relational or interventional RQs with a connection (Chap. 37.6), where the value being estimated measures the strength of a relationship.
- Making decisions: These RQs are concerned with making a decision about a population, and are answered using hypothesis testing. Answering decision-making RQs is discussed for:
  - descriptive RQs (Chaps. 28 to 31);
  - relational or interventional RQs with a comparison (Chaps. 32 and 33), where the decision is about the difference between groups; and
  - relational or interventional RQs with a connection (Sects. 37.7 and 36.4), where the decision is about the strength of a relationship.
Example 2.21 (Various types of RQs) Studies can incorporate many types of RQs. For example, a study (Thane, Bates, and Prentice 2004) of 'British young people aged 4--18' (the P) asked and answered numerous RQs. Two descriptive RQs were:
- What is the average zinc intake of the children? This is an estimation RQ.
- Does the average zinc intake meet recommended dietary guidelines? This is a decision-making RQ.
Two relational RQs were:
- What is the strength of the association between plasma zinc and retinol concentrations? This is an estimation RQ, estimating the strength of the relationship.
- Is the average zinc intake the same for boys and girls? This is a decision-making RQ.
Decision-making RQs have two possible answers. For the last RQ above: either the average zinc intake is the same for females and males, or it is not the same for females and males (Fig. 2.2). However, answers are rarely clear in practice. Instead, researchers decide how much the sample evidence supports a particular hypothesis about the population.
Evidence may support or contradict a hypothesis; evidence rarely proves a hypothesis (at least, without any other support, such as theoretical support). Ultimately, after collecting data from a sample, a decision must be made about which explanation about the population is more consistent with the data collected.

FIGURE 2.2: Two possible answers to the RQ about zinc intake
2.4.5 Contrasting the types of RQs
Descriptive RQs are usually used when initially exploring a research topic. Relational RQs then examine relationships, and provide an understanding of how the outcome is related to certain groups in the population, which may set the platform for asking interventional questions. Interventional RQs (when possible) can be used to test theories or models, or to establish cause-and-effect relationships (i.e., causality); see Table 2.1. Research often develops through these stages of RQs as knowledge grows and develops.
RQ type | P | O | C | I |
---|---|---|---|---|
Descriptive (D) | Yes | Yes | | |
Relational (R) | Yes | Yes | Yes | |
Interventional (I) | Yes | Yes | Yes | Yes |
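The pattern in Table 2.1 can be read as a simple decision rule. The following Python sketch is illustrative only: the function name `classify_rq` and its arguments are hypothetical, and it assumes every RQ already states a population (P) and an outcome (O).

```python
def classify_rq(has_comparison: bool, has_intervention: bool) -> str:
    """Classify an RQ using the pattern in Table 2.1.

    Assumes the RQ already states a population (P) and an outcome (O).
    """
    if has_intervention and not has_comparison:
        # An intervention manipulates the comparison/connection,
        # so an I without a C does not make sense.
        raise ValueError("An intervention requires a comparison or connection")
    if has_intervention:
        return "interventional"
    if has_comparison:
        return "relational"
    return "descriptive"


# Average heart rate compared between females and males (sex is not manipulated):
print(classify_rq(has_comparison=True, has_intervention=False))
```

In practice, deciding whether a comparison or an intervention is present requires reading the RQ and the study description carefully; the sketch only encodes what happens after that judgement is made.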
Example 2.22 (Different study types) Coeliac disease is an inherited auto-immune condition where individuals have an intolerance to the gluten found in wheat, rye, barley and some other grains. Initially, the proportion of individuals suffering from coeliac disease was unknown (e.g., Kenwright (1972)). Descriptive RQs were answered to make estimates (e.g., Cook et al. (2000)).
Then, relational RQs compared the proportion of females and males who were coeliacs (e.g., Cook et al. (2000)). Then, interventional RQs were posed to understand the disease further; for example, if the percentage with adverse symptoms is the same for those given a diet without oats and those given a diet with oats (e.g., Janatuinen et al. (2002); Lundin et al. (2003)).
- Relational: A between-individuals comparison exists (comparing amputees with transradial and transhumeral amputations), but would not be manipulated or imposed by the researchers.
- Interventional: The RQ has a between-individuals comparison ('comparing trees planted in a concrete sidewalk and a grassed sidewalk'), and the wording sounds like these trees were planted there intentionally. The researchers could pick a tree, and decide where to plant it (concrete or grassed sidewalk).
2.5 Writing RQs: an example

RQs emerge from observations, which lead to questions, and the need for evidence to answer that question. Suppose you notice some people taking echinacea when they get a cold. You may wonder: is there evidence that echinacea helps with a cold? This may lead to an initial RQ:
Is it better to take echinacea when you have a cold?
This RQ is clearly poor, but serves as a starting point. This RQ can be refined by clarifying the POCI elements. For example, what population could we study? Many options exist: All Australians, or Australian adults with a specific "cold". Some of these may not be practical (i.e., when a sample cannot easily be obtained from the population).
What outcome could be used to determine echinacea's effectiveness? Many options exist, such as the average cold duration, or the percentage of people who take days off work.
The initial RQ is also vague: better than what? The outcome could be compared between groups (such as those taking echinacea and those who do not), or connected to something else (such as the daily dose of echinacea).
We could also decide to intervene or not, which has implications for how the study is conducted and how the results are interpreted. If we decided not to intervene, the subjects decide for themselves how to treat their cold. If we did decide to intervene, various interventions could be used; the dose frequency or the dose amounts could be imposed.
After making some decisions about P, O, C and I, a revised RQ (based on Barrett et al. (2010)) is:
Among Australian teenagers with a common cold, is the average duration of cold symptoms shorter for teens taking a daily dose of echinacea compared to teens taking no medication?
The P, O, C and I do not have to be comprehensively described in the RQ; some information could be provided later as operational definitions (e.g., dose) and using exclusion and inclusion criteria (e.g., exclude teens with chronic health conditions).
2.6 Variables: from populations to individuals
Consider this RQ seen above:
Among Australian teenagers with a common cold, is the average duration of cold symptoms shorter for teens given a daily dose of echinacea compared to teenagers given no medication?
This is an interventional RQ about a population. The data to answer this RQ come from a sample of individuals in the population. Each piece of information obtained from or about each individual is called a variable, because the values can vary from individual to individual.
Definition 2.13 (Variable) A variable is a single aspect or characteristic associated with the individuals, whose values can vary from individual to individual.
Examples of variables include: the duration of cold symptoms, gender, place of birth, amount of tyre wear, or hair colour. The RQ identifies the variables needed to answer the RQ, though other variables may be (and typically are) measured also (Sect. 6.4).
A variable is a single aspect that can vary from individual to individual. While your city of birth may not change, 'city of birth' is a variable because it can vary from individual to individual.

Example 2.23 (Variables) 'Duration of cold symptoms' is a variable: it is obtained from individuals, and its value can vary from individual to individual.
The 'average duration of cold symptoms' is the outcome, a numerical summary of many individuals' cold durations.
While many variables can be measured on individuals, two essential variables are (Table 2.2):
- The response variable measures, assesses, describes or records information to determine the outcome.
- The explanatory variable measures, assesses, describes or records information to determine the comparison or connection.
Population | | Individuals |
---|---|---|
Outcome: | \(\rightarrow\) | Response variable |
Comparison/Connection: | \(\rightarrow\) | Explanatory variable |
The RQ cannot be answered without information about these two variables. The outcome refers to the numerical summary of the values of the response variable (Table 2.3) from many individuals.
Definition 2.14 (Response variable) A response variable is a variable used to identify, measure, assess or describe the outcome on individuals in the population.

FIGURE 2.3: The POCI elements
Population | | Individuals |
---|---|---|
Average increase in diastolic blood pressure | \(\rightarrow\) | Increase in diastolic blood pressure of individuals before and after exercise |
Percentage of seedlings that sprout | \(\rightarrow\) | Whether or not an individual seedling sprouts |
Proportion owning iPad | \(\rightarrow\) | Whether or not an individual owns an iPad |
Average cold duration | \(\rightarrow\) | Cold duration for individuals |
Percentage of concrete cylinders having fissures | \(\rightarrow\) | Whether or not an individual cylinder has fissures |
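The rows of Table 2.3 share one pattern: the response variable is recorded for each individual, and the outcome numerically summarises those values. A minimal Python sketch of this idea follows; the data values and variable names are invented purely for illustration.

```python
from statistics import mean

# Hypothetical response-variable values, one per individual (invented data).
cold_durations_days = [5, 7, 6, 9, 8]          # numerical response variable
seedling_sprouted = [True, False, True, True]  # yes/no response variable

# The outcome is a numerical summary of the response variable.
average_duration = mean(cold_durations_days)
# True counts as 1 and False as 0, so sum() counts the seedlings that sprouted.
percentage_sprouted = 100 * sum(seedling_sprouted) / len(seedling_sprouted)

print(f"Average cold duration: {average_duration} days")
print(f"Percentage of seedlings that sprout: {percentage_sprouted}%")
```

Numerical response variables are typically summarised by an average, while yes/no response variables are typically summarised by a percentage or proportion.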
Definition 2.15 (Explanatory variable) An explanatory variable is a variable used to identify, measure, assess or describe the comparison or connection on individuals in the population.
The explanatory variable allows the comparison or connection to be identified (Table 2.4).
Comparison being made | | Explanatory variable in individuals |
---|---|---|
Between males and females | \(\rightarrow\) | The sex of each person |
Between beech and bamboo floor boards | \(\rightarrow\) | Type of floorboard in each home |
Between 300kg/ha, 350kg/ha and 400kg/ha fertilizer rates | \(\rightarrow\) | Application rate in each paddock |
Between people in their 20s, 30s and 40s | \(\rightarrow\) | Age group for each person |
Response variables are sometimes called dependent variables, and explanatory variables are sometimes called independent variables.
The value of the response variable may change in response to the value of the explanatory variable. The value of the explanatory variable may explain the value of the response variable.
Example 2.24 (Variables) For the final RQ for the echinacea study (Sect. 2.5), 'the duration of cold symptoms' is the response variable, and 'type of medication (echinacea; or none)' is the explanatory variable. The type of medication is taken before the cold symptoms disappear, and may even explain the duration of the cold symptoms.
The population is 'carrots grown in Buderim', 8 weeks after planting. From these carrots, we need to record whether or not Thrive was applied, and the weight of each carrot 8 weeks after planting.
The response variable is 'the weight of each individual carrot 8 weeks after planting', and the explanatory variable is 'whether or not Thrive was used on each carrot'.
('The number of carrots planted' is not even a variable: it is not information recorded about the individuals, but a summary of information.)
Consider this RQ:
For overweight men over 60, is the average weight loss after three weeks the same for a diet high in fresh fruit and a diet high in dried fruit?
The outcome is the average weight loss; the response variable is the weight loss for each individual man. (This would be found by measuring their weight before and after three weeks on the diets; it is measured within-individuals.)
The between-individuals comparison is between the two diets; the explanatory variable is the diet each man is on.
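The link between the variables and the outcome in this RQ can be sketched in Python. The records below are invented purely for illustration, and the field names (`diet`, `weight_before`, `weight_after`) are hypothetical.

```python
from statistics import mean

# Hypothetical records: one per man (invented values, in kg).
men = [
    {"diet": "fresh fruit", "weight_before": 102.0, "weight_after": 99.5},
    {"diet": "fresh fruit", "weight_before": 95.0, "weight_after": 94.0},
    {"diet": "dried fruit", "weight_before": 110.0, "weight_after": 108.5},
    {"diet": "dried fruit", "weight_before": 98.0, "weight_after": 97.5},
]

# Response variable: the weight loss for each individual man
# (a within-individuals change).
for man in men:
    man["weight_loss"] = man["weight_before"] - man["weight_after"]

# Outcome: the average weight loss, computed for each value of the
# explanatory variable (the diet), giving the between-individuals comparison.
average_loss = {}
for diet in ("fresh fruit", "dried fruit"):
    losses = [m["weight_loss"] for m in men if m["diet"] == diet]
    average_loss[diet] = mean(losses)

print(average_loss)
```

The subtraction happens within each individual, while the two averages are compared between the groups of individuals.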
Example 2.25 (POCI) Consider this RQ (based on Tudor-Locke, Barreira, and Schuna Jr (2015)):
For Australian adults, is the average number of steps per day the same for those using a waist accelerometer, and those using a wrist accelerometer?
The population is the group being studied: all Australian adults. The outcome is the average number of steps per day, and the response variable is the number of steps per day for the individuals.
The comparison is "Between those wearing a waist accelerometer and those wearing a wrist accelerometer". The explanatory variable is what varies from person to person: "The location of the accelerometer".
This RQ may or may not have an intervention. If the researchers manipulate (i.e., tell people) where to wear the accelerometer, an intervention is present. If the individuals have chosen themselves where to wear the accelerometer, an intervention is not present.
2.7 Preparing software
Most statistical software packages (including jamovi and SPSS) use the same approach for organising the data (though exceptions exist for some types of analyses):
- Each row represents one unit of analysis: the number of rows equals the number of units of analysis.
- Each column represents one variable: the number of columns equals the number of variables. (A column of identifying information may also appear, such as the person's name, or concrete batch number.)
In statistical software, the variable names are not placed in a row (say, in Row 1 above the data itself), which might happen when using a spreadsheet. The names of the variables are the names of the columns.
Example 2.26 (Preparing statistical software) In Sect. 2.6, a RQ was asked about whether using echinacea or not reduced the duration of the common cold.
For this RQ, the variables are 'Duration of cold symptoms' (response), and 'Type of treatment' (explanatory); see Examples 2.23 and 2.24. To set up the software for data entry:
- The number of rows of data is the number of people in the study.
- The number of columns is two: one column to record the duration of each individual's cold symptoms, and the other to record whether the individual received a dose of echinacea or received no medication. There may be a column recording the name of each individual.
The variable names (say, Duration and Treatment) are the column names (Fig. 2.4).


FIGURE 2.4: jamovi (left) and SPSS (right) prepared for the data, with some data entered, and the variable names as the column headers
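The same layout (one row per unit of analysis, one column per variable) can be sketched in plain Python. This is only an illustration using invented values for the echinacea study; it is not how jamovi or SPSS store data internally.

```python
# Hypothetical data for the echinacea study (invented values).
# Each dictionary is one row: one unit of analysis (here, one person).
# Each key is one column: one variable.
rows = [
    {"Duration": 6, "Treatment": "Echinacea"},
    {"Duration": 8, "Treatment": "None"},
    {"Duration": 5, "Treatment": "Echinacea"},
]

variable_names = list(rows[0].keys())  # the column names
n_units = len(rows)                    # number of rows
n_variables = len(variable_names)      # number of columns

print(variable_names, n_units, n_variables)
```

Note that the variable names are stored as the column names, not as an extra row of data.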
2.8 Summary
In this chapter, you have learnt to write research questions for quantitative analysis. Research questions (RQs) are always about an outcome (O) in some population (P). Some RQs have a comparison or connection (C); some also have an intervention (I), when the values of C are manipulated by the researchers.
By comparison, we mean "between-individuals" comparisons. RQs may be descriptive (with only a P and O), relational (with a P, O and C), or interventional (with a P, O, C and I).
RQs may take one of two forms: estimation-type RQs, or decision-making-type RQs.
For quantitative RQs, the outcome numerically summarises the population (or subsets of the population), so is usually worded in terms of percentages, averages, etc.
Data come from individuals in the population by measuring, observing or assessing the response (or dependent) variable. The outcome is a numerical summary of the values of the response variable from many individuals. Similarly, the data concerning the comparison or connection come from measuring or observing the values of the explanatory (or independent) variable from individuals.
The 'who' or 'what' from which observations are made are called the units of observation. The smallest independent collections of units of observation (that is, units with very little in common) are called the units of analysis.

FIGURE 2.5: Chapter 2 summary
2.9 Quick review questions
Consider this RQ:
In female elite netball players, do players in defence positions have a greater average number of knee injuries (per player per season) than players in attacking positions?
- What is the comparison in this RQ, if any?
- What is the outcome?
- What is the response variable?
- What is the unit of analysis?
- What type of research question is this?
- In what form is this RQ?
2.10 Exercises
Selected answers are available in Sect. D.2.
Exercise 2.1 In a study of public acceptance of alternative water supplies (Hurlimann and Dolnicar 2016), various water sources are defined. In Table 2.5, match the term with the appropriate conceptual definition.
Term | Definition |
---|---|
Rainwater | Rainwater from a rainwater collection tank on your property |
Bottled | Water you presently use throughout your dwelling (home) |
Tap | Highly purified seawater deemed by scientists and public health officials as safe for human consumption. |
Recycled | Highly purified wastewater deemed by scientists as safe for human consumption |
Desalinated | Water sold in bottles by food companies that is widely available to the public for purchase and consumption |
Exercise 2.2 Consider this RQ:
Among university students, is the average resting diastolic blood pressure the same for students who regularly drive to university and those who regularly ride their bicycles to university?
- For this RQ, identify the population.
- For this RQ, identify the outcome.
- For this RQ, identify the comparison, if any.
- For this RQ, identify the intervention, if any.
- What type of RQ is this?
- What operational and conceptual definitions would be needed?
- What information must be collected from each individual to answer the RQ?
- What are the units of analysis?
- What are the units of observation?
Exercise 2.3 Consider this article extract (Checkley et al. (2002), p. 210):
We conducted a 4-year (1995--1998) field study in a Peruvian peri-urban community... to examine the relation between diarrhoea and nutritional status in 230 children \(<3\) years of age
For this study:
- Identify POCI.
- Infer the primary research question.
- What type of question is used?
- What operational definitions would be needed?
- What are the response and explanatory variables?
Exercise 2.4 For the following response variables, what would be the corresponding outcomes?
- Whether a vehicle crashes or not.
- The height at which people can jump.
- The number of tomatoes per plant.
- Whether or not a person owns a car.
Exercise 2.5 Consider this RQ: 'Is the average walking speed the same when texting and talking on a mobile phone?'
- What is the explanatory variable?
- What is the response variable?
- What is the outcome?
Exercise 2.6 For the following comparisons, what would be the corresponding explanatory variables?
- Between 91 octane, 95 octane, and ethanol-blended car fuel.
- Between caffeinated and decaffeinated coffee.
- Between taking zero, one or two iron tablets per day.
- Between vegans and vegetarians.
Exercise 2.7 For the following studies, determine which have a comparison and which do not. In each case, identify the outcome.
- A study to determine if a higher percentage of people at a particular city park wear hats in winter compared to summer.
- A study to determine if average cholesterol levels are the same when measured on the same people before and after a diet change.
- A study to determine if the average balance-time on right legs is the same as on left legs.
- A study to determine if the average yield of tomato plants is the same when three different fertilisers are applied.
Exercise 2.8 Animals in an experiment are divided into pens (three per pen), and feed is allocated to each pen (Sterndale et al. 2017). Animals in different pens receive different feed; animals in the same pen receive the same feed. The weight gain of each animal is recorded.
- What is the unit of observation? Why?
- What is the unit of analysis? Why?
Exercise 2.9 Consider this actual student RQ from the university where I work.
Among 10 Australian adults, does the time taken to read a passage of text change when different fonts are used?
Critique the RQ, and write a better RQ (if necessary).
Exercise 2.10 Consider this actual student RQ from the university where I work.
Of students that study at (a University), do males have a larger lung capacity than females?
Critique the RQ, and write a better RQ (if necessary).
Exercise 2.11 Consider this RQ:
For Australian adults with a common cold, do people who take Vitamin C tablets have, on average, a shorter cold duration than people who do not take any Vitamin C tablets?
In this RQ, the population is ____, and the outcome is ____.
The ____ is between those who take Vitamin C tablets and those who do not.
The ____ variable is the duration of the cold symptoms for each individual person. The ____ variable is whether or not each person takes Vitamin C tablets.
It is not clear whether there is an intervention. If there is an intervention, this would be a(n) ____ RQ; otherwise, it would be a(n) ____ RQ.
Exercise 2.12 A research study was comparing the average size of Blue Gum eucalypt leaves in two areas of Queensland. A student takes 40 leaves from each of ten trees in Area A, and 40 leaves from each of ten trees in Area B. Are the following statements true or false?
- The unit of analysis is the individual leaf.
- The unit of observation is the individual leaf.
- The unit of analysis is the tree.
- The size of the sample in the study is