# 2 Research questions

In this chapter, you will learn to:

• create operational definitions.
• list, explain and give examples of the various types of quantitative research questions.
• identify estimation and decision-making research questions.
• identify the variables implied by a quantitative research question.
• identify observational or experimental studies.
• describe and identify the units of analysis and units of observation in a study.
• communicate in the language of research and statistics.

## 2.1 Introduction

In research, asking clear and answerable research questions (RQs) is important. The data (evidence) that must be collected depends on the RQ.

In quantitative research, summarising and analysing the data typically uses numerical methods (such as averages or percentages), so the RQs must be appropriate for analysis using these methods.

For this reason, writing the RQ appropriately is important. The RQ drives all other aspects of the research (Fig. 2.1).

Defining the RQ precisely can be challenging. Studies often have an overall, broad research goal with many sub-questions (which may be quantitative or qualitative).

Example 2.1 (Research questions) Consider this broad research goal:

How well are PPs (permeable pavements) working in urban areas?

This goal has many component RQs (Fig. 2.2), and each can be answered using separate studies.

## 2.2 Definitions

Research studies usually include terms that must be carefully and precisely defined, so that others know exactly what has been done and there are no ambiguities. Two types of definitions can be given:

• A conceptual definition explains exactly and precisely what is being measured, observed or assessed (i.e., what a word or a term means in the study).
• An operational definition defines exactly how something will be identified, measured, observed or assessed.

Not all variables in a study will require a conceptual definition. However, most variables will require an operational definition to ensure consistent data collection, by removing any ambiguity about how a variable is measured.

Definition 2.1 (Conceptual definition) A conceptual definition articulates exactly and precisely what is being measured, observed or assessed in a study.

Definition 2.2 (Operational definition) An operational definition articulates exactly how something will be identified, measured, observed or assessed.

Example 2.2 (Operational and conceptual definitions) Consider a study examining stress in students during a university semester.

A conceptual definition would clearly describe what is meant by 'stress' (in contrast to, say, 'anxiety').

An operational definition would describe how 'stress' would be measured. While this is always important, it is especially important here since stress cannot be measured directly (like height can, for example).

'Stress' could be measured using a survey (like the Perceived Stress Scale (PSS));50 the level of stress is the score on the ten-question PSS.

Other means of measuring stress are also possible (such as heart rate or blood pressure).
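As a hedged sketch of what an operational definition makes possible, the code below turns ten survey responses into a single stress score. It assumes the commonly described scoring of the ten-item PSS (each item rated 0 to 4, with items 4, 5, 7 and 8 reverse-scored before summing); check the scale's own documentation before relying on these details. The function name and the example responses are made up for illustration.

```python
# Items ASSUMED to be reverse-scored on the ten-item PSS (1-based item numbers).
REVERSED_ITEMS = {4, 5, 7, 8}

def pss_score(responses):
    """Return a total stress score from ten responses, each rated 0 to 4."""
    if len(responses) != 10:
        raise ValueError("expected exactly ten responses")
    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if rating not in range(5):
            raise ValueError("each response must be an integer from 0 to 4")
        # Reverse-scored items: 0 becomes 4, 1 becomes 3, and so on.
        total += (4 - rating) if item_number in REVERSED_ITEMS else rating
    return total

# Hypothetical responses from one student.
example_responses = [2, 3, 1, 1, 2, 3, 0, 1, 2, 3]
stress = pss_score(example_responses)
print(stress)  # 26
```

Because the rule is written down exactly, two researchers scoring the same survey would obtain the same level of stress.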

Sometimes the definitions themselves aren't important, provided a clear definition is given. Sometimes, commonly-accepted definitions exist, so should be used unless there is a good reason to use a different definition (for example, in criminal law, an 'adult' in Australia is someone aged 18 or over).

Sometimes, a commonly-accepted definition does not exist, so the definition being used should be very clearly articulated.

Example 2.3 (Operational and conceptual definitions) A student project at my university used this RQ:

Amongst students [...], on average do student who participate in competitive swimming have greater shoulder flexibility than the remainder of the able-bodied USC student population?

Shoulder flexibility needs a conceptual definition to describe exactly what it means, so that everyone involved in the study, or reading the study conclusions, has the same understanding of the term.

Additionally, how shoulder flexibility is being measured is not clear. An operational definition is needed (which the student did not provide...).

Example 2.4 (Operational and conceptual definitions) Players and fans have become more aware of concussions and head injuries in sport. A Conference on concussion in sport developed this conceptual definition:51

Concussion is a brain injury and is defined as a complex pathophysiological process affecting the brain, induced by biomechanical forces...

While this is helpful... it does not explain how to identify a player with concussion during a game.

Rugby decided on this operational definition:52

... a concussion applies with any of the following:

1. The presence, pitch side, of any Criteria Set 1 signs or symptoms (table 1)... [Note: This table includes symptoms such as 'convulsion', 'clearly dazed', etc.];

2. An abnormal post game, same day assessment...;

3. An abnormal 36–48 h assessment...;

4. The presence of clinical suspicion by the treating doctor at any time...
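An operational definition like this one reads as a decision rule. A minimal sketch follows, with hypothetical field names standing in for the four criteria quoted above. The details of each assessment are elided in the quotation and are not modelled here.

```python
from dataclasses import dataclass

@dataclass
class PlayerAssessment:
    # Hypothetical field names; each records whether one quoted criterion is met.
    criteria_set_1_sign_pitch_side: bool
    abnormal_same_day_assessment: bool
    abnormal_36_to_48_hour_assessment: bool
    clinical_suspicion_by_doctor: bool

def concussion_applies(a: PlayerAssessment) -> bool:
    """The quoted definition applies when ANY of the four criteria is met."""
    return any([
        a.criteria_set_1_sign_pitch_side,
        a.abnormal_same_day_assessment,
        a.abnormal_36_to_48_hour_assessment,
        a.clinical_suspicion_by_doctor,
    ])

player = PlayerAssessment(False, False, True, False)
print(concussion_applies(player))  # True: one criterion is enough
```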

Example 2.5 (Operational and conceptual definitions) Consider a study requiring water temperature to be measured.

An operational definition would explain how the temperature is measured: the thermometer type, how the thermometer was positioned, how long it was left in the water, and so on.

In contrast, a conceptual definition might describe the scientific definition of temperature (and would not be needed, as 'temperature' is a well-understood term).

Define a 'smoker'.

## 2.3 Elements of RQs

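A RQ must be written carefully so it can be properly answered. In this section, the four potential components of a RQ are studied:

• The Population (P).
• The Outcome (O).
• The Comparison or Connection (C).
• The Intervention (I).

These form the POCI acronym.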

### 2.3.1 The Population

All RQs study some population: the larger group of interest in the study.

Definition 2.3 (Population) The population is the group of individuals (or cases; or subjects if the individuals are people) from which the total set of observations of interest could be made, and to which the results will (hopefully) generalise.

In this context, a population is any group of interest; for example:

• all Australian males between 18 and 35 years of age.
• all bamboo flooring materials manufactured in Queensland.
• all elderly males with glaucoma in Canada.
• all Pinguicula grandiflora growing in Europe.

The words population, individuals and cases do not just refer to people, though the words may be commonly used that way in general conversation.

The population is not just those individuals from which the data are actually obtained. Indeed, not all elements of the population may be accessible in practice.

The population represents all the 'individuals' to which the results are to be generalised. For example, when testing a new drug, the aim is to see if it works on people in general, including people not yet born. The population is 'all people'.

The population in a RQ is not just those we end up studying. It is the whole group to which our results would generalise.

In contrast, the sample is the subset of the population that we actually end up studying, from which data are obtained.

Definition 2.4 (Sample) A sample is a subset of the population of interest which is actually studied, and from which data are collected.

Example 2.6 (Samples) Consider a study of American college women, which aimed to:

...assess iron status [...] in highly active (>12 hr purposeful physical activity per week) and sedentary (<2 hr purposeful physical activity per week) women...

--- p. 521.

The sample comprises 28 'active' and 28 'sedentary' American college women, from which data are collected.

The population is all 'active and sedentary' American college women, not just the 56 in the study. The group of 56 subjects is the sample.

Completely defining the population54 sometimes requires refining or clarifying the population, using exclusion and/or inclusion criteria.

Exclusion and inclusion criteria clarify which individuals may be explicitly included or excluded from the population.

Exclusion and inclusion criteria should be explained when their purpose is not obvious. A study does not need both exclusion and inclusion criteria; none, one or both may be used.

Definition 2.5 (Inclusion criteria) Inclusion criteria are characteristics that individuals must meet explicitly to be included in the study.

Example 2.7 (Inclusion criteria) A study of a certain bird species may only include sites where there has been a confirmed sighting within the last two years.

A study of weight-loss methods may require people over a certain weight.

Definition 2.6 (Exclusion criteria) Exclusion criteria are characteristics that explicitly disqualify potential individuals from being included in the study.

Example 2.8 (Exclusion criteria) Concrete test cylinders with fissure cracks may be excluded from tests of concrete strength.

People with severe asthma may be excluded from exercise studies.

Example 2.9 (Population and exclusion criteria) A study on the influenza vaccine55 listed the Population as 'health-care workers',56 and the sample they studied was:

All healthcare workers at the National University Hospital (NUH) and KK Women’s and Children’s Hospital (KKWCH)...

--- p. 466

The population was refined by exclusion criteria. The exclusion criteria were those:

...declining to give consent, a history of egg protein allergy, and neurological or immunological conditions that are contraindications to the influenza vaccine.

--- p. 466
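Inclusion and exclusion criteria can be thought of as filters on the individuals who might be studied. The sketch below borrows the weight-loss and asthma examples above; the records, the field names and the 90 kg threshold are invented for illustration.

```python
# Hypothetical candidate participants for a weight-loss exercise study.
candidates = [
    {"id": 1, "weight_kg": 95, "severe_asthma": False},
    {"id": 2, "weight_kg": 72, "severe_asthma": False},
    {"id": 3, "weight_kg": 110, "severe_asthma": True},
    {"id": 4, "weight_kg": 101, "severe_asthma": False},
]

MIN_WEIGHT_KG = 90  # assumed inclusion threshold, for illustration only

def meets_inclusion(person):
    # Inclusion criterion: must be over a certain weight.
    return person["weight_kg"] > MIN_WEIGHT_KG

def meets_exclusion(person):
    # Exclusion criterion: severe asthma disqualifies the person.
    return person["severe_asthma"]

eligible = [p for p in candidates if meets_inclusion(p) and not meets_exclusion(p)]
eligible_ids = [p["id"] for p in eligible]
print(eligible_ids)  # [1, 4]
```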

### 2.3.2 The Outcome

All RQs study something about the population, called the outcome.

Because the RQ concerns a population, the outcome describes a population as a whole; hence, the outcome is usually an average, percentage, or general quantity numerically summarising the population (or subsets of the population).

Definition 2.7 (Outcome) The outcome in a RQ is the result, output, consequence or effect of interest in a study, numerically summarising the population (or subsets of the population).

The outcome may be (for example):

• average increase in heart rates.
• average amount of wear after 1000 hours of use.
• proportion of people whose pupils dilate.
• average weight loss after three weeks.
• percentage of seedlings that die.

The outcome in a RQ summarises a population; it does not describe the individuals in the population.
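To make the distinction concrete, the sketch below computes two outcomes from individual-level values. The numbers are made up; in a real study they would come from a sample, since the whole population is rarely accessible.

```python
from statistics import mean

# Hypothetical individual-level measurements.
weight_loss_kg = [1.5, 0.0, 2.5, 3.0, 1.0]         # one value per person
seedling_died = [True, False, False, True, False]  # one value per seedling

# Outcomes summarise the group; they are not values for any one individual.
average_weight_loss = mean(weight_loss_kg)
percentage_died = 100 * sum(seedling_died) / len(seedling_died)

print(average_weight_loss)  # 1.6
print(percentage_died)      # 40.0
```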

### 2.3.3 The Comparison or Connection

In addition to having a population (P) and an outcome (O), some RQs may compare the outcome between a small number of different, distinct subsets of the population (that is, groups of individuals), or may explore a connection between the outcome and some other quantity that varies.

Definition 2.8 (Comparison) The comparison in the RQ identifies the small number of different, distinct subsets of the population between which the outcome is compared. The groups being compared have either imposed differences, or have existing differences.

The outcome may be compared between two or more separate subsets of the population:

• Average amount of wear in floor boards (O) could be compared across two groups in the population: standard wooden flooring materials and bamboo flooring.
• Average heart rates (O) could be compared across three subsets of the population: those who received no dose of a drug, those who received a daily dose of the drug, and those who received a dose of the drug twice daily.

Be careful!

This definition requires that the population can be separated into two (or more) subgroups, that have either imposed differences (for example, one group is given one dose of fertilizer per day, and another given two doses of fertilizer per day) or have existing differences (for example, one group of people aged under 30, and another group of people aged 30 or over).

If all individuals are treated in the same way, or do not have existing differences that allow them to be divided into groups to be compared, there is no comparison according to this definition.

Example 2.10 (Comparison) Consider a study to compare the average blood pressure (the Outcome) in Australians (the Population), to see if the average blood pressure in the right arm is the same as the average blood pressure in the left arm.

There is no comparison: the Outcome (average blood pressure) is not compared in two different subsets of the population; every person is treated the same way.

Instead, the blood pressure is measured twice on every member of the population. The outcome might be best described as 'the mean difference between right- and left-arm blood pressure'.

In contrast, a study comparing the average blood pressure between (a) people aged under 40, and (b) people aged 40 or over does have a comparison: two subsets (under 40; 40 and over) of the population (Australians) are compared.
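The difference between these two situations shows up in how the data would be summarised. In the hypothetical sketch below (all values invented), the first outcome is computed from two measurements on every person, while the second compares one measurement per person across two age groups.

```python
from statistics import mean

# Each person has BOTH a right-arm and a left-arm blood pressure reading.
people = [
    {"age": 25, "right": 120, "left": 118},
    {"age": 35, "right": 124, "left": 125},
    {"age": 52, "right": 135, "left": 131},
    {"age": 61, "right": 140, "left": 138},
]

# No comparison: one outcome, the mean of the within-person differences.
mean_difference = mean(p["right"] - p["left"] for p in people)

# Comparison: the same outcome (average right-arm reading) in two subsets.
under_40 = [p["right"] for p in people if p["age"] < 40]
aged_40_plus = [p["right"] for p in people if p["age"] >= 40]

print(mean_difference)                      # 1.75
print(mean(under_40), mean(aged_40_plus))   # 122 137.5
```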

Definition 2.9 (Connection) The connection in the RQ identifies another quantity of interest that varies, that may be related to the outcome.

As the value of the connection changes, the value of the outcome (potentially) changes; for example:

• The connection between average heart rate (O) and exposure to various doses of caffeine (C) in mg.
• The connection between percentage germination (O) and hours of sunlight per day (C).

### 2.3.4 The Intervention

In addition to having a population (P), an outcome (O), and possibly a connection or comparison (C), some RQs also have an intervention.

Definition 2.10 (Intervention) An intervention is a comparison or connection whose value can be manipulated by the researchers.

That is, the researchers have imposed the intervention upon those in the study intending to change the outcome.

The intervention may be:

• explicitly giving a new drug to patients.
• explicitly applying wear testing loads to two different flooring materials.
• explicitly exposing people to different stimuli.
• explicitly applying a different dose of fertiliser.

Example 2.11 (Interventions) A study comparing the average blood pressure (O) in female and male (C) Australians (P) measured blood pressure using a blood pressure machine (a sphygmomanometer).

The research team needs to interact with the participants and use a machine to measure blood pressure, but there is no intervention. Using the sphygmomanometer is just a way to measure blood pressure, to obtain the data. The sphygmomanometer is not used with the intent of changing the outcome.

There is no intervention, since the comparison is between females and males, and this cannot be imposed on the individuals by the researchers.

Sometimes, it is not clear from the RQ if an intervention is present or not. If you are writing an interventional RQ, you should try to make it clear when an intervention is used.

A study of American college women aimed to:

...assess iron status [...] in highly active (>12 hr purposeful physical activity per week) and sedentary (<2 hr purposeful physical activity per week) women...

--- p. 521.

In this study, what is the:

• Outcome?
• Comparison or Connection (if any)?
• Intervention (if any)?

## 2.4 Types of RQs

All RQs have a population (P) and an outcome (O). However, different types of RQ emerge depending on whether the RQ also has a comparison/connection (C) or intervention (I).

This section studies different types of research questions:

• Descriptive RQs (PO).
• Relational RQs (POC).
• Interventional RQs (POCI).

These are compared in Sect. 2.4.4.

### 2.4.1 Descriptive RQs (PO)

Descriptive RQs are the most basic RQs, and identify:

• The Population to be studied.
• The Outcome of interest.

Typically, descriptive RQs look like this:

Among {the population}, what is {the outcome}?

This is not a 'recipe', but a guideline.

Example 2.12 (Descriptive RQ) Consider this RQ:

Among Australian males between 18 and 35 years of age, what is the average heart rate?

In this RQ, the Population is 'Australian males between 18 and 35 years of age', and the Outcome is 'average heart rate'. Notice that the Outcome (the average heart rate) is a numerical summary across the population.

This is a descriptive RQ, as the RQ does not imply studying a connection with, or comparison between, the average heart rate and anything else.

Consider this RQ:

Among Australian adults, what proportion are coeliacs?

For this RQ, identify the Population and the Outcome.

### 2.4.2 Relational RQs (POC)

Usually, relationships are more interesting than just descriptions; relational RQs explore existing relationships. Relational RQs identify:

• The Population.
• The Outcome.
• The Comparison or Connection.

Relational RQs have no intervention; the connection or comparison is not imposed by the researchers.

Typically, relational RQs based on a comparison look like this:

Among {the population}, is {the outcome} the same for {the groups being compared}?

Example 2.13 (Relational RQ) Consider this RQ:

Among Australians between 18 and 35 years of age, is the average heart rate the same for females and males?

In this RQ, the Population is 'Australians between 18 and 35 years of age', the Outcome is 'average heart rate', and the Comparison is 'between females and males'.

This is a relational RQ based on a comparison. Notice that the average heart rate (Outcome) is a numerical summary across the two population sub-groups being compared (females; males).

The sex of the individual (the C) is not allocated by the researchers, so there is no intervention.

Typically, relational RQs based on a connection look like this:

Among {the population}, is {the outcome} related to {something else}?

Example 2.14 (Relational RQ) Consider this RQ:

Among Australians between 18 and 35 years of age, is the average heart rate related to age?

In this RQ, the Population is 'Australians between 18 and 35 years of age', the Outcome is 'average heart rate', and the Connection is with 'age'.

This is a relational RQ based on a connection. Age (the C) is not allocated by the researchers, so there is no intervention.

Consider this RQ:

In the Queensland Ambulance Service last year, what was the difference between the average response time to emergency calls between weekdays and weekends?

Identify the Population, the Outcome, and the Comparison.

Consider this RQ:

In Queensland state forests, is there a relationship between the average number of noisy miners and the number of eucalypts, in general?

(A noisy miner is a type of bird.) In this RQ, identify the Population, the Outcome, and the Connection.

Example 2.15 (Descriptive and relational RQs) Consider a study of blood pressure in Australians (the Population), comparing right- and left-arm blood pressures.

This is a descriptive RQ. There is no comparison, since there are not two subsets of the population being compared.

The blood pressure is measured twice on each member of the population: every member of the population is treated in the same way. The outcome is 'the average difference between right- and left-arm blood pressure'. This is a descriptive RQ.

In contrast, a study comparing the average blood pressure between females and males is a relational RQ. There is a comparison: the two subsets of the population (Australians) being compared are females and males.

### 2.4.3 Interventional RQs (POCI)

Interventional RQs explore relationships where the comparison/connection is determined or allocated by the researchers. They identify:

• The Population.
• The Outcome.
• The Comparison or Connection.
• The Intervention.

Interventional RQs may look like relational RQs, except that the comparison or connection is determined or allocated (i.e., imposed) by the researchers.

Sometimes it is not clear if the comparison or connection has been imposed by the researchers in an interventional RQ. When writing interventional RQs, make it as clear as possible that the comparison or connection is imposed by the researchers.

Example 2.16 (Interventional RQ) Consider this RQ:

Among Australians between 18 and 35 years of age, is the average heart rate for people allocated to receive a new pill the same as for people allocated to receive an existing pill?

In this RQ, the Population is 'Australians between 18 and 35 years of age', the Outcome is 'average heart rate', and the Comparison is 'between those taking the new pill, and those taking the existing pill'.

There is an Intervention: the researchers allocate one of the pills to each subject. This is an interventional RQ.

### 2.4.4 Comparing the three levels of RQs

Descriptive RQs are the most basic and are usually used when a research topic is in its infancy; descriptive RQs set the platform for asking relational questions.

Relational RQs explore relationships, and provide an understanding of how the outcome of interest is related to certain sub-groups of the population; they may set the platform for asking interventional questions.

Interventional RQs (when possible to answer) are the most interesting: they can be used to test theories or models, or to establish cause-and-effect relationships (Table 2.1).

TABLE 2.1: The three types of RQs

| RQ type | P | O | C | I |
|---|---|---|---|---|
| Descriptive (D) | Yes | Yes | | |
| Relational (R) | Yes | Yes | Yes | |
| Interventional (I) | Yes | Yes | Yes | Yes |
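The logic of Table 2.1 can be sketched as a small function. This is only an illustration of the classification; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ResearchQuestion:
    population: str
    outcome: str
    comparison_or_connection: Optional[str] = None
    has_intervention: bool = False  # True if the researchers impose the C

def rq_type(rq: ResearchQuestion) -> str:
    """Classify a RQ as descriptive, relational or interventional."""
    if rq.comparison_or_connection is None:
        if rq.has_intervention:
            # An intervention is a comparison or connection that is imposed.
            raise ValueError("an intervention requires a comparison or connection")
        return "Descriptive"
    return "Interventional" if rq.has_intervention else "Relational"

heart_rate = ResearchQuestion(
    population="Australians between 18 and 35 years of age",
    outcome="average heart rate",
    comparison_or_connection="between females and males",
)
print(rq_type(heart_rate))  # Relational
```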

Research often develops through these stages of RQs as knowledge grows and develops. For example:

• Descriptive: What proportion of Australian adults are coeliacs?64
• Relational: Among Australian adults, is the proportion of females who are coeliacs the same as the proportion of males who are coeliacs?65
• Interventional: Among Australian adult coeliacs, is the percentage with adverse symptoms the same for those given a diet without oats and those given a diet with oats?66

What type of RQs are the following: Descriptive, Relational, or Interventional?

1. Among Australian upper-limb amputees, is the percentage wearing prosthesis 'all the time' the same for transradial and transhumeral amputations?67
2. In New York, what is the difference between the average height of oak trees ten weeks after planting, comparing trees planted in a concrete sidewalk and a grassed sidewalk?68
3. What is the average response time of paramedics to emergency calls?69
4. Is there a relationship between the average weekly hours of physical activity in children and the weekly maximum temperature?70

## 2.5 Two approaches to RQs

RQs can be approached in one of two ways:

• For estimation (confidence intervals): These RQs are concerned with, for example, estimating a value in a population. This value may be the size of a difference (probably a RQ with a Comparison), or strength of a relationship (probably a RQ with a Connection).
• For making decisions (hypothesis testing): These RQs are concerned with making a decision about an unknown population value: for example, is the percentage the same in two different groups of the population?

What approach do these RQs take: Decision-making, or Estimation?

1. Among Australian upper-limb amputees, is the percentage wearing prosthesis 'all the time' the same for transradial and transhumeral amputations?72
2. In New York, what is the difference between the average height of oak trees (ten weeks after planting) comparing trees planted in a concrete sidewalk and a grassed sidewalk?73
3. What is the average response time of paramedics to emergency calls?74
4. Is there a relationship between the average weekly hours of physical activity in children and the weekly maximum temperature?75

### 2.5.1 Estimation RQs

Sometimes, the RQ concerns how precisely a value in the population is estimated by the sample. This value may measure a difference, or the strength of a relationship.

These RQs are studied in Chapters 19 to 25, and in Sect. 36.7.

Example 2.17 (Estimation RQs) Consider this RQ:

Among Australian teens with a common cold, how much shorter are cold symptoms, on average, for teens taking a daily dose of echinacea compared to teens taking no medication?

This RQ asks about the size of the difference (in the population) between the average duration of cold symptoms.

Only sample data are available, and there may be no difference (on average) at all in the population.
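A rough sketch of what answering an estimation RQ involves: estimate the difference from the sample, and attach an interval indicating the precision of that estimate. The durations below are invented. The interval uses a simple large-sample approximation (estimate ± 1.96 standard errors), which is an assumption made for illustration; it is not necessarily the method developed in later chapters, and it is optimistic for a sample this small.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical cold-symptom durations (days) from a sample of teens.
echinacea = [5.0, 6.5, 4.5, 7.0, 5.5, 6.0, 5.0, 6.5]
no_medication = [6.0, 7.5, 5.5, 7.0, 6.5, 8.0, 6.0, 7.5]

# Sample estimate of how much shorter symptoms are with echinacea.
difference = mean(no_medication) - mean(echinacea)

# Standard error of the difference between two independent sample means.
std_error = sqrt(
    stdev(echinacea) ** 2 / len(echinacea)
    + stdev(no_medication) ** 2 / len(no_medication)
)

# Approximate 95% interval for the population difference (assumed method).
lower = difference - 1.96 * std_error
upper = difference + 1.96 * std_error
print(f"Estimated difference: {difference:.2f} days "
      f"(approx. 95% interval {lower:.2f} to {upper:.2f})")
```

The width of the interval indicates how precisely the sample estimates the population difference.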

### 2.5.2 Decision-making RQs

Sometimes, RQs are not about the precision with which a population value is estimated by the sample, but instead about deciding if a difference or a relationship exists in the population.

These RQs often are associated with hypotheses: statements that suggest possible answers to the RQ. Based on the sample, the hypothesis best supported by the data is to be chosen.

These RQs are studied in Chapters 27 to 32, and in Sect. 36.6.

Example 2.18 (Making decisions with samples) Consider this RQ:

Among Australian teens with a common cold, is the average duration of cold symptoms shorter for teens taking a daily dose of echinacea compared to teens taking no medication?

This is a decision-type RQ, with two possible answers (Fig. 2.3): either echinacea does result in shorter average cold durations, or it doesn't. In practice, the answer is rarely clear-cut; instead, researchers report how much evidence the sample provides to support a particular hypothesis about the population.

Evidence may support or contradict a hypothesis; evidence rarely proves a hypothesis (at least, without any other support, such as theoretical support). Ultimately, after collecting data from a sample, a decision must be made about which explanation about the population is more consistent with the data collected.

## 2.6 Writing RQs

Ideally, a well-written RQ79 should be:

• Feasible: Answering the RQ should be possible practically; sufficient personnel, time, resources, and money should be available to complete the study properly.
• Interesting: The RQ should be interesting. For example, no-one cares about comparing the percentage of people who prefer drinking tea in blue cups to green cups...
• Novel: The RQ should be original: the RQ should 'seek to confirm, refute or extend previous findings, and potentially reveal new findings' (p. 410). Researching something already well known is a waste of time and resources.
• Ethical: The RQ must be able to be answered ethically (Chap. 4). This is not negotiable.
• Relevant: The RQ should be relevant and current.

Note the acronym FINER to help remember these guidelines.

In most undergraduate university courses, a Project RQ must be feasible and ethical.

Given the nature of a course, and the short timelines, these RQs don't necessarily need to be Interesting, Novel or Relevant. It is great if they are all of these, however.

Example 2.19 (Poor RQ) Here is a RQ submitted by a student group (including typos) at the university where I work:

Utilising a convenience sample at The University of Sunshine Coast in Sippy Downs, is there a difference in taste perception between students on a Thursday morning and afternoon (, when comparing English and Australian Cadburys milk chocolate ?

This is a poor RQ:

• General poor writing: A round bracket is started but never closed, for example... and for some reason the bracket is followed by a comma. This shows poor attention to detail.
• The RQ starts by describing the sample ('a convenience sample'), but RQs are always about a population, not a sample.
• The RQ does not have a clear Outcome that numerically summarises the population: a proportion or a mean, for instance.
• It is not clear whether the comparison is between morning and afternoons, or between English and Australian chocolates, or both.

Notice that 'taste perception' is not defined. This is not a criticism: the operational definitions can be provided elsewhere.

This is a far better RQ:

For USC students at the Sippy Downs campus, is the percentage of people who can correctly identify English or Australian chocolates the same in the mornings as in the afternoons?

Written this way:

• P: USC students at Sippy Downs campus.
• O: The percentage correctly identifying English or Australian chocolates.
• C: Between mornings and afternoons.
• I: No intervention: We cannot decide if a particular time of day is morning or afternoon.

This RQ is Feasible and Ethical, but probably not really Interesting (except to the project group), Novel or Relevant... but that's OK.

Exclusion criteria might exclude people with dairy intolerance, and those who do not eat dairy (such as vegans).

## 2.7 Writing RQs: An example

RQs emerge from observations, which lead to asking questions, and to the need for evidence to answer those questions.81

For example, suppose you notice that many people take echinacea when they get a cold; it is reasonable to ask if there is evidence that echinacea helps with a cold in any way. This may lead to an initial RQ:

Is it better to take echinacea when you have a cold?

This RQ is clearly poor, but serves as a starting point.

RQs often start as a basic idea, which can be refined by clarifying the POCI elements. For example, what population could we study? Many options exist:

• 'You' is implied by the question... but this is not a useful or practical population.
• All Australians.
• Australian adults with a specific 'cold'.

What outcome could be used to determine echinacea's effectiveness? Again, many options exist:

• Average cold duration.
• Average severity of cold symptoms.
• Percentage of people who take days off work.

The initial RQ cannot be answered because 'better' is ambiguous: better than what? We could decide to compare an outcome across different groups, or connect it to something else. For example, the comparison could be:

• Between taking echinacea and taking no medication.
• Between taking echinacea and taking another medication.
• Between taking different doses of echinacea.

Furthermore, we could decide to intervene or not. Whether we decide to include an intervention or not has implications for how the study is conducted and how the results are interpreted.

If we decided not to intervene, the subjects in the study would decide for themselves how to treat their cold. If we did decide to intervene, various interventions could be used:

• Imposing how frequently the dose was taken; and/or
• Imposing what doses of echinacea to take.

After making some decisions about P, O, C and I, consider this revised RQ:

Among Australian teenagers with a common cold, is the duration of cold symptoms shorter for teens taking a daily dose of echinacea compared to teens taking no medication?

The P, O, C and I do not have to be comprehensively described in the RQ; some information could be provided later as operational definitions (e.g., dose).

This RQ is much better, but it is still not correct. The outcome is a numerical summary across subsets of the population, not of individuals. So consider this revised RQ:

Among Australian teenagers with a common cold, is the average duration of cold symptoms shorter for teens given a daily dose of echinacea compared to teenagers given no medication?

This is a better RQ.

For this RQ above, identify the Population, Outcome, Comparison or Connection, and the Intervention (if any).


## 2.8 Variables: From populations to individuals

RQs explore relationships in the population. The Outcome describes the population in general, and so Outcomes are often worded in terms of averages or percentages or similar. For example, consider this RQ seen above:

Among Australian teenagers with a common cold, is the average duration of cold symptoms shorter for teens given a daily dose of echinacea compared to teenagers given no medication?

This is an interventional RQ (using a comparison) about a population.

No relationship could be found with information from just two teenagers. Consider this: suppose a cold lasts for 6 days for a teenager who does take echinacea, and a cold lasts for 5 days for a teenager who does not take echinacea. Is there a difference between the cold durations in the population? We have no way of knowing: only two teenagers were studied. To explore the relationship using teenagers in general, data from many teenagers are needed.

RQs concern numerical summaries about populations, but the data to answer the RQ come from individuals in the population. (As with the word 'population', the word 'individual' does not only refer to people.)

Each piece of information that we gather from individuals is called a variable, because its values can vary from individual to individual.

Definition 2.11 (Variable) A variable is a single aspect or characteristic associated with each of a group of individuals under consideration, whose values can vary from individual to individual.

The value of a variable can vary from one individual to the next. Examples include:

• the duration of cold symptoms;
• gender;
• age;
• place of birth;
• amount of tyre wear;
• hair colour.

The RQ identifies the variables needed to answer the RQ, though other variables may be (and typically are) measured also (Sect. 6.4).

A variable is a single aspect that can vary from individual to individual.

Your own city of birth never changes, but 'city of birth' is still a variable because its value can vary from individual to individual.

Example 2.20 (Variables) 'Duration of cold symptoms' is a variable, as it is obtained from individuals, and its value can vary from individual to individual.

The 'average duration of cold symptoms' is the outcome, numerically summarising the individuals' cold durations across the population.

While many variables can be measured on individuals, two variables are of greatest importance:

• The response variable measures, assesses, describes or records information to determine the outcome; and
• The explanatory variable measures, assesses, describes or records information to determine the comparison or connection (Table 2.2).

| Population | | Individuals |
|---|---|---|
| Outcome | $$\rightarrow$$ | Response variable |
| Comparison/Connection | $$\rightarrow$$ | Explanatory variable |

Definition 2.12 (Response variable) A response variable is the variable used to measure, assess or describe the outcome on each individual in the population.

The outcome refers to the numerical summary of the values of the response variable (Table 2.3).

| Population (outcome) | | Individuals (response variable) |
|---|---|---|
| Average increase in diastolic blood pressure | $$\rightarrow$$ | Increase in diastolic blood pressure of individuals, from before to after exercise |
| Percentage of seedlings that sprout | $$\rightarrow$$ | Whether or not an individual seedling sprouts |
| Proportion owning an iPad | $$\rightarrow$$ | Whether or not an individual owns an iPad |
| Average cold duration | $$\rightarrow$$ | Cold duration for individuals |
| Percentage of concrete cylinders having fissures | $$\rightarrow$$ | Whether or not an individual cylinder has fissures |
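The step from a yes/no response variable to a percentage outcome can be sketched in a few lines of Python. This is only an illustration: the values in `sprouted` are invented, and the variable names are assumptions made for this sketch.

```python
# Hypothetical data: whether each individual seedling sprouted (invented values).
sprouted = [True, False, True, True, False, True, True, True]

# The response variable is recorded on each individual seedling;
# the outcome summarises those values as a percentage.
percentage_sprouted = 100 * sum(sprouted) / len(sprouted)
print(f"{percentage_sprouted:.1f}% of seedlings sprouted")
```

The same pattern applies to the other percentage and proportion rows in Table 2.3: the response variable takes one of two values on each individual, and the outcome is the percentage of individuals with a given value.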

Definition 2.13 (Explanatory variable) An explanatory variable is a variable of interest from the individuals in the study which (potentially) causes changes in, or is related to, the response variable.

The explanatory variable is a formal description of what C measures, observes, assesses or describes in each individual member of the population (Table 2.4).

TABLE 2.4: Examples of the Comparison and the corresponding Explanatory variable

| Comparison being made | | Explanatory variable in Individuals |
|---|---|---|
| Between males and females | $$\rightarrow$$ | The sex of each person |
| Between beech, tallowwood, and jarrah floorboards | $$\rightarrow$$ | Type of floorboard in each home |
| Between 350 kg/ha and 400 kg/ha fertiliser rates | $$\rightarrow$$ | Application rate in each paddock |
| Between people in their 20s, 30s and 40s | $$\rightarrow$$ | Age group for each person |

In many cases, the explanatory variable occurs before the response variable, or can be thought of as 'causing' the response variable.

Example 2.21 (Variables) For the final RQ for the echinacea study (Sect. 2.7), the response variable is the duration of cold symptoms, and the explanatory variable is the type of medication (echinacea; or none).

In this case, the medication is taken before the cold symptoms disappear, and perhaps even causes them to disappear.
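As a minimal sketch of how these two variables lead to the outcome, the Python snippet below uses invented data (the durations, the labels and the name `records` are illustrative assumptions, not results from any study). Each record holds the value of the explanatory variable and the response variable for one teenager; the outcome is the average of the response variable within each comparison group.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data: one record per teenager (values invented for illustration).
# Explanatory variable: treatment; response variable: duration (in days).
records = [
    {"treatment": "echinacea", "duration": 6},
    {"treatment": "echinacea", "duration": 4},
    {"treatment": "echinacea", "duration": 5},
    {"treatment": "none", "duration": 5},
    {"treatment": "none", "duration": 7},
    {"treatment": "none", "duration": 6},
]

# Collect the response-variable values for each value of the explanatory variable.
durations_by_treatment = defaultdict(list)
for record in records:
    durations_by_treatment[record["treatment"]].append(record["duration"])

# The outcome: the average duration in each comparison group.
average_duration = {
    treatment: mean(durations)
    for treatment, durations in durations_by_treatment.items()
}
print(average_duration)
```

The individual durations vary from teenager to teenager (the variable), while each group has a single average (the outcome).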

Consider this RQ:

For carrots grown in Buderim, is the average weight of carrots 8 weeks after planting the same for carrots grown without Thrive and for carrots grown with weekly applications of Thrive?

1. What is the outcome? What is the comparison?
2. What data is needed from each element of the population to answer this question? That is, what are the response and explanatory variables?

Example 2.22 (Variables) Consider this RQ:

For overweight men over 60, is the average weight loss after three weeks the same for a diet high in fresh fruit and a diet high in dried fruit?

The outcome is the average weight loss; the response variable is the weight loss for each individual man. (This would be found by measuring their weight before and after three weeks on the diets.)

The comparison is between the two diets; the explanatory variable is which diet each man is on.
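The path from the measurements to the outcome in this example could be sketched as follows. All weights are invented for illustration, and the names `men` and `average_weight_loss` are assumptions of this sketch rather than part of any study.

```python
from statistics import mean

# Hypothetical data: one row per man (all values invented for illustration).
men = [
    {"diet": "fresh fruit", "weight_before": 102.0, "weight_after": 100.0},
    {"diet": "fresh fruit", "weight_before": 95.5, "weight_after": 94.5},
    {"diet": "dried fruit", "weight_before": 110.0, "weight_after": 109.0},
    {"diet": "dried fruit", "weight_before": 98.0, "weight_after": 98.0},
]

# Response variable: the weight loss (in kg) for each individual man.
for man in men:
    man["weight_loss"] = man["weight_before"] - man["weight_after"]


def average_weight_loss(diet):
    """Outcome: average weight loss for one value of the explanatory variable."""
    return mean(m["weight_loss"] for m in men if m["diet"] == diet)


fresh = average_weight_loss("fresh fruit")
dried = average_weight_loss("dried fruit")
print(fresh, dried)
```

Two measurements are taken on each man, but together they produce a single value of the response variable (the weight loss) for that man.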

## 2.9 Units of observation and analysis

Units of observation and units of analysis are similar, but distinct, concepts that are important to distinguish.

Consider this RQ:

In Australian 20-something men, is the average thickness of head hair strands the same for blond men and brunet men?

What is the problem with comparing 100 hair strands from one blond man, to 100 hair strands from one brunet man?

In this study, only one man of each hair colour is represented. There are 200 observations, but only two people are compared, so little is learnt about 20-something men in general.

We learn a lot about two men specifically. The Population is represented by just two men... so we don't learn much about the population of men in general.

In this study, each individual hair is a unit of observation: the hair strands are what must be measured to obtain 'thickness of head hair strands'.

But each blond hair comes from the same man, so all of those hairs have essentially lived their lives together: They are washed at the same time, with the same shampoo, exposed to the same amount of sunlight and exercise, share genetics, etc. However, different people do their own thing and have their own genetics.

Definition 2.14 (Unit of observation) Unit of observation: The 'who' or 'what' which are observed, from which measurements are taken and data collected.

Notice that the RQ is comparing blond men with brunet men. That is, men are being compared.

This leads to a similar, but different, concept: the unit of analysis. Each man is a unit of analysis.

Definition 2.15 (Unit of analysis) Unit of analysis: The 'who' or 'what' about which generalizations and conclusions are made; the smallest independent 'who' or 'what' for which information is analysed. Units of analysis should not typically share a common underlying source.

In the hair-thickness study, each person is a unit of analysis. Importantly, the size of the sample in the study is the number of units of analysis; so here, there are only two examples of the population in the study. The size of the sample is just two.

The size of the sample in a study is the number of units of analysis.

All studies have units of analysis, and units of observation.

Example 2.23 (Units of analysis) In the hair-strand study, each hair strand is a unit of observation: measurements of hair strand thickness are taken from individual hair strands.

However, the unit of analysis is the person: the hair strands from each man share a lot in common. The men themselves would share little in common, and we are interested in comparing men.
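The difference between the two counts can be made concrete with a small Python sketch. The thickness values (in micrometres) are invented, and only five strands per man are shown to keep the example short. Reducing each man's strands to one average is one common way of working at the level of the unit of analysis; it is an assumption of this sketch, not a method prescribed in this chapter.

```python
from statistics import mean

# Hypothetical measurements (micrometres), invented for illustration:
# several hair strands (units of observation) from each man (unit of analysis).
strand_thickness = {
    "blond man": [61, 58, 60, 63, 58],
    "brunet man": [72, 70, 75, 71, 72],
}

# Number of units of observation: every strand that was measured.
n_observations = sum(len(strands) for strands in strand_thickness.values())

# Number of units of analysis (the sample size): the number of men.
n_units_of_analysis = len(strand_thickness)

# One summary value per man, since strands from the same man are not independent.
mean_thickness_per_man = {
    man: mean(strands) for man, strands in strand_thickness.items()
}

print(n_observations, n_units_of_analysis)
print(mean_thickness_per_man)
```

However many strands are measured, the sample size stays at two unless more men are included.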

Example 2.24 (Units of analysis) Consider a study comparing the percentage of females and males wearing sunglasses at a specific beach.

People in a group at the beach will probably not be operating 'independently': groups of people tend to behave similarly (but perhaps not identically). For example, a couple will often both be either wearing or not wearing sunglasses.

The researchers have two options:

• Use the groups of people as the units of analysis (some of which will be groups of one), and record data from just one person in each group. Ideally, the researchers would specify beforehand which group member to take data from (e.g., the person closest to the researchers when the group is spotted).
• Alternatively, decide not to use data from groups at all, and only gather data from individuals who are on their own.

Example 2.25 (Units of analysis) A study compares two brands of car tyres. Four tyres of Brand A are allocated to each of Cars 1–5. Four tyres of Brand B are allocated to each of Cars 6–10.

After 12 months, the amount of wear is recorded on each tyre. The unit of observation is the tyre: the amount of wear is measured on each tyre.

The unit of analysis is the car: the brand of tyre is allocated to the car and all wheels on the car get the same tyre. Tyres on any one car 'live their life together': They all are exposed to the same day-to-day use, the same drivers, have driven very similar distances, under the same conditions, etc.
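A short Python sketch of this design is given below. The wear values are invented (in arbitrary units), and averaging the four tyres on each car before comparing brands is an assumption of this sketch: it is one simple way to compare cars rather than tyres, not the only possible analysis.

```python
from statistics import mean

# Hypothetical wear on each of the four tyres of each car (invented values,
# arbitrary units). Cars 1-5 carry Brand A; Cars 6-10 carry Brand B.
tyre_wear = {
    1: [20, 22, 18, 20], 2: [24, 24, 22, 26], 3: [19, 21, 20, 20],
    4: [23, 21, 22, 22], 5: [25, 23, 24, 24], 6: [27, 25, 26, 26],
    7: [30, 28, 29, 29], 8: [26, 26, 24, 28], 9: [31, 29, 30, 30],
    10: [28, 30, 29, 29],
}
brand_of_car = {car: ("A" if car <= 5 else "B") for car in tyre_wear}

n_tyres = sum(len(wear) for wear in tyre_wear.values())  # units of observation
n_cars = len(tyre_wear)                                  # units of analysis

# One value per car: tyres on the same car 'live their life together'.
wear_per_car = {car: mean(wear) for car, wear in tyre_wear.items()}

# Compare the brands using cars (not individual tyres) as the units.
brand_means = {
    brand: mean(w for car, w in wear_per_car.items() if brand_of_car[car] == brand)
    for brand in ("A", "B")
}

print(n_tyres, n_cars)
print(brand_means)
```

There are 40 measurements of tyre wear, but each brand is represented by only five cars.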

The units of observation and units of analysis may be the same, and often are the same. However, they are sometimes different too, and it is crucial to be able to identify these situations. Importantly, studies compare units of analysis, not units of observation.

Example 2.26 (Units of analysis) A study compared two school physical activity (PA) programs. Each of 44 children (whose parents agreed for their children to participate in the study) was allocated to one of two PA programs. The improvement in children's fitness was measured for every student in the study after six months.

The units of observation are the individual students, as the fitness measurements are taken from the students. The units of analysis are also the individual students, as the PA program was allocated to each student individually.

A study compared two school physical activity (PA) programs. Program 1 was allocated to be used at School A, while Program 2 was allocated to School B. In each school, 22 children (with parental consent) were observed and the improvement in children's fitness was measured for each student after six months.

What are the units of analysis and the units of observation?

## 2.10 Preparing software

Most statistical software (including jamovi and SPSS) uses the same approach for collating the data:

• Each row represents one unit of analysis. Hence, the number of rows will equal the number of units of analysis.
• Each column represents one variable. Hence, the number of columns will equal the number of variables. (There may also be a column of identifying information (such as the person's name).)

In statistical software, the names of the variables are not placed in a separate row (say, in Row 1 above the data itself), which might happen when using a spreadsheet.

The names of the variables become the names of the columns.

Example 2.27 (Preparing statistical software) In Sect. 2.8, this RQ was posed:

Among Australian teenagers with a common cold, is the average duration of cold symptoms shorter for teens given a daily dose of echinacea compared to teenagers given no medication?

For this RQ, the variables are (Examples 2.20 and 2.21):

• 'Duration of cold symptoms' (response), and
• 'Type of treatment' (explanatory).

To set up the software for data entry:

• The number of rows of data would be the number of people in the study.

• The number of columns would be two: one column to record the duration of each individual's cold symptoms, and the other to record whether the individual received a dose of echinacea or received no medication.

In addition, there may be a column recording the name or ID of each individual.

The variable names (say, Duration and Treatment) would not be in a row of their own; they would be the column names (Fig. 2.5).
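The same layout can be sketched in Python using only the standard library. The IDs and values below are invented, and the in-memory text `raw` stands in for a hypothetical data file; this illustrates the row-and-column layout only, not how jamovi or SPSS store data internally.

```python
import csv
import io

# Hypothetical data file (IDs and values invented for illustration).
# The first line supplies the column names; it is not a row of data.
raw = """ID,Duration,Treatment
T01,6,Echinacea
T02,5,No medication
T03,4,Echinacea
T04,7,No medication
"""

# Each row read from the file is one unit of analysis (one teenager).
rows = list(csv.DictReader(io.StringIO(raw)))
n_units_of_analysis = len(rows)

# Each column, apart from the identifying column, is one variable.
variable_names = [name for name in rows[0].keys() if name != "ID"]

print(n_units_of_analysis)
print(variable_names)
```

Here `csv.DictReader` treats the first line as the column names, so the number of data rows matches the number of teenagers.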

While spreadsheets (such as Excel) can be used for analysing data, significant problems can, and do, emerge with using spreadsheets. Great care is needed when using spreadsheets for data analysis!

## 2.11 Summary

In this chapter, we have learnt about writing and understanding research questions. Research questions (RQs) are always about an outcome (O) in some population. Some RQs have a comparison or connection (C), and some also have an intervention (I). RQs may be estimation-type RQs or decision-type RQs.

The outcome numerically summarises the population or subsets of the population (so is usually worded in terms of percentages, averages, etc.), but the data comes from individuals in the population by measuring, observing or assessing the response (or dependent) variable. Similarly, the data concerning the comparison or connection comes from measuring or observing the explanatory (or independent) variables.

The who or what that observations are made from are called the units of observation. The smallest independent units (that is, units with very little in common) are called the units of analysis.

## 2.12 Quick review questions

Consider this RQ:

Is the average walking speed the same when texting and talking on a mobile phone?

1. What is the explanatory variable?
2. What is the response variable?
3. What is the outcome?

1. What individual people are doing with their phones probably explains their walking speed: the explanatory variable is the way in which the mobile phone is being used.
Notice that 'talking on the phone' and 'texting on the phone' are not variables. They are particular values that the variable can take. That is, 'what people are doing on their phone' is the variable, because it can vary: Sometimes people will be talking, sometimes texting, etc.

2. Walking speed probably depends on (or responds to) how individual people are using their phone: the response variable is the walking speed.

3. The outcome is how the response variable is summarised over a group of individuals. The walking speeds from many individuals could be summarised numerically using the average walking speed, which would be the outcome.

## 2.13 Exercises

Selected answers are available in Sect. D.2.

Exercise 2.1 In a study of public acceptance of alternative water supplies, various water sources are defined. In Table 2.5, match the term with the appropriate operational definition.

TABLE 2.5: Match the term with the operational definition

| Term | Definition |
|---|---|
| Rainwater | Rainwater from a rainwater collection tank on your property |
| Bottled | Water you presently use throughout your dwelling (home) |
| Tap | Highly purified seawater deemed by scientists and public health officials as safe for human consumption |
| Recycled | Highly purified wastewater deemed by scientists as safe for human consumption |
| Desalinated | Water sold in bottles by food companies that is widely available to the public for purchase and consumption |

Exercise 2.2 Consider this RQ:

Among university students, is the average resting diastolic blood pressure the same for students who regularly drive to university and those who regularly ride their bicycles?

1. For this RQ, identify the Population.
2. For this RQ, identify the Outcome.
3. For this RQ, identify the Comparison or Connection, if any.
4. For this RQ, identify the Intervention, if any.
5. What type of RQ is this?
6. What operational definitions would be needed?
7. What information must be collected from each individual to answer the RQ?
8. What are the units of analysis?
9. What are the units of observation?

Exercise 2.3 Consider this extract from a study:

We conducted a 4-year (1995–1998) field study in a Peruvian peri-urban community... to examine the relation between diarrhoea and nutritional status in 230 children $$<3$$ years of age

--- p. 210

For this study:

1. Identify PICO.
2. Infer the primary research question.
3. What type of question is used?
4. What operational definitions would be needed?
5. What are the response and explanatory variables?

Exercise 2.4 For the following response variables, what would be the corresponding outcomes?

1. Whether a vehicle crashes or not.
2. The height at which people can jump.
3. The number of tomatoes per plant.
4. Whether or not a person owns a car.

Exercise 2.5 For the following comparisons, what would be the corresponding explanatory variables?

1. Between 91 octane, 95 octane, and ethanol-blended car fuel.
2. Between caffeinated and decaffeinated coffee.
3. Between taking zero, one or two iron tablets per day.
4. Between vegans and vegetarians.

Exercise 2.6 For the following studies, determine which have a Comparison and which do not. In each case, identify the Outcome.

1. A study to determine if a higher percentage of people at a particular city park wear hats in winter compared to summer.
2. A study to determine if average cholesterol levels are the same when measured on the same people before and after a diet change.
3. A study to determine if the average balance-time on right legs is the same as on left legs.
4. A study to determine if the average yield of tomato plants is the same when three different fertilisers are applied.

Exercise 2.7 Animals in an experiment are divided into pens (three per pen), and feed is allocated to each pen. Animals in different pens receive different feed; animals in the same pen receive the same feed. The weight gain of each animal is recorded.

1. What is the unit of observation? Why?
2. What is the unit of analysis? Why?

Exercise 2.8 Consider this actual Project Report RQ from the university where I work. Critique the RQ, and write a better RQ (if necessary).

Among 10 Australian adults, does the time taken to read a passage of text change when different fonts are used?

Exercise 2.9 Consider this actual Project Report RQ from the university where I work. Critique the RQ, and write a better RQ (if necessary).

Of students that study at USC, Sippy Downs, do males have a larger lung capacity than females?