# 11 Describing data

So far, you have learnt to ask a RQ, identify different ways of obtaining data, design the study and collect the data.

**In this chapter**,
you will learn how to describe the data,
because this determines how to proceed with the analysis.
You will learn to:

- identify qualitative and quantitative variables.
- identify nominal and ordinal qualitative variables.
- identify continuous and discrete quantitative variables.
- describe data in ways suitable for use in software.

## 11.1 Quantitative and qualitative data

Understanding the *type* of data collected is
essential before starting any analysis,
because the *type* of data determines how to proceed with summaries and analyses.

Broadly, data may be described as either:

- *quantitative* data; or
- *qualitative* data.

We can also talk about quantitative and qualitative *variables*.
(Remember that variables are measured on the *individuals* in the study.)
The *variable* is the *description* of what varies,
and the *data* are the *values* of the variables that are recorded.
Quantitative variables record quantitative data,
and
qualitative variables record qualitative data.

*Quantitative research* summarises and analyses data
using numerical methods
(Sect. 1.7).

*Quantitative research*
can include both *quantitative* and *qualitative* variables,
because both *quantitative* and *qualitative*
data can be summarised numerically
(Chaps. 13 and 14 respectively)
and analysed numerically.

**Example 11.1 (Variables and data)** 'Age' is a *variable* because age can vary from individual to individual.
The *data* would be values such as 13 years, 21 years and 76 years.

### 11.1.1 Quantitative data: Discrete and continuous data

**Quantitative** data
are *mathematically* numerical.
Most data that are counted or measured will be quantitative.
Quantitative data is often (but not always)
measured with measurement units (such as *kg* or *cm*).

**Definition 11.1 (Quantitative data)** *Quantitative data* is *mathematically* numerical data:
the numbers themselves have numerical meaning,
and it makes sense to be able to perform mathematical operations on them.
Most data that are counted or measured will be quantitative.

Be careful:
Just because the data are numbers,
it does not necessarily mean that the data are quantitative.
*Mathematically numerical data are quantitative*;
that is,
numbers with numerical *meanings*.

**Example 11.2 (Quantitative data)** Australian postcodes are numbers,
but are *not* quantitative.
The numbers are just labels.
A postcode of 4556 isn't one 'better' or one 'more' than a postcode of 4555.

The values do not have numerical meanings. Indeed, alphabetic postcodes could have been chosen rather than numerical postcodes. For example, the postcode of Caboolture is 4510, but it could have been (say) QCAB.
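One practical consequence can be sketched in Python (chosen here only for illustration; it is not one of the packages used in this book). Numbers that are really labels can be stored as text, so that they are counted rather than averaged. The records below are hypothetical; only the postcodes 4510, 4555 and 4556 and the ages 13, 21 and 76 appear in the examples above.

```python
from collections import Counter
from statistics import mean

# Hypothetical records for illustration only: (postcode, age in years).
records = [("4510", 21), ("4556", 13), ("4510", 76), ("4555", 21)]

postcodes = [postcode for postcode, _ in records]  # labels, stored as text
ages = [age for _, age in records]                 # mathematically numerical

# A sensible summary for labels: count how often each label occurs.
postcode_counts = Counter(postcodes)

# A sensible summary for quantitative data: arithmetic, such as a mean.
mean_age = mean(ages)

print(postcode_counts["4510"])  # 2
print(mean_age)                 # 32.75
```

Because the postcodes are stored as text, asking for their mean raises an error instead of quietly returning a meaningless number.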

Quantitative data may be further defined as *discrete* or *continuous*.

*Discrete* quantitative data has possible values that can be counted,
at least in theory.
Sometimes,
the possible values may not have a theoretical upper limit,
yet can be still considered 'countable'.

**Definition 11.2 (Discrete data)** *Discrete* quantitative data has a countable number of possible values
between any two given values of the variable.

**Example 11.3 (Discrete quantitative data)** These (quantitative) variables are *discrete*
(and so record *discrete* quantitative data):

- The *number* of heart attacks in the previous year experienced by women over 40. Possible values are 0, 1, 2, ...
- The *number* of cracked eggs in a carton of 12. Possible values are 0, 1, 2, ..., 12.
- The *number* of orthotic devices a person has ever used. Possible values are 0, 1, 2, ...
- The *number* of fissures in turbines after 5000 hours of use. Possible values are 0, 1, 2, ...

*Continuous* quantitative data has values that cannot,
at least in theory,
be recorded exactly.
In other words,
another value can always be found between
any two given values of the variable,
if we measure to a greater number of decimal places.
In practice, though,
the values need to be rounded to a reasonable number of decimal places.

**Definition 11.3 (Continuous data)** *Continuous* quantitative data have (at least in theory)
an infinite number of possible values
between any two given values.

Height is continuous: between the heights of 179cm and 180cm, many heights exist, depending on how many decimal places are used to record height. In practice, however, heights are usually rounded to the nearest centimetre for convenience. All continuous data are rounded.
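The idea that more values become available as more decimal places are used can be sketched in a few lines of Python. The function names `recordable_values` and `midpoint` are hypothetical helpers written for this illustration, not part of any package.

```python
from fractions import Fraction

def recordable_values(low, high, decimals):
    """Count the values from low to high (inclusive) that can be
    recorded when rounding to the given number of decimal places."""
    step = Fraction(1, 10 ** decimals)
    return int((Fraction(high) - Fraction(low)) / step) + 1

def midpoint(a, b):
    """Return a value lying between a and b."""
    return (Fraction(a) + Fraction(b)) / 2

# Heights between 179cm and 180cm, recorded to 0, 1, 2 and 3 decimal places.
for decimals in range(4):
    print(decimals, recordable_values(179, 180, decimals))
# 0 2
# 1 11
# 2 101
# 3 1001

print(midpoint(179, 180))  # 359/2, that is, 179.5
```

Each extra decimal place gives about ten times as many recordable heights, and a midpoint can always be found between any two heights, however close together they are.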

**Example 11.4 (Continuous quantitative data)** These (quantitative) variables are *continuous*
(that is, they record continuous quantitative data):

- The *weight* of 6-year-old Australian children. Values exist between any two given values of weight, by measuring to more decimal places of a kilogram; we would usually quote weight to the nearest kilogram.
- The *energy consumption* of houses in a given city. Values exist between any two given values of energy consumption, by measuring to more and more decimal places of a kilowatt-hour (kWh); we would usually quote to the nearest kWh.
- The *time* spent in front of a computer each day for employees in a given industry. Values exist between any two given times, by measuring to more decimal places of a second; we would usually quote the times to (say) the nearest minute, or the nearest 15 minutes.

### 11.1.2 Qualitative data: Nominal and ordinal data

**Qualitative** data
has distinct labels or categories that are not mathematically numerical.
These categories are called the *levels*
or the *values*
of the variable.

**Definition 11.4 (Qualitative data)** *Qualitative data* is not *mathematically* numerical data:
it consists of categories or labels.

**Definition 11.5 (Levels)** The *levels* (or the *values*) of a qualitative variable
refer to the names of the distinct categories.

**Example 11.5 (Qualitative data)** 'Brand of mobile phone' is a qualitative variable.
Many levels are possible (that is, many possible brands),
but these could be simplified by defining the levels as
'Huawei', 'Apple', 'Samsung', 'Google' and 'Other'.

Be careful:
*numerical* data
may be qualitative.
Qualitative data are not *mathematically* numerical;
that is,
the numbers don't have numerical *meanings*.

**Example 11.6 (Qualitative data)** Australian postcodes are numbers,
but are *qualitative*
(Example 11.2).

Here are two survey questions that produce qualitative data.

- What is your blood type?

- Type A.
- Type B.
- Type AB.
- Type O.

- What is your age group?

- Under 20.
- 20 to under 30.
- 30 to under 50.
- 50 or over.

What features of the data collected from the questions are similar? What features are different?

Qualitative data can be further classified as
*nominal* or *ordinal*.
*Nominal* variables are qualitative variables where
the levels have *no natural order*.
*Ordinal* variables are qualitative variables where
the levels do have a *natural order*.

So in the example above,
'Blood type' is qualitative *nominal*,
while 'Age group' is qualitative *ordinal*.

**Definition 11.6 (Nominal qualitative variables)** A *nominal* qualitative variable is a qualitative variable where the levels
*do not* have a natural order.

**Definition 11.7 (Ordinal qualitative variables)** An *ordinal* qualitative variable is a qualitative variable where the levels
*do* have a natural order.

**Example 11.7 (Nominal data)** This survey question will produce *nominal* data:

How do you usually get to university?

- Car (as driver or passenger).
- Bus.
- Ride bicycle or walk.
- Other.

The data will be nominal with four levels. The levels can appear in any order: from largest group to smallest, or in alphabetical order.

Since there is no *natural* order,
the order used should be carefully considered:
what is the most useful order
when summarising the data?

**Example 11.8 (Ordinal data)** This survey question will produce *ordinal* data:

Please indicate the extent to which you agree or disagree with this statement: 'Permeable pavements technology has the potential to revolutionise green building practices'.

- Strongly disagree.
- Disagree.
- Neither agree nor disagree.
- Agree.
- Strongly agree.

The data will be ordinal with five levels. Treating the levels in the given order (or the reverse order) makes sense; it would not make sense, for example, to give the levels in alphabetical order.
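The difference between the natural order and alphabetical order can be sketched in Python. The levels are those of the survey question above; the list of responses is hypothetical, invented only to show the idea.

```python
from collections import Counter

# Levels of the ordinal variable, listed in their natural order.
LEVELS = [
    "Strongly disagree",
    "Disagree",
    "Neither agree nor disagree",
    "Agree",
    "Strongly agree",
]

# Hypothetical responses, for illustration only.
responses = [
    "Agree",
    "Strongly disagree",
    "Agree",
    "Neither agree nor disagree",
    "Strongly agree",
]

# Alphabetical order scrambles the meaning of the scale.
alphabetical = sorted(set(responses))

# Sorting by position in LEVELS respects the natural order.
natural = sorted(set(responses), key=LEVELS.index)

# A frequency table in natural order, including levels nobody chose.
counts = Counter(responses)
table = [(level, counts[level]) for level in LEVELS]

print(alphabetical)
print(natural)
print(table)
```

For a nominal variable (such as the transport question in Example 11.7) there is no equivalent of `LEVELS.index` to appeal to, so the order of the levels is a choice made for the reader's convenience.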

**Example 11.9 (Clarity in definitions)** Consider the variable 'Age'.
Age is *continuous quantitative*,
since we age continuously
(on our birthday, we don't suddenly get one year greyer with one extra year's worth of wrinkles...).

Age is usually rounded to the number of completed years, for convenience. However, the age of young children may be given as '3 days' or '10 months', instead of the nearest year.

Sometimes *Age group* is used instead
(such as
Under 20; 20 to under 30; 30 to under 50; 50 or over).
'Age group' is *qualitative ordinal*.

Ensure your RQ is clear about which is used!
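How a quantitative 'Age' becomes a qualitative ordinal 'Age group' can be sketched in Python. The function `age_group` is a hypothetical helper written for this illustration; the group boundaries are those listed above.

```python
from bisect import bisect_right

# Ages (in years) at which one group ends and the next begins.
CUT_POINTS = [20, 30, 50]

# The levels of 'Age group', in their natural order.
AGE_GROUPS = ["Under 20", "20 to under 30", "30 to under 50", "50 or over"]

def age_group(age_years):
    """Return the ordinal age group for an age given in years."""
    if age_years < 0:
        raise ValueError("age cannot be negative")
    # bisect_right counts how many cut points are at or below the age,
    # which is the position of the matching group in AGE_GROUPS.
    return AGE_GROUPS[bisect_right(CUT_POINTS, age_years)]

# The ages from Example 11.1.
for age in [13, 21, 76]:
    print(age, age_group(age))
# 13 Under 20
# 21 20 to under 30
# 76 50 or over
```

The conversion only works in one direction: an age group can be found from an age, but the original age cannot be recovered from the age group. This is one reason to decide which of the two variables the RQ needs before collecting the data.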

**Example 11.10 (Types of variables)** Consider a study to determine if the weight of 500g bags of pasta
really is at least 500g.
One approach is to record the weight of pasta in each bag
(a *quantitative* variable),
and compare the *average* weight
to the target weight of 500g.

Another approach is to record whether or not each bag of pasta
weighed at least 500g (that is, whether or not each bag is underweight).
This would be a *qualitative* variable, with two *levels*
(underweight; not underweight).
We could then report the *percentage* of bags that are underweight.
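The two approaches can be compared in a short Python sketch. The bag weights below are hypothetical, invented purely for illustration; they are not data from any study.

```python
from statistics import mean

TARGET_G = 500  # target weight, in grams

# Hypothetical bag weights in grams, invented for illustration.
weights_g = [502.1, 498.7, 500.0, 503.4, 499.2, 501.8, 500.6, 497.9]

# Approach 1: a quantitative variable, summarised by the average.
average_weight = mean(weights_g)

# Approach 2: a qualitative variable with two levels,
# summarised by the percentage of bags at one level.
status = ["Underweight" if w < TARGET_G else "Not underweight" for w in weights_g]
percent_underweight = 100 * status.count("Underweight") / len(status)

print(round(average_weight, 2))  # 500.46
print(percent_underweight)       # 37.5
```

With these made-up weights, the average is above the target even though three of the eight bags are underweight, so the two variables can lead to different impressions of the same bags.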

## 11.2 Describing data in jamovi and SPSS

In practice, quantitative research requires the use of a computer for producing graphs and completing calculations. In this book, two statistical software packages are described for analysis of data:

- jamovi; and
- SPSS.

(For reasons to avoid Excel and other spreadsheets, see the discussion earlier in this book.)

This section makes only brief notes about setting up data in these software packages; consult a comprehensive reference for more (and better) details. For both packages, however, declaring the variables correctly is very important (Table 11.1).

Practically all software,
including jamovi and SPSS,
records data in a spreadsheet-like grid,
with the *variables in the columns*,
and the
*units of analysis in the rows*.

Type of variable | Further classification | In jamovi | In SPSS |
---|---|---|---|
Qualitative | Nominal | Nominal | Nominal |
Qualitative | Ordinal | Ordinal | Ordinal |
Quantitative | Discrete | Continuous, Integer | Scale |
Quantitative | Continuous | Continuous, Decimal | Scale |
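The contents of Table 11.1 can also be written as a lookup, which may help when planning how to declare each variable before opening the software. This is a sketch in Python only; the names `MEASURE_TYPES` and `measure_type` are hypothetical and are not part of jamovi or SPSS.

```python
# Measure types from Table 11.1,
# keyed by (type of variable, further classification).
MEASURE_TYPES = {
    ("qualitative", "nominal"): {"jamovi": "Nominal", "SPSS": "Nominal"},
    ("qualitative", "ordinal"): {"jamovi": "Ordinal", "SPSS": "Ordinal"},
    ("quantitative", "discrete"): {"jamovi": "Continuous, Integer", "SPSS": "Scale"},
    ("quantitative", "continuous"): {"jamovi": "Continuous, Decimal", "SPSS": "Scale"},
}

def measure_type(kind, classification, package):
    """Look up how a variable should be declared in the given package."""
    return MEASURE_TYPES[(kind.lower(), classification.lower())][package]

# Variables taken from examples in this chapter.
variables = {
    "Blood type": ("qualitative", "nominal"),
    "Age group": ("qualitative", "ordinal"),
    "Number of cracked eggs": ("quantitative", "discrete"),
    "Height": ("quantitative", "continuous"),
}

for name, (kind, classification) in variables.items():
    print(name, "|", measure_type(kind, classification, "jamovi"),
          "|", measure_type(kind, classification, "SPSS"))
```

The lookup makes one feature of the table easy to see: SPSS uses the same declaration (`Scale`) for both kinds of quantitative variable, while jamovi distinguishes them.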

### 11.2.1 Using jamovi

In jamovi,
nominal variables are called *Nominal*,
and
ordinal variables are called *Ordinal*
(Table 11.1).
In jamovi,
*continuous* quantitative variables are called *continuous decimal*,
and
*discrete* quantitative variables are called (confusingly) *continuous integer*.

To add this information to jamovi, double-click on the variable name at the top of the data worksheet (Fig. 11.1), which produces Fig. 11.2. This opens an area where the data can be described:

- Nominal qualitative variables are set as `Nominal`, and the levels described in the `Levels` area to the right.
- Ordinal qualitative variables are set as `Ordinal`, and the levels described in the `Levels` area to the right.
- Quantitative *continuous* variables are set as `Continuous` with the *Data type* as `Decimal`.
- Quantitative *discrete* variables are set as `Continuous` with the *Data type* as `Integer`.

When the information has been entered, clicking the up-arrow on the top right (Fig. 11.2) closes this window.

### 11.2.2 Using SPSS

In SPSS,
variables are described in the *Variable View* window
(*not* the *Data View* window).
Each variable is then described
in the *Measure* column
(Fig. 11.3):

- Nominal qualitative variables are called `Nominal`.
- Ordinal qualitative variables are called `Ordinal`.
- Quantitative variables are called `Scale`, regardless of whether they are discrete or continuous.

## 11.3 Summary

Data and variables can be described as either
**quantitative** (either **discrete** or **continuous**)
or
**qualitative** (either **nominal** or **ordinal**).
Variables should be correctly defined in jamovi and SPSS.

## 11.4 Quick revision questions

A study on the bruising of apples^{260}
aimed to determine the relationship between the recorded surface temperature of the apple
and the depth of bruising.

The researchers purposefully hit apples with three
different **forces** (200, 700 and 1200 mJ) to inflict bruises.

This was repeated at three different **locations** of the apple (lower; middle; upper).

The researchers then recorded the **depth** of the bruising,
and recorded the **surface temperature** at each bruise location.

- How would the variable 'location on the apple' be best described?
- How would the variable 'depth of bruising' be best described?
- How would the variable 'temperature of the bruise location' be best described?
- The variable 'force of hit' could be considered as a quantitative continuous variable.
  However,
  since only a small number of forces are used,
  it could also be considered as qualitative ordinal.
  If it was considered as *qualitative ordinal*, how many *levels* would the variable have?

## 11.5 Exercises

Selected answers are available in Sect. D.11.

**Exercise 11.1** A study of lime trees
(*Tilia cordata*)
recorded these variables for 385 lime trees in Russia:^{261}

- the foliage biomass, in kg;
- the tree diameter (in cm);
- the age of the tree (in years); and
- the origin of the tree (one of Coppice, Natural, or Planted).

Describe the variables in the study using the language of this chapter.

**Exercise 11.2** Are these variables
quantitative (discrete or continuous, and with what units of measurement?), or
qualitative (nominal or ordinal, and with what levels?)?

- Systolic blood pressure.
- Program of enrolment.
- Academic grade (HD; DN; CR; PS; FL).
- Number of times a person visited the doctor last year.

**Exercise 11.3** A study of body mass index and its relationship with use of social media^{262}
recorded these variables (among others) from a group of 1140 participants:

- Age (under 45; 45 to 64; 65 or over)
- Gender (male; female)
- Location (urban; rural)
- Social media use (none; low; high)
- BMI (body mass index; the body mass in kg, divided by the square of height in m)
- Total sitting time, in minutes per day

For each variable,
determine the *type* of variable:
quantitative (discrete or continuous, and with what units of measurement?), or
qualitative (nominal or ordinal, and with what levels)?

**Exercise 11.4** In a study of the influence of using ankle-foot orthoses in children with cerebral palsy,^{263}
the data in Table 11.2 describe the 15 subjects.
(GMFCS, the Gross Motor Function Classification System,
describes the impact of cerebral palsy on motor function,
where *lower* levels mean *better* functionality.)
Describe the variables in the study.

Gender | Age (years) | Height (cm) | Weight (kg) | GMFCS |
---|---|---|---|---|
M | 9 | 136 | 34.5 | 1 |
M | 7 | 106 | 16.2 | 2 |
M | 7 | 129 | 21.1 | 1 |
M | 12 | 152 | 40.4 | 1 |
M | 11 | 146 | 39.3 | 2 |
M | 5 | 113 | 18.1 | 1 |
M | 6 | 112 | 16.7 | 2 |
M | 8 | 112 | 19.1 | 1 |
M | 8 | 138 | 28.6 | 1 |
M | 6 | 116 | 19.3 | 1 |
F | 7 | 113 | 17.6 | 1 |
M | 11 | 141 | 34.9 | 1 |
M | 7 | 136 | 34.5 | 1 |
F | 9 | 128 | 21.9 | 1 |
F | 8 | 133 | 23.0 | 1 |

**Exercise 11.5** A study of fertilizer use^{264}
recorded the soil nitrogen after applying different fertilizer doses.
These variables were recorded:

- the *fertilizer dose*, in kilograms of nitrogen per hectare;
- the *soil nitrogen*, in kilograms of nitrogen per hectare; and
- the *fertilizer source*; one of 'inorganic' or 'organic'.

Describe the variables in the study.

**Exercise 11.6** A study^{265}
recorded the response of kangaroos to drones
(one of 'Vigilance', 'No vigilance', 'Flee \(<10\)m', or 'Flee \(>10\)m')
and the altitude of the drone (30m, 60m, 100m or 120m).
The mob size and sex of the kangaroo were also recorded.
Describe the variables in the study.

**Exercise 11.7** A study of people who died while taking selfies^{266}
recorded the location
(Table 11.3).
Which of the following are the *variables* in the table?
For each that is a variable,
describe the variable.

- The location.
- The number of people who died at each location.
- The percentage of people who died at each location.

Location | Number | Percentage |
---|---|---|
Nature, associated environments | 48 | 43.2 |
Train, railway, associated structures | 22 | 19.9 |
Buildings, associated structures | 17 | 15.3 |
Road, bridge, associated structures | 12 | 10.8 |
Dams, associated structures | 7 | 6.3 |
Fields, farms, associated structures | 4 | 3.6 |
Others | 1 | 0.9 |