## 11.1 Quantitative and qualitative data

Understanding the *type* of data collected is
essential before starting any analysis,
because the *type* of data determines how to proceed with summaries and analyses.
Broadly,
data may be described as either *quantitative* or *qualitative*.

We can also talk about quantitative and qualitative *variables*.
(Remember that variables are measured on the *individuals* in the study.)
The *variable* is the description of what varies,
and the *data* are the values of the variables that are recorded.
Quantitative variables record quantitative data,
and
qualitative variables record qualitative data.

*Quantitative research* summarises and analyses data
using numerical methods
(Sect. 1.7).
*Quantitative research* can include both *quantitative* and *qualitative* variables, because both *quantitative* and *qualitative* data can be summarised numerically (Chaps. 13 and 14 respectively) and analysed numerically.

**Example 11.1 (Variables and data)** ‘Age’ is a *variable* because age can vary from individual to individual. The *data* would be values such as 13 years, 21 years and 76 years.

### 11.1.1 Quantitative data: Discrete and continuous data

**Quantitative** data
are *mathematically* numerical.
Most data that are counted or measured will be quantitative.
Quantitative data are often (but not always)
measured with measurement units (such as *kg* or *cm*).

**Definition 11.1 (Quantitative data)** *Quantitative data* is *mathematically* numerical data: the numbers themselves have numerical meaning, and it makes sense to be able to perform mathematical operations on them. Most data that are counted or measured will be quantitative.

Be careful:
Just because the data are numbers,
it does not necessarily mean that the data are quantitative.
*Mathematically numerical data are quantitative*;
that is,
numbers with numerical *meanings*.

**Example 11.2 (Quantitative data)** Australian postcodes are numbers,
but are *not* quantitative.
The numbers are just labels.
A postcode of 4556 isn’t one ‘better’ or one ‘more’ than a postcode of 4555.
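The point can be illustrated with a short Python sketch. The list of postcodes below is hypothetical (invented for illustration only). Software will happily compute an average of the postcodes, but the result has no sensible interpretation; counting how often each postcode occurs, treating the postcodes as labels, does.

```python
from collections import Counter
from statistics import mean

# Hypothetical postcodes recorded for six survey respondents (illustrative only).
postcodes = [4556, 4555, 4556, 4000, 4556, 4000]

# Python can compute a mean, but the answer is not a meaningful summary:
# it does not describe a location, and is not even a valid postcode.
meaningless_mean = mean(postcodes)

# Treating the postcodes as labels, counting how often each occurs is meaningful.
counts = Counter(str(p) for p in postcodes)
most_common_postcode, frequency = counts.most_common(1)[0]

print("Mean of postcodes (not meaningful):", meaningless_mean)
print("Most common postcode:", most_common_postcode, "with", frequency, "respondents")
```

Converting the postcodes to strings is one simple way of signalling that they should be treated as labels, not as numbers.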

Quantitative data may be further defined as *discrete* or *continuous*.
*Discrete* quantitative data has possible values that can be counted,
at least in theory.
Sometimes,
the possible values may not have a theoretical upper limit,
yet can still be considered ‘countable.’

**Definition 11.2 (Discrete data)** *Discrete* quantitative data has a countable number of possible values between any two given values of the variable.

**Example 11.3 (Discrete quantitative data)** These (quantitative) variables are *discrete*
(and so record *discrete* quantitative data):

- The *number* of heart attacks in the previous year experienced by women over 40. Possible values are 0, 1, 2, …
- The *number* of cracked eggs in a carton of 12. Possible values are 0, 1, 2, …, 12.
- The *number* of orthotic devices a person has ever used. Possible values are 0, 1, 2, …
- The *number* of fissures in turbines after 5000 hours of use. Possible values are 0, 1, 2, …

*Continuous* quantitative data has values that cannot,
at least in theory,
be recorded exactly.
In other words,
another value can always be found between
any two given values of the variable,
if we measure to a greater number of decimal places.
In practice, though,
the values need to be rounded to a reasonable number of decimal places.

**Definition 11.3 (Continuous data)** *Continuous* quantitative data have (at least in theory) an infinite number of possible values between any two given values.

Height is continuous: between the heights of 179cm and 180cm, many heights exist, depending on how many decimal places are used to record height. In practice, however, heights are usually rounded to the nearest centimetre for convenience. All continuous data are rounded.
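A minimal Python sketch of both ideas follows. The height value and the `midpoint` helper are hypothetical and used only for illustration. The first part repeatedly finds a new value between two given heights; the second part shows the same measured height recorded to different numbers of decimal places.

```python
def midpoint(a, b):
    """Return a value lying between a and b."""
    return (a + b) / 2

# Between 179 cm and 180 cm, another height can always be found.
low, high = 179.0, 180.0
between = []
for _ in range(4):
    high = midpoint(low, high)   # a new height between the current pair of values
    between.append(high)

print(between)   # [179.5, 179.25, 179.125, 179.0625]

# In practice, a measured height is rounded to a convenient precision.
height_cm = 179.4637   # hypothetical measurement
recorded = {places: round(height_cm, places) for places in (0, 1, 2)}
print(recorded)  # {0: 179.0, 1: 179.5, 2: 179.46}
```

The loop could continue indefinitely in theory; in practice, the precision of the measuring instrument decides where to stop.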

**Example 11.4 (Continuous quantitative data)** These (quantitative) variables are *continuous*
(that is, they record continuous quantitative data):

- The *weight* of 6-year-old Australian children. Values exist between any two given values of weight, by measuring to more decimal places of a kilogram; we would usually quote weight to the nearest kilogram.
- The *energy consumption* of houses in a given city. Values exist between any two given values of energy consumption, by measuring to more and more decimal places of a kilowatt-hour (kWh); we would usually quote to the nearest kWh.
- The *time* spent in front of a computer each day for employees in a given industry. Values exist between any two given times, by measuring to more decimal places of a second; we would usually quote the times to (say) the nearest minute, or the nearest 15 minutes.

### 11.1.2 Qualitative data: Nominal and ordinal data

**Qualitative** data
has distinct labels or categories that are not mathematically numerical.
These categories are called the *levels*
or the *values*
of the variable.

**Definition 11.4 (Qualitative data)** *Qualitative data* is not *mathematically* numerical data: it consists of categories or labels.

**Definition 11.5 (Levels)** The *levels* (or the *values*) of a qualitative variable refer to the names of the distinct categories.

**Example 11.5 (Qualitative data)** ‘Brand of mobile phone’ is a qualitative variable. Many levels are possible (that is, many possible brands), but these could be simplified by defining the levels as ‘Huawei,’ ‘Apple,’ ‘Samsung,’ ‘Google’ and ‘Other.’

Be careful:
*numerical* data
may be qualitative.
Qualitative data are not *mathematically* numerical;
that is,
the numbers don’t have numerical *meanings*.

**Example 11.6 (Qualitative data)** Australian postcodes are numbers, but are *qualitative* (Example 11.2).

**Think 11.1 (Types of qualitative data)** Here are two survey questions that produce qualitative data.

- What is your blood type?
  - Type A.
  - Type B.
  - Type AB.
  - Type O.
- What is your age group?
  - Under 20.
  - 20 to under 30.
  - 30 to under 50.
  - 50 or over.

Qualitative data can be further classified as
*nominal* or *ordinal*.
*Nominal* variables are qualitative variables where
the levels have *no natural order*.
*Ordinal* variables are qualitative variables where
the levels do have a *natural order*.
So in Think 11.1,
‘Blood type’ is qualitative *nominal*,
while ‘Age group’ is qualitative *ordinal*.
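The practical difference shows up when tabulating the data. The sketch below uses hypothetical responses to the blood type and age group questions (the data are invented for illustration). The ordinal variable is tabulated in the natural order of its levels, while for the nominal variable an order must be chosen; here, largest group to smallest.

```python
from collections import Counter

# Levels of the ordinal variable, listed in their natural order.
AGE_GROUP_LEVELS = ["Under 20", "20 to under 30", "30 to under 50", "50 or over"]

# Hypothetical responses from six people (illustrative only).
age_groups = ["30 to under 50", "Under 20", "50 or over",
              "Under 20", "20 to under 30", "Under 20"]
blood_types = ["O", "A", "O", "B", "A", "O"]

# Ordinal: report the counts in the natural order of the levels.
age_counts = Counter(age_groups)
age_table = [(level, age_counts[level]) for level in AGE_GROUP_LEVELS]

# Nominal: no natural order exists, so an order is chosen
# (here, from the largest group to the smallest).
blood_table = Counter(blood_types).most_common()

print(age_table)
print(blood_table)
```

Sorting the age groups alphabetically would place ‘20 to under 30’ first and ‘Under 20’ last, which is why the levels of an ordinal variable are listed explicitly in the sketch.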

**Example 11.7 (Nominal data)** This survey question will produce *nominal* data:

How do you usually get to university?

- Car (as driver or passenger).
- Bus.
- Ride bicycle or walk.
- Other.

The data will be nominal with four levels. The levels can appear in any order: from largest group to smallest, or in alphabetical order. Because there is no *natural* order, the order used should be carefully considered: what is the most useful order when summarising the data?

**Example 11.8 (Ordinal data)** This survey question will produce *ordinal* data:

Please indicate the extent to which you agree or disagree with this statement: ‘Permeable pavements technology has the potential to revolutionise green building practices.’

- Strongly disagree.
- Disagree.
- Neither agree nor disagree.
- Agree.
- Strongly agree.

The data will be ordinal with five levels. Treating the levels in the given order (or the reverse order) makes sense; it would not make sense, for example, to give the levels in alphabetical order.

**Example 11.9 (Clarity in definitions)** Consider the variable ‘Age.’
Age is *continuous quantitative*,
since we age continuously
(on our birthday, we don’t suddenly get one year greyer with one extra year’s worth of wrinkles…).

Age is usually rounded to the number of completed years, for convenience. However, the age of young children may be given as ‘3 days’ or ‘10 months,’ instead of the nearest year.

Sometimes *Age group* is used instead
(such as
Under 20; 20 to under 30; 30 to under 50; 50 or over).
‘Age group’ is *qualitative ordinal*.
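Converting ‘Age’ (quantitative) into ‘Age group’ (qualitative ordinal) can be sketched in Python as below. The function name `age_group` and the example ages are hypothetical; the group boundaries are those listed above.

```python
def age_group(age_years):
    """Return the age group (a qualitative ordinal level) for an age in years."""
    if age_years < 0:
        raise ValueError("Age cannot be negative")
    if age_years < 20:
        return "Under 20"
    if age_years < 30:
        return "20 to under 30"
    if age_years < 50:
        return "30 to under 50"
    return "50 or over"

# Hypothetical ages (quantitative data), in years.
ages = [13, 21, 76, 29.9, 30]

# The corresponding age groups (qualitative ordinal data).
groups = [age_group(age) for age in ages]
print(groups)
```

Information is lost in the conversion: the ages 21 and 29.9 are different values of ‘Age,’ but the same level of ‘Age group.’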

**Example 11.10 (Types of variables)** Consider a study to determine if the weight of 500g bags of pasta
really is at least 500g.
One approach is to record the weight of pasta in each bag
(a *quantitative* variable),
and compare the *average* weight
to the target weight of 500g.

Another approach is to record whether or not each bag is underweight: a *qualitative* variable, with two *levels* (underweight; not underweight). We could then report the *percentage* of bags that are underweight.
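Both ways of recording the pasta data can be sketched in Python. The bag weights below are hypothetical (invented for illustration). The first summary uses the quantitative variable (the weight of each bag); the second uses a qualitative variable with two levels (underweight; not underweight).

```python
from statistics import mean

TARGET_G = 500  # target weight of each bag, in grams

# Hypothetical weights (in grams) of eight bags of pasta (illustrative only).
weights_g = [501.2, 498.7, 500.0, 503.1, 499.5, 502.4, 497.9, 500.8]

# Quantitative variable: summarise using the average weight.
average_weight = mean(weights_g)

# Qualitative variable with two levels: classify each bag, then report a percentage.
status = ["underweight" if w < TARGET_G else "not underweight" for w in weights_g]
percent_underweight = 100 * status.count("underweight") / len(status)

print(f"Average weight: {average_weight:.2f} g")
print(f"Percentage of bags underweight: {percent_underweight:.1f}%")
```

With these made-up weights, the average is above the target even though some individual bags are underweight, so the two variables can lead to different summaries of the same bags.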