11.1 Quantitative and qualitative data

Understanding the type of data collected is essential before starting any analysis, because the type of data determines how to proceed with summaries and analyses. Broadly, data may be described as either quantitative or qualitative.

We can also talk about quantitative and qualitative variables. (Remember that variables are measured on the individuals in the study.) The variable is the description of what varies, and the data are the values of the variables that are recorded. Quantitative variables record quantitative data, and qualitative variables record qualitative data.

Quantitative research summarises and analyses data using numerical methods (Sect. 1.7).

Quantitative research can include both quantitative and qualitative variables, because both quantitative and qualitative data can be summarised numerically (Chaps. 13 and 14 respectively) and analysed numerically.

Example 11.1 (Variables and data) ‘Age’ is a variable because age can vary from individual to individual. The data would be values such as 13 years, 21 years and 76 years.

11.1.1 Quantitative data: Discrete and continuous data

Quantitative data are mathematically numerical. Most data that are counted or measured will be quantitative. Quantitative data are often (but not always) recorded with measurement units (such as kg or cm).

Definition 11.1 (Quantitative data) Quantitative data is mathematically numerical data: the numbers themselves have numerical meaning, and it makes sense to be able to perform mathematical operations on them. Most data that are counted or measured will be quantitative.

Be careful: data recorded as numbers are not necessarily quantitative. Only mathematically numerical data (that is, numbers with numerical meanings) are quantitative.

Example 11.2 (Quantitative data) Australian postcodes are numbers, but are not quantitative. The numbers are just labels. A postcode of 4556 isn’t one ‘better’ or one ‘more’ than a postcode of 4555.

The values do not have numerical meanings. Indeed, rather than numerical postcodes, alphabetic postcodes could have been chosen. For example, the postcode of Caboolture is 4510, but it could have been QCAB.
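The point of Example 11.2 can be illustrated with a minimal Python sketch. The list of recorded postcodes below is hypothetical (only the postcodes 4510, 4555 and 4556 themselves come from the text), and storing them as strings is one possible way of signalling that they are labels.

```python
from collections import Counter
from statistics import mean

# Hypothetical postcodes recorded for six survey respondents, stored as labels.
postcodes = ["4510", "4556", "4555", "4510", "4556", "4510"]

# Treating the labels as numbers "works" mechanically, but the result
# (about 4532.8) does not describe any meaningful quantity.
misleading_mean = mean(int(code) for code in postcodes)

# Counting how often each label occurs is a sensible summary for labels.
counts = Counter(postcodes)
most_common_code, frequency = counts.most_common(1)[0]

print(f"Mean of postcodes (not meaningful): {misleading_mean:.1f}")
print(f"Most common postcode: {most_common_code} ({frequency} respondents)")
```

The count-based summary would be unchanged if every 4510 were replaced by the label QCAB, whereas the mean could not even be computed.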

Quantitative data may be further defined as discrete or continuous. Discrete quantitative data has possible values that can be counted, at least in theory. Sometimes, the possible values may not have a theoretical upper limit, yet can still be considered ‘countable.’

Definition 11.2 (Discrete data) Discrete quantitative data has a countable number of possible values between any two given values of the variable.

Example 11.3 (Discrete quantitative data) These (quantitative) variables are discrete (and so record discrete quantitative data):

The number of heart attacks in the previous year experienced by women over 40. Possible values are 0, 1, 2, …
The number of cracked eggs in a carton of 12. Possible values are: 0, 1, 2, … 12.
The number of orthotic devices a person has ever used. Possible values are 0, 1, 2, …
The number of fissures in turbines after 5000 hours of use. Possible values are 0, 1, 2, …
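As a rough sketch of what ‘countable’ means for two of the variables above, the possible values can be listed one after another, whether or not the list ever ends. The variable names are illustrative only.

```python
from itertools import count, islice

# Cracked eggs in a carton of 12: a finite list of possible values.
cracked_egg_values = list(range(0, 13))  # 0, 1, 2, ..., 12

# Heart attacks in the previous year: no fixed upper limit in theory,
# yet the possible values can still be listed one at a time.
heart_attack_values = count(start=0)  # 0, 1, 2, ... without end
first_five = list(islice(heart_attack_values, 5))

print(len(cracked_egg_values))  # number of possible values for the egg count
print(first_five)               # the first few possible heart-attack counts
```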

Continuous quantitative data has values that cannot, at least in theory, be recorded exactly. In other words, another value can always be found between any two given values of the variable, if we measure to a greater number of decimal places. In practice, though, the values need to be rounded to a reasonable number of decimal places.

Definition 11.3 (Continuous data) Continuous quantitative data have (at least in theory) an infinite number of possible values between any two given values.

Height is continuous: between the heights of 179cm and 180cm, many heights exist, depending on how many decimal places are used to record height. In practice, however, heights are usually rounded to the nearest centimetre for convenience. All continuous data are rounded.
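The idea that another value can always be found between two heights, and that recorded heights are rounded, can be sketched in Python as follows. The helper `value_between` is a hypothetical name, and exact fractions are used only to avoid floating-point distractions.

```python
from fractions import Fraction

def value_between(low, high):
    """Return the midpoint, which lies strictly between low and high."""
    return (low + high) / 2

# Heights in centimetres, taken from the example in the text.
low, high = Fraction(179), Fraction(180)

# Repeatedly find a new height between 179cm and the previous value.
# Each step needs more decimal places to record, and the process never has to stop.
for _ in range(5):
    high = value_between(low, high)
    print(float(high))

# In practice the height is rounded, here to the nearest centimetre.
recorded_height = round(float(high))
print(recorded_height)
```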

Example 11.4 (Continuous quantitative data) These (quantitative) variables are continuous (that is, they record continuous quantitative data):

The weight of 6-year-old Australian children. Values exist between any two given values of weight, by measuring to more decimal places of a kilogram; we would usually quote weight to the nearest kilogram.
The energy consumption of houses in a given city. Values exist between any two given values of energy consumption, by measuring to more and more decimal places of a kilowatt-hour (kWh); we would usually quote to the nearest kWh.
The time spent in front of a computer each day for employees in a given industry. Values exist between any two given times, by measuring to more decimal places of a second; we would usually quote the times to (say) the nearest minute, or the nearest 15 minutes.

11.1.2 Qualitative data: Nominal and ordinal data

Qualitative data has distinct labels or categories that are not mathematically numerical. These categories are called the levels or the values of the variable.

Definition 11.4 (Qualitative data) Qualitative data is not mathematically numerical data: it consists of categories or labels.

Definition 11.5 (Levels) The levels (or the values) of a qualitative variable refer to the names of the distinct categories.

Example 11.5 (Qualitative data) ‘Brand of mobile phone’ is a qualitative variable. Many levels are possible (that is, many possible brands), but these could be simplified by defining the levels as ‘Huawei,’ ‘Apple,’ ‘Samsung,’ ‘Google’ and ‘Other.’

Be careful: data recorded as numbers may be qualitative. Qualitative data are not mathematically numerical; that is, the numbers don’t have numerical meanings.

Example 11.6 (Qualitative data) Australian postcodes are numbers, but are qualitative (Example 11.2).

Think 11.1 (Types of qualitative data) Here are two survey questions that produce qualitative data.

What is your blood type?

Type A.
Type B.
Type AB.
Type O.

What is your age group?

Under 20.
20 to under 30.
30 to under 50.
50 or over.

What features of the data collected from the questions are similar? What features are different?

Qualitative data can be further classified as nominal or ordinal. Nominal variables are qualitative variables where the levels have no natural order. Ordinal variables are qualitative variables where the levels do have a natural order. So in Think 11.1, ‘Blood type’ is qualitative nominal, while ‘Age group’ is qualitative ordinal.

Example 11.7 (Nominal data) This survey question will produce nominal data:

How do you usually get to university?

Car (as driver or passenger).

Bus.

Ride bicycle or walk.

Other.

The data will be nominal with four levels. The levels can appear in any order: from largest group to smallest, or in alphabetical order. Because there is no natural order, the order used should be carefully considered: what is the most useful order when summarising the data?
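Two of the orderings mentioned in Example 11.7 can be compared with a short Python sketch. The responses are hypothetical, and the level ‘Car (as driver or passenger)’ is shortened to ‘Car’ for brevity.

```python
from collections import Counter

# Hypothetical responses to the survey question.
responses = [
    "Bus", "Car", "Car", "Other", "Bus", "Car",
    "Ride bicycle or walk", "Car", "Bus", "Ride bicycle or walk",
]

counts = Counter(responses)

# Order 1: from the largest group to the smallest.
by_group_size = [level for level, _ in counts.most_common()]

# Order 2: alphabetical.
alphabetical = sorted(counts)

print(by_group_size)
print(alphabetical)
```

Neither order is more correct than the other for nominal data; the choice depends on which is more useful when summarising the data.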

Example 11.8 (Ordinal data) This survey question will produce ordinal data:

Please indicate the extent to which you agree or disagree with this statement: ‘Permeable pavements technology has the potential to revolutionise green building practices.’

Strongly disagree.

Disagree.

Neither agree nor disagree.

Agree.

Strongly agree.

The data will be ordinal with five levels. Treating the levels in the given order (or the reverse order) makes sense; it would not make sense, for example, to give the levels in alphabetical order.
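To see why alphabetical order is unhelpful for the levels in Example 11.8, consider this minimal Python sketch. The responses are hypothetical, and the `rank` dictionary is just one possible way to encode the natural order.

```python
# The five levels, listed in their natural order.
LEVELS = [
    "Strongly disagree",
    "Disagree",
    "Neither agree nor disagree",
    "Agree",
    "Strongly agree",
]

# Alphabetical order scrambles the natural order of the levels.
alphabetical = sorted(LEVELS)
print(alphabetical)

# Encode the natural order as a rank (1 = Strongly disagree, ..., 5 = Strongly agree).
rank = {level: position for position, level in enumerate(LEVELS, start=1)}

# Hypothetical survey responses, sorted using the natural order of the levels.
responses = ["Agree", "Strongly disagree", "Strongly agree", "Disagree", "Agree"]
in_natural_order = sorted(responses, key=rank.get)
print(in_natural_order)
```

The ranks only record the order of the levels; they should not be read as saying that the gaps between neighbouring levels are equal.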

Example 11.9 (Clarity in definitions) Consider the variable ‘Age.’ Age is continuous quantitative, since we age continuously (on our birthday, we don’t suddenly get one year greyer with one extra year’s worth of wrinkles…).

Age is usually rounded to the number of completed years, for convenience. However, the age of young children may be given as ‘3 days’ or ‘10 months,’ instead of the nearest year.

Sometimes Age group is used instead (such as Under 20; 20 to under 30; 30 to under 50; 50 or over). ‘Age group’ is qualitative ordinal.

Ensure your research question (RQ) is clear about which is used!
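The difference between ‘Age’ and ‘Age group’ can be sketched in Python as below. The function name `age_group` is hypothetical; the group boundaries are those given above, and the three ages are the values from Example 11.1.

```python
def age_group(age_in_years):
    """Convert an age (quantitative) into an age group (qualitative ordinal)."""
    if age_in_years < 0:
        raise ValueError("Age cannot be negative")
    if age_in_years < 20:
        return "Under 20"
    if age_in_years < 30:
        return "20 to under 30"
    if age_in_years < 50:
        return "30 to under 50"
    return "50 or over"

ages = [13, 21, 76]                        # quantitative data
groups = [age_group(age) for age in ages]  # qualitative ordinal data
print(groups)
```

The conversion only works in one direction: the exact age cannot be recovered from the age group, which is one reason to decide in advance which variable will be recorded.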

Example 11.10 (Types of variables) Consider a study to determine if the weight of 500g bags of pasta really is at least 500g. One approach is to record the weight of pasta in each bag (a quantitative variable), and compare the average weight to the target weight of 500g.

Another approach is to record whether or not each bag of pasta weighs at least 500g (that is, whether or not the bag is underweight). This would be a qualitative variable, with two levels (underweight; not underweight). We could then report the percentage of bags that are underweight.
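Both approaches in Example 11.10 can be sketched in Python using the same measurements. The eight bag weights are hypothetical values invented for illustration, not data from any study.

```python
from statistics import mean

TARGET_WEIGHT_G = 500

# Hypothetical weights (in grams) of eight bags of pasta.
weights_g = [502.1, 499.4, 500.8, 498.9, 501.5, 500.0, 503.2, 499.9]

# Approach 1: a quantitative variable, summarised by the average weight.
average_weight = mean(weights_g)

# Approach 2: a qualitative variable with two levels.
status = [
    "underweight" if weight < TARGET_WEIGHT_G else "not underweight"
    for weight in weights_g
]
percent_underweight = 100 * status.count("underweight") / len(status)

print(f"Average weight: {average_weight:.1f}g")
print(f"Percentage of bags underweight: {percent_underweight:.1f}%")
```

In this made-up sample the average is above 500g even though some individual bags are underweight, so the two approaches can answer different questions.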