11 Describing data

So far, you have learnt to ask a RQ, identify different ways of obtaining data, design the study and collect the data.

In this chapter, you will learn how to describe the data, because this determines how to proceed with the analysis. You will learn to:

  • identify qualitative and quantitative variables.
  • identify nominal and ordinal qualitative variables.
  • identify continuous and discrete quantitative variables.
  • describe data in ways suitable for use in software.

11.1 Quantitative and qualitative data

Understanding the type of data collected is essential before starting any analysis, because the type of data determines how to proceed with summaries and analyses.

Broadly, data may be described as either quantitative or qualitative.

We can also talk about quantitative and qualitative variables. (Remember that variables are measured on the individuals in the study.) The variable is the description of what varies, and the data are the values of the variables that are recorded. Quantitative variables record quantitative data, and qualitative variables record qualitative data.

Quantitative research summarises and analyses data using numerical methods (Sect. 1.7).

Quantitative research can include both quantitative and qualitative variables, because both quantitative and qualitative data can be summarised numerically (Chaps. 13 and 14 respectively) and analysed numerically.

Example 11.1 (Variables and data) 'Age' is a variable because age can vary from individual to individual. The data would be values such as 13 years, 21 years and 76 years.

11.1.1 Quantitative data: Discrete and continuous data

Quantitative data are mathematically numerical. Most data that are counted or measured will be quantitative. Quantitative data is often (but not always) measured with measurement units (such as kg or cm).

Definition 11.1 (Quantitative data) Quantitative data is mathematically numerical data: the numbers themselves have numerical meaning, and it makes sense to be able to perform mathematical operations on them. Most data that are counted or measured will be quantitative.

Be careful: data recorded as numbers are not necessarily quantitative. Only mathematically numerical data (that is, numbers with numerical meanings) are quantitative.

Example 11.2 (Quantitative data) Australian postcodes are numbers, but are not quantitative. The numbers are just labels. A postcode of 4556 isn't one 'better' or one 'more' than a postcode of 4555.

The values do not have numerical meanings. Indeed, rather than numerical postcodes, alphabetic postcodes could have been chosen. For example, the postcode of Caboolture is 4510, but it could have been QCAB for instance.
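
The difference between labels and numbers can be sketched in a few lines of Python. This is only an illustration: the list of respondents is invented, and only the postcodes themselves come from the example above. Counting how often each postcode occurs is a sensible summary of labels, while an average can be computed only by forcing the labels to be numbers, and the result means nothing.

```python
from collections import Counter
from statistics import mean

# Postcodes for five hypothetical respondents (invented for illustration).
# They are stored as text because they are labels, not quantities.
postcodes = ["4556", "4555", "4510", "4556", "4510"]

# Counting how often each label occurs is a sensible summary.
counts = Counter(postcodes)
print(counts["4556"])  # 2

# An 'average postcode' can be computed only by converting the labels
# to numbers, and the result does not describe any real location.
average = mean(int(p) for p in postcodes)
print(average)  # 4537.4
```

Replacing 4510 by QCAB would change nothing in the count, but would make the (meaningless) average impossible to compute.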

Quantitative data may be further defined as discrete or continuous.

Discrete quantitative data has possible values that can be counted, at least in theory. Sometimes, the possible values may not have a theoretical upper limit, yet can still be considered 'countable'.

Definition 11.2 (Discrete data) Discrete quantitative data has a countable number of possible values between any two given values of the variable.

Example 11.3 (Discrete quantitative data) These (quantitative) variables are discrete (and so record discrete quantitative data):

  • The number of heart attacks in the previous year experienced by women over 40. Possible values are 0, 1, 2, ...
  • The number of cracked eggs in a carton of 12. Possible values are: 0, 1, 2, ... 12.
  • The number of orthotic devices a person has ever used. Possible values are 0, 1, 2, ...
  • The number of fissures in turbines after 5000 hours of use. Possible values are 0, 1, 2, ...
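
As a small sketch of what 'countable' means, the possible values for the cracked-eggs variable can be listed in full, and the values between any two given values can be counted. The variable names below are chosen only for this illustration.

```python
# Possible values of 'number of cracked eggs in a carton of 12'.
possible_values = list(range(0, 13))
print(len(possible_values))  # 13

# Only a countable number of values lie between any two given values,
# for example strictly between 3 and 7 cracked eggs.
between_3_and_7 = [v for v in possible_values if 3 < v < 7]
print(between_3_and_7)  # [4, 5, 6]
```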

Continuous quantitative data has values that cannot, at least in theory, be recorded exactly. In other words, another value can always be found between any two given values of the variable, if we measure to a greater number of decimal places. In practice, though, the values need to be rounded to a reasonable number of decimal places.

Definition 11.3 (Continuous data) Continuous quantitative data have (at least in theory) an infinite number of possible values between any two given values.

Height is continuous: between the heights of 179cm and 180cm, many heights exist, depending on how many decimal places are used to record height. In practice, however, heights are usually rounded to the nearest centimetre for convenience. All continuous data are rounded.
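
The effect of rounding can be sketched as follows. The height used here is a hypothetical value invented for the illustration; the point is only that the same height is recorded differently depending on the number of decimal places kept, and that every recorded value lies between 179cm and 180cm.

```python
# A hypothetical height in cm (invented for illustration).
height_cm = 179.4637

# The same height, recorded to 0, 1, 2 and 3 decimal places.
recorded = [round(height_cm, decimals) for decimals in range(4)]
print(recorded)  # [179.0, 179.5, 179.46, 179.464]
```

Measuring to more decimal places always reveals another possible value, which is what makes the variable continuous.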

Example 11.4 (Continuous quantitative data) These (quantitative) variables are continuous (that is, they record continuous quantitative data):

  • The weight of 6-year-old Australian children. Values exist between any two given values of weight, by measuring to more decimal places of a kilogram; we would usually quote weight to the nearest kilogram.
  • The energy consumption of houses in a given city. Values exist between any two given values of energy consumption, by measuring to more and more decimal places of a kilowatt-hour (kWh); we would usually quote to the nearest kWh.
  • The time spent in front of a computer each day for employees in a given industry. Values exist between any two given times, by measuring to more decimal places of a second; we would usually quote the times to (say) the nearest minute, or the nearest 15 minutes.

11.1.2 Qualitative data: Nominal and ordinal data

Qualitative data has distinct labels or categories that are not mathematically numerical. These categories are called the levels or the values of the variable.

Definition 11.4 (Qualitative data) Qualitative data is not mathematically numerical data: it consists of categories or labels.

Definition 11.5 (Levels) The levels (or the values) of a qualitative variable refer to the names of the distinct categories.

Example 11.5 (Qualitative data) 'Brand of mobile phone' is a qualitative variable. Many levels are possible (that is, many possible brands), but these could be simplified by defining the levels as 'Huawei', 'Apple', 'Samsung', 'Google' and 'Other'.

Be careful: data recorded as numbers may be qualitative. Qualitative data are not mathematically numerical; that is, any numbers used do not have numerical meanings.

Example 11.6 (Qualitative data) Australian postcodes are numbers, but are qualitative (Example 11.2).

Here are two survey questions that produce qualitative data.

  1. What is your blood type?
     • Type A.
     • Type B.
     • Type AB.
     • Type O.
  2. What is your age group?
     • Under 20.
     • 20 to under 30.
     • 30 to under 50.
     • 50 or over.

What features of the data collected from the questions are similar? What features are different?

Qualitative data can be further classified as nominal or ordinal. Nominal variables are qualitative variables where the levels have no natural order. Ordinal variables are qualitative variables where the levels do have a natural order.

So in the example above, 'Blood type' is qualitative nominal, while 'Age group' is qualitative ordinal.

Definition 11.6 (Nominal qualitative variables) A nominal qualitative variable is a qualitative variable where the levels do not have a natural order.

Definition 11.7 (Ordinal qualitative variables) An ordinal qualitative variable is a qualitative variable where the levels do have a natural order.
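
The practical consequence of a natural order can be sketched in Python, using the 'Age group' levels from the survey question above. The responses are hypothetical, invented for the illustration. For an ordinal variable, summaries should follow the natural order of the levels; sorting the levels alphabetically breaks that order.

```python
from collections import Counter

# Levels of the ordinal variable 'Age group', in their natural order.
AGE_GROUP_LEVELS = ["Under 20", "20 to under 30", "30 to under 50", "50 or over"]

# Hypothetical responses (invented for illustration).
responses = ["30 to under 50", "Under 20", "50 or over", "Under 20", "20 to under 30"]
counts = Counter(responses)

# Ordinal: report the counts in the natural order of the levels.
in_natural_order = [(level, counts[level]) for level in AGE_GROUP_LEVELS]
print(in_natural_order[0])  # ('Under 20', 2)

# Alphabetical order puts the youngest group last, which is not sensible.
in_alphabetical_order = sorted(counts)
print(in_alphabetical_order[-1])  # Under 20
```

For a nominal variable such as 'Blood type', no such constraint exists, and any convenient order can be used.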

Example 11.7 (Nominal data) This survey question will produce nominal data:

How do you usually get to university?

  • Car (as driver or passenger).
  • Bus.
  • Ride bicycle or walk.
  • Other.

The data will be nominal with four levels. The levels can appear in any order: from largest group to smallest, or in alphabetical order.

Since there is no natural order, the order used should be carefully considered: what is the most useful order when summarising the data?

Example 11.8 (Ordinal data) This survey question will produce ordinal data:

Please indicate the extent to which you agree or disagree with this statement: 'Permeable pavements technology has the potential to revolutionise green building practices'.

  • Strongly disagree.
  • Disagree.
  • Neither agree nor disagree.
  • Agree.
  • Strongly agree.

The data will be ordinal with five levels. Treating the levels in the given order (or the reverse order) makes sense; it would not make sense, for example, to give the levels in alphabetical order.

Example 11.9 (Clarity in definitions) Consider the variable 'Age'. Age is continuous quantitative, since we age continuously (on our birthday, we don't suddenly get one year greyer with one extra year's worth of wrinkles...).

Age is usually rounded to the number of completed years, for convenience. However, the age of young children may be given as '3 days' or '10 months', instead of the nearest year.

Sometimes Age group is used instead (such as Under 20; 20 to under 30; 30 to under 50; 50 or over). 'Age group' is qualitative ordinal.

Ensure your RQ is clear about which is used!
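
The relationship between 'Age' and 'Age group' can be sketched as a small function. The function name `age_group` is chosen only for this illustration; the cut-offs are those given above, and the ages are the values from Example 11.1.

```python
def age_group(age_years):
    """Return the 'Age group' level for an age given in years."""
    if age_years < 20:
        return "Under 20"
    if age_years < 30:
        return "20 to under 30"
    if age_years < 50:
        return "30 to under 50"
    return "50 or over"

ages = [13, 21, 76]  # quantitative data (Example 11.1)
groups = [age_group(a) for a in ages]  # qualitative ordinal data
print(groups)  # ['Under 20', '20 to under 30', '50 or over']
```

The conversion works in one direction only: the age group can be found from the age, but the age cannot be recovered from the age group.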

Example 11.10 (Types of variables) Consider a study to determine if the weight of 500g bags of pasta really is at least 500g. One approach is to record the weight of pasta in each bag (a quantitative variable), and compare the average weight to the target weight of 500g.

Another approach is to record whether or not each bag of pasta weighed at least 500g (that is, whether or not each bag is underweight). This would be a qualitative variable, with two levels (underweight; not underweight). We could then report the percentage of bags that are underweight.
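
The two approaches can be compared in a short sketch. The bag weights below are hypothetical values invented for the illustration, not data from any study; the same bags are summarised once through a quantitative variable and once through a qualitative variable.

```python
from statistics import mean

TARGET_G = 500  # target weight, in grams

# Hypothetical bag weights in grams (invented for illustration).
weights_g = [502.1, 498.7, 500.0, 503.4, 499.2, 501.8]

# Approach 1: a quantitative variable (weight), summarised by the average.
average_weight = mean(weights_g)
print(round(average_weight, 1))

# Approach 2: a qualitative variable with two levels,
# summarised by the percentage of bags that are underweight.
status = ["not underweight" if w >= TARGET_G else "underweight" for w in weights_g]
percent_underweight = 100 * status.count("underweight") / len(status)
print(round(percent_underweight, 1))
```

The two summaries answer different questions, so the RQ should make clear which variable is being used.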

11.2 Describing data in jamovi and SPSS

In practice, quantitative research requires the use of a computer for producing graphs and completing calculations. In this book, two statistical software packages are described for analysing data: jamovi and SPSS.

(For reasons to avoid Excel and other spreadsheets, see the discussion earlier in this book.)

This section makes only brief notes about setting up data in these software packages; consult a comprehensive reference for more (and better) details. For both packages, however, declaring the variables correctly is very important (Table 11.1).

Practically all software, including jamovi and SPSS, records data in a spreadsheet-like grid, with the variables in the columns, and the units of analysis in the rows.

TABLE 11.1: Different types of variables, and their descriptions in jamovi and SPSS

| Type of variable | Further classification | In jamovi | In SPSS |
|---|---|---|---|
| Qualitative | Nominal | Nominal | Nominal |
| Qualitative | Ordinal | Ordinal | Ordinal |
| Quantitative | Discrete | Continuous, Integer | Scale |
| Quantitative | Continuous | Continuous, Decimal | Scale |
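
The content of Table 11.1 can also be written as a simple lookup, which may be handy when planning how to declare each variable before entering data. This is only a sketch: the names `MEASURE_TYPES` and `measure_type` are invented for the illustration and are not part of jamovi or SPSS.

```python
# Labels from Table 11.1, keyed by (type of variable, further classification).
MEASURE_TYPES = {
    ("qualitative", "nominal"): {"jamovi": "Nominal", "SPSS": "Nominal"},
    ("qualitative", "ordinal"): {"jamovi": "Ordinal", "SPSS": "Ordinal"},
    ("quantitative", "discrete"): {"jamovi": "Continuous, Integer", "SPSS": "Scale"},
    ("quantitative", "continuous"): {"jamovi": "Continuous, Decimal", "SPSS": "Scale"},
}

def measure_type(kind, classification, package):
    """Look up how a type of variable is declared in a package (Table 11.1)."""
    return MEASURE_TYPES[(kind.lower(), classification.lower())][package]

print(measure_type("Quantitative", "Discrete", "jamovi"))  # Continuous, Integer
print(measure_type("Quantitative", "Discrete", "SPSS"))    # Scale
```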

11.2.1 Using jamovi

In jamovi, nominal variables are called Nominal, and ordinal variables are called Ordinal (Table 11.1). In jamovi, continuous quantitative variables are called continuous decimal, and discrete quantitative variables are called (confusingly) continuous integer.

To add this information to jamovi, double-click on the variable name at the top of the data worksheet (Fig. 11.1), which produces Fig. 11.2. This opens an area where the data can be described:

  • Nominal qualitative variables are set as Nominal, and the levels described in the Levels area to the right.
  • Ordinal qualitative variables are set as Ordinal, and the levels described in the Levels area to the right.
  • Quantitative continuous variables are set as Continuous with the Data type as Decimal.
  • Quantitative discrete variables are set as Continuous with the Data type as Integer.

When the information has been entered, clicking the up-arrow on the top right (Fig. 11.2) closes this window.

FIGURE 11.1: jamovi: The variable names at the top of the columns of data

FIGURE 11.2: jamovi: Setting the variable type

11.2.2 Using SPSS

In SPSS, variables are described in the Variable View window (not the Data View window). Each variable is then described in the Measure column (Fig. 11.3):

  • Nominal qualitative variables are called Nominal.
  • Ordinal qualitative variables are called Ordinal.
  • Quantitative variables are called Scale, regardless of whether they are discrete or continuous.

FIGURE 11.3: SPSS: Setting the variable type

11.3 Summary

Data and variables can be described as either quantitative (either discrete or continuous) or qualitative (either nominal or ordinal). Variables should be correctly defined in jamovi and SPSS.

11.4 Quick revision questions

A study on the bruising of apples [260] aimed to determine the relationship between the recorded surface temperature of the apple and the depth of bruising.

The researchers purposefully hit apples with three different forces (200, 700 and 1200 mJ) to inflict bruises.

This was repeated at three different locations of the apple (lower; middle; upper).

The researchers then recorded the depth of the bruising, and recorded the surface temperature at each bruise location.

  1. How would the variable 'region of the apple' be best described?
  2. How would the variable 'depth of bruising' be best described?
  3. How would the variable 'temperature of the bruise location' be best described?
  4. The variable 'force of hit' could be considered as a quantitative continuous variable. However, since only a small number of forces are used, it could also be considered as qualitative ordinal.
    If it was considered as qualitative ordinal, how many levels would the variable have?

11.5 Exercises

Selected answers are available in Sect. D.11.

Exercise 11.1 A study of lime trees (Tilia cordata) recorded these variables for 385 lime trees in Russia [261]:

  • the foliage biomass, in kg;
  • the tree diameter (in cm);
  • the age of the tree (in years); and
  • the origin of the tree (one of Coppice, Natural, or Planted).

Describe the variables in the study using the language of this chapter.

Exercise 11.2 Are these variables quantitative (discrete or continuous, and with what units of measurement?), or qualitative (nominal or ordinal, and with what levels?)?

  1. Systolic blood pressure.
  2. Program of enrolment.
  3. Academic grade (HD; DN; CR; PS; FL).
  4. Number of times a person visited the doctor last year.

Exercise 11.3 A study of body mass index and its relationship with use of social media [262] recorded these variables (among others) from a group of 1140 participants:

  1. Age (under 45; 45 to 64; 65 or over)
  2. Gender (male; female)
  3. Location (urban; rural)
  4. Social media use (none; low; high)
  5. BMI (body mass index; the body mass in kg, divided by the square of height in m)
  6. Total sitting time, in minutes per day

For each variable, determine the type of variable: quantitative (discrete or continuous, and with what units of measurement?), or qualitative (nominal or ordinal, and with what levels)?

Exercise 11.4 In a study of the influence of using ankle-foot orthoses in children with cerebral palsy [263], the data in Table 11.2 describe the 15 subjects. (GMFCS, the Gross Motor Function Classification System, is used to describe the impact of cerebral palsy on motor function, where lower levels mean better functionality.) Describe the variables in the study.

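TABLE 11.2: Describing the sample in the orthoses data set

| Gender | Age (years) | Height (cm) | Weight (kg) | GMFCS |
|---|---|---|---|---|
| M | 9 | 136 | 34.5 | 1 |
| M | 7 | 106 | 16.2 | 2 |
| M | 7 | 129 | 21.1 | 1 |
| M | 12 | 152 | 40.4 | 1 |
| M | 11 | 146 | 39.3 | 2 |
| M | 5 | 113 | 18.1 | 1 |
| M | 6 | 112 | 16.7 | 2 |
| M | 8 | 112 | 19.1 | 1 |
| M | 8 | 138 | 28.6 | 1 |
| M | 6 | 116 | 19.3 | 1 |
| F | 7 | 113 | 17.6 | 1 |
| M | 11 | 141 | 34.9 | 1 |
| M | 7 | 136 | 34.5 | 1 |
| F | 9 | 128 | 21.9 | 1 |
| F | 8 | 133 | 23.0 | 1 |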

Exercise 11.5 A study of fertilizer use [264] recorded the soil nitrogen after applying different fertilizer doses. These variables were recorded:

  • the fertilizer dose, in kilograms of nitrogen per hectare;
  • the soil nitrogen, in kilograms of nitrogen per hectare; and
  • the fertilizer source; one of 'inorganic' or 'organic'.

Describe the variables in the study.

Exercise 11.6 A study [265] recorded the response of kangaroos to drones (one of 'Vigilance', 'No vigilance', 'Flee <10m', or 'Flee >10m') and the altitude of the drone (30m, 60m, 100m or 120m). The mob size and the sex of the kangaroo were also recorded. Describe the variables in the study.

Exercise 11.7 A study of people who died while taking selfies [266] recorded the location (Table 11.3). Which of the following are the variables in the table? For each that is a variable, describe the variable.

  1. The location.
  2. The number of people who died at each location.
  3. The percentage of people who died at each location.

TABLE 11.3: Locations of people dying while taking selfies

| Location | Number | Percentage |
|---|---|---|
| Nature, associated environments | 48 | 43.2 |
| Train, railway, associated structures | 22 | 19.9 |
| Buildings, associated structures | 17 | 15.3 |
| Road, bridge, associated structures | 12 | 10.8 |
| Dams, associated structures | 7 | 6.3 |
| Fields, farms, associated structures | 4 | 3.6 |
| Others | 1 | 0.9 |