11 Describing data
So far, you have learnt to ask a RQ, design a study, and collect the data. In this chapter, you will learn how to describe the data, because this determines how to proceed with the analysis. You will learn to:
- identify qualitative and quantitative variables.
- identify nominal and ordinal qualitative variables.
- identify continuous and discrete quantitative variables.
- describe data in ways suitable for use in software.
11.1 Quantitative and qualitative data
Understanding the type of data collected is essential before starting any analysis, because the type of data determines how to proceed with summaries and analyses. Broadly, data may be described as either quantitative data (Sect. 11.1.1) or qualitative data (Sect. 11.1.2). The data are the recorded values of the variables, so sometimes we talk about quantitative and qualitative variables. Quantitative variables record quantitative data; qualitative variables record qualitative data.
Example 11.1 (Variables and data) 'Age' is a variable because age can vary from individual to individual. The data would be values such as 13 months, 21 years and 76 years.
Quantitative research summarises and analyses data using numerical methods (Sect. 1.4). Quantitative research can include both quantitative and qualitative variables, because both quantitative and qualitative data can be summarised numerically (Chaps. 13 and 14 respectively) and analysed numerically.
11.1.1 Quantitative data: discrete and continuous data
Quantitative data are mathematically numerical. Most data arising from counting or measuring will be quantitative. Quantitative data are often (but not always) measured with measurement units (such as kg or cm). Be careful: Numerical data are not necessarily quantitative. Only mathematically numerical data are quantitative; that is, numbers with numerical meanings.
Definition 11.1 (Quantitative data) Quantitative data is mathematically numerical: the numbers have numerical meaning, and performing mathematical operations on them makes sense. Most data arising from counting or measuring will be quantitative.
Example 11.2 (Quantitative data) Australian postcodes are four-digit numbers, but are not quantitative; the numbers are just labels. A postcode of 4556 isn't one 'better' or one 'more' than a postcode of 4555. The values do not have numerical meanings.
Indeed, alphabetic postcodes could have been chosen. For example, the postcode of Caboolture is 4510, but could have been QCAB.
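The idea that postcodes are labels rather than quantities can be sketched in Python. This is a minimal illustration only: the postcodes are those mentioned in Example 11.2, while the list of responses and all the variable names are made up for the sketch.

```python
from collections import Counter
from statistics import mean

# Stored as numbers, postcodes can be averaged, but the answer is not a
# meaningful postcode. (4510, 4555 and 4556 appear in Example 11.2.)
postcodes_as_numbers = [4510, 4555, 4556]
meaningless_average = mean(postcodes_as_numbers)

# Stored as text labels, the sensible operation is counting each label.
# The list of responses is hypothetical, for illustration only.
postcodes_as_labels = ["4510", "4555", "4556", "4510"]
counts = Counter(postcodes_as_labels)

# Relabelling 4510 as 'QCAB' changes the labels, but not the counts.
relabelled = ["QCAB" if code == "4510" else code for code in postcodes_as_labels]
relabelled_counts = Counter(relabelled)
```

The software will happily compute an average of the numerical postcodes, which is why the type of data needs to be decided by the researcher rather than inferred from how the values look.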
Quantitative data may be further classified as discrete or continuous. Discrete quantitative data has possible values that can be counted, at least in theory. Sometimes, the possible values may not have a theoretical upper limit, yet are still considered 'countable'. Continuous quantitative data has values that cannot, at least in theory, be recorded exactly: another value can always be found between any two given values of the variable, if we measure to a greater number of decimal places. In practice, though, the values need to be rounded to a reasonable number of decimal places.
Definition 11.2 (Discrete data) Discrete quantitative data has a countable number of possible values between any two given values of the variable.
Example 11.3 (Discrete quantitative data) These quantitative variables are discrete:
- The number of heart attacks in the previous year experienced by Croatian women over 40. Possible values are 0, 1, 2, ...
- The number of cracked eggs in a carton of 12. Possible values are: 0, 1, 2, ... 12.
- The number of orthotic devices a person has ever used. Possible values are 0, 1, 2, ...
- The number of fissures in turbines after 5000 hours of use. Possible values are 0, 1, 2, ...
Definition 11.3 (Continuous data) Continuous quantitative data have (at least in theory) an infinite number of possible values between any two given values.
Height is continuous: between the heights of 179cm and 180cm, many heights exist, depending on how many decimal places are used to record height. In practice, however, heights are usually rounded to the nearest centimetre for convenience. All continuous data are rounded.
Example 11.4 (Continuous quantitative data) These quantitative variables are continuous:
- The weight of 6-year-old Fijian children. Values exist between any two given values of weight, by measuring to more decimal places of a kilogram. However, weights are usually reported to the nearest kilogram.
- The energy consumption of houses in London. Values exist between any two given values of energy consumption, by measuring to more and more decimal places of a kilowatt-hour (kWh). Consumption would usually be given to the nearest kWh.
- The time spent in front of a computer each day for employees in a given industry. Values exist between any two given times, by measuring to more decimal places of a second. The values may be reported to the nearest minute, or the nearest 15 minutes.
Sometimes, discrete quantitative data with a very large number of possible values may be treated as continuous.
Example 11.5 (Treating discrete data as continuous) Annual income is discrete, since income is not paid in fractions of a cent. However, annual incomes are usually much larger than cents, and vary at scales much greater than cents, and so are usually treated as continuous.
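The difference between discrete and continuous data can be sketched in Python. This is an illustration under simple assumptions: the function name `height_between` is hypothetical, and exact fractions are used so that the 'in theory' part of the definition is not limited by the precision of the computer.

```python
from fractions import Fraction

# Discrete: the possible numbers of cracked eggs in a carton of 12 can be listed.
cracked_egg_values = list(range(0, 13))   # 0, 1, 2, ..., 12

# Continuous: between any two heights (in cm), another height can be found.
def height_between(low, high):
    """Return a height strictly between low and high (the midpoint)."""
    return (low + high) / 2

low, high = Fraction(179), Fraction(180)
heights_found = []
for _ in range(5):
    high = height_between(low, high)   # each new value needs more decimal places
    heights_found.append(float(high))

# In practice, the recorded value is rounded; here, to the nearest centimetre.
recorded_height = round(179.6)
```

The loop could be continued indefinitely, always producing a new height between 179cm and 180cm, whereas the list of possible egg counts is complete.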
11.1.2 Qualitative data: nominal and ordinal data
Qualitative data have distinct labels or categories, and are not mathematically numerical. Be careful: numerical data may be qualitative, provided the numbers don't have numerical meanings.
Definition 11.4 (Qualitative data) Qualitative data is not mathematically numerical data: it consists of categories or labels.
The categories of a qualitative variable are called the levels or the values of the variable.
Definition 11.5 (Levels) The levels (or the values) of a qualitative variable refer to the names of the distinct categories.
Example 11.6 (Qualitative data) 'Brand of mobile phone' is a qualitative variable. Many levels are possible (that is, many possible brands), but the variable could be simplified by defining the levels as 'Apple', 'Samsung', 'Google' and 'Other'.
Example 11.7 (Qualitative data) Australian postcodes are numbers, but are qualitative (Example 11.2).
Here are two qualitative variables. What features of the data collected from the two questions are similar? What features are different?
- Blood type. The levels are: Type A; Type B; Type AB; Type O.
- Age group. The levels are: Under 20; 20 to under 30; 30 to under 50; 50 or over.
Qualitative data can be further classified as nominal or ordinal. Nominal variables are qualitative variables where the levels have no natural order. Ordinal variables are qualitative variables where the levels do have a natural order. In the question above, 'Blood type' is qualitative nominal, while 'Age group' is qualitative ordinal.
Definition 11.6 (Nominal qualitative variables) A nominal qualitative variable is a qualitative variable where the levels do not have a natural order.
Definition 11.7 (Ordinal qualitative variables) An ordinal qualitative variable is a qualitative variable where the levels do have a natural order.
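One way to see why the distinction matters is to sort some values. The Python sketch below uses the levels of 'Blood type' and 'Age group' given earlier; the responses are hypothetical, and `sort_by_level` is an illustrative helper rather than a standard function.

```python
# Nominal: no natural order, so any ordering of the levels is acceptable.
BLOOD_TYPE_LEVELS = ["Type A", "Type B", "Type AB", "Type O"]

# Ordinal: the order of the levels carries information, so it is stored.
AGE_GROUP_LEVELS = ["Under 20", "20 to under 30", "30 to under 50", "50 or over"]

def sort_by_level(values, levels):
    """Sort values by their position in the ordered list of levels."""
    return sorted(values, key=levels.index)

# Hypothetical responses, for illustration only.
responses = ["50 or over", "Under 20", "30 to under 50", "Under 20"]

natural_order = sort_by_level(responses, AGE_GROUP_LEVELS)
alphabetical_order = sorted(responses)   # places 'Under 20' after '50 or over'
```

Sorting alphabetically places 'Under 20' last, which loses the natural order of the age groups. For a nominal variable such as blood type, no ordering is lost whichever way the levels are sorted.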
Example 11.8 (Nominal data) The variable 'How students get to university' is nominal, where the levels may be: Car (as driver or passenger); Bus; Ride bicycle; Walk; Other.
The data will be nominal with five levels. The levels can appear in any order: from largest group to smallest, or in alphabetical order. Since there is no natural order, the order used should be carefully considered: what is the most useful order when summarising the data?
Example 11.9 (Ordinal data) This survey question will produce ordinal data:
Please indicate the extent to which you agree or disagree with this statement: 'Permeable pavements technology will revolutionise green building practices'.
- Strongly disagree.
- Disagree.
- Neither agree nor disagree.
- Agree.
- Strongly agree.
The data will be ordinal with five levels. Giving the levels in the given order (or the reverse order) makes sense; giving the levels in alphabetical order, for example, would not make sense.
Example 11.10 (Clarity in definitions) 'Age' is a continuous quantitative variable, since age could be measured to many decimal places of a second. Age is usually rounded down to the number of completed years, for convenience. However, the age of young children may be given as '3 days' or '10 months'.
Sometimes 'Age group' is used instead of 'Age' (such as Under 20; 20 to under 50; 50 or over). 'Age group' is qualitative ordinal.
Ensure you are clear about which is used!
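The step from 'Age' to 'Age group' can be sketched as a small Python function. The function name `age_group` is hypothetical; the levels are those given above, and the ages are the values from Example 11.1, converted to years.

```python
def age_group(age_in_years):
    """Convert an age (quantitative) into an age group (qualitative ordinal)."""
    if age_in_years < 20:
        return "Under 20"
    if age_in_years < 50:
        return "20 to under 50"
    return "50 or over"

# The ages from Example 11.1: 13 months, 21 years and 76 years.
ages = [13 / 12, 21, 76]
groups = [age_group(age) for age in ages]
```

The conversion only works in one direction: an age group can be found from an age, but the age cannot be recovered from the age group.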
Example 11.11 (Types of variables) Consider a study to determine if 500g bags of pasta weigh 500g or more. One approach is to record the weight of pasta in each bag (a quantitative variable), and compare the average weight to the target weight of 500g.
Another approach is to record whether each bag of pasta was underweight or not (perhaps using a balance scale). This would be a qualitative variable, with two levels (underweight; not underweight). The percentage of bags that are underweight could be reported.
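The two approaches can be compared using a short Python sketch. The bag weights below are made up purely for illustration (they are not data from any study), and the variable names are hypothetical.

```python
from statistics import mean

TARGET_WEIGHT = 500  # grams

# Hypothetical bag weights (in grams), made up for illustration only.
weights = [498.2, 501.5, 503.1, 499.4, 500.8, 502.0]

# Approach 1: keep the quantitative variable, and summarise using an average.
average_weight = mean(weights)

# Approach 2: record a qualitative variable with two levels instead.
status = ["underweight" if w < TARGET_WEIGHT else "not underweight"
          for w in weights]
percent_underweight = 100 * status.count("underweight") / len(status)
```

The same bags give different summaries depending on which variable is recorded: an average weight for the quantitative variable, and a percentage for the qualitative variable.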
Many statistical software packages, like jamovi and SPSS, require the user to describe the variables. This enables the software to produce appropriate output and suggest appropriate analyses.
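Describing variables for software amounts to recording, for each variable, its type, its units or its levels. The Python sketch below shows one possible way to hold this information. It is not how jamovi or SPSS store variable descriptions; the class `VariableDescription` and its fields are hypothetical, and the variables are taken from examples earlier in this chapter.

```python
from dataclasses import dataclass, field

# Allowed subtypes for each broad type of variable, as defined in this chapter.
ALLOWED_SUBTYPES = {
    "quantitative": {"discrete", "continuous"},
    "qualitative": {"nominal", "ordinal"},
}

@dataclass
class VariableDescription:
    """A hypothetical record describing one variable in a data set."""
    name: str
    kind: str                                    # 'quantitative' or 'qualitative'
    subtype: str                                 # must be allowed for the kind
    units: str = ""                              # for quantitative variables
    levels: list = field(default_factory=list)   # for qualitative variables

    def __post_init__(self):
        # Reject combinations such as 'quantitative nominal'.
        if self.subtype not in ALLOWED_SUBTYPES.get(self.kind, set()):
            raise ValueError(
                f"{self.subtype!r} is not a type of {self.kind!r} variable"
            )

data_dictionary = [
    VariableDescription("Height", "quantitative", "continuous", units="cm"),
    VariableDescription("Cracked eggs per carton", "quantitative", "discrete"),
    VariableDescription("Blood type", "qualitative", "nominal",
                        levels=["Type A", "Type B", "Type AB", "Type O"]),
    VariableDescription("Age group", "qualitative", "ordinal",
                        levels=["Under 20", "20 to under 30",
                                "30 to under 50", "50 or over"]),
]
```

Writing down the description of each variable before entering the data makes it harder to treat, for example, a set of numerical labels as a quantitative variable by accident.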
11.2 Summary
The type of data collected determines the types of summaries and analyses that are needed. Data and variables can be described as either:
- quantitative (either discrete or continuous) if they are mathematically numerical; or
- qualitative (either nominal or ordinal) if they are not mathematically numerical.
11.3 Quick revision questions
A study on the bruising of apples (Doosti-Irani et al. 2016) explored the relationship between the surface temperature of apples, and the depth of bruising.
The researchers purposefully hit apples with three different forces (200, 700 and 1200mJ) to inflict bruises. This was repeated at three different locations of the apple (lower; middle; upper). The researchers then recorded the depth of the bruising, and the surface temperature (in °C) at each bruise location.
- How would the variable 'location on the apple' be best described?
- How would the variable 'depth of bruising' be best described?
- How would the variable 'temperature of the bruise location' be best described?
- The variable 'force of hit' could be considered a quantitative continuous variable. However, since only a small number of forces are used, it should probably be considered qualitative ordinal. How many levels would the variable have?
11.4 Exercises
Selected answers are available in Sect. D.11.
Exercise 11.1 True or false: These variables are quantitative and continuous.
- The knee-flex angle after treatment
- Whether or not laser drilling of small holes in concrete is successful
- Length of time between arrival at an emergency department, and admission
- Number of eggs laid by female brush turkeys
- Whether or not a child eats the recommended serving of fruit each day
Exercise 11.2 True or false: These variables are qualitative and nominal.
- The age group of respondents to a survey
- Whether a cyclist is wearing a helmet or not
- The dosage of a medication applied: 40mg per day, 60mg per day, or 80mg per day
- The brand of fertilizer being applied
- The approximate age of trees
Exercise 11.3 A study recorded whether or not people (who were not swimming) were wearing head-protection at the beach. The results were recorded as None; Cap; or Hat. Which of the following could be used to describe this variable?
- Nominal
- Qualitative
- Continuous
- Quantitative
- Ordinal
Exercise 11.4 A study of lime trees (Tilia cordata) recorded these variables for 385 lime trees in Russia (Schepaschenko et al. 2017; P. K. Dunn and Smyth 2018): the foliage biomass (in kg); the tree diameter (in cm); the age of the tree (in years); and the origin of the tree (one of Coppice, Natural, or Planted).
Describe the variables in the study using the language of this chapter.
Exercise 11.5 Are these variables quantitative (discrete or continuous, and with what units of measurement?), or qualitative (nominal or ordinal, and with what levels?)?
- Systolic blood pressure.
- Program of enrolment.
- Academic grade (HD; DN; CR; PS; FL).
- Number of times a person visited the doctor last year.
Exercise 11.6 A study of body mass index and its relationship with use of social media (Alley et al. 2017) recorded these variables (among others) from a group of 1140 participants:
- Age (under 45; 45 to 64; 65 or over).
- Gender (male; female).
- Location (urban; rural).
- Social media use (none; low; high).
- BMI (body mass index; the body mass in kg, divided by the square of height in m).
- Total sitting time, in minutes per day.
For each variable, determine the type of variable: quantitative (discrete or continuous, and with what units of measurement?), or qualitative (nominal or ordinal, and with what levels)?
Exercise 11.7 In a study of the influence of using ankle-foot orthoses in children with cerebral palsy (Swinnen et al. 2017), the data in Table 11.1 describe the 15 subjects. (GMFCS, the Gross Motor Function Classification System, is used to describe the impact of cerebral palsy on motor function, where lower levels mean better functionality.) Describe the variables in the study.
Gender | Age (years) | Height (cm) | Weight (kg) | GMFCS |
---|---|---|---|---|
M | 9 | 136 | 34.5 | 1 |
M | 7 | 106 | 16.2 | 2 |
M | 7 | 129 | 21.1 | 1 |
M | 12 | 152 | 40.4 | 1 |
M | 11 | 146 | 39.3 | 2 |
M | 5 | 113 | 18.1 | 1 |
M | 6 | 112 | 16.7 | 2 |
M | 8 | 112 | 19.1 | 1 |
M | 8 | 138 | 28.6 | 1 |
M | 6 | 116 | 19.3 | 1 |
F | 7 | 113 | 17.6 | 1 |
M | 11 | 141 | 34.9 | 1 |
M | 7 | 136 | 34.5 | 1 |
F | 9 | 128 | 21.9 | 1 |
F | 8 | 133 | 23.0 | 1 |
Exercise 11.8 A study of fertilizer use (Lane 2002; P. K. Dunn and Smyth 2018) recorded the soil nitrogen after applying different fertilizer doses. These variables were recorded:
- the fertilizer dose, in kilograms of nitrogen per hectare;
- the soil nitrogen, in kilograms of nitrogen per hectare; and
- the fertilizer source; one of 'inorganic' or 'organic'.
Describe the variables in the study.
Exercise 11.9 A study (Brunton et al. 2019) recorded the response of kangaroos to overhead drones (one of 'No vigilance', 'Vigilance', 'Flee \(<10\)m', or 'Flee \(>10\)m') and the altitude of the drone (30m, 60m, 100m or 120m). The mob size and the sex of the kangaroo were also recorded. Describe the variables in the study.
Exercise 11.10 A study of people who died while taking selfies (Dokur, Petekkaya, and Karadağ 2018) recorded the location (Table 11.2). Which of the following are the variables in the table? For each that is a variable, describe the variable.
- The location.
- The number of people who died at each location.
- The percentage of people who died at each location.
Location | Number | Percentage |
---|---|---|
Nature, associated environments | 48 | 43.2 |
Train, railway, associated structures | 22 | 19.9 |
Buildings, associated structures | 17 | 15.3 |
Road, bridge, associated structures | 12 | 10.8 |
Dams, associated structures | 7 | 6.3 |
Fields, farms, associated structures | 4 | 3.6 |
Others | 1 | 0.9 |