10 Collecting data

So far, you have learnt to ask a RQ and design the study. In this chapter, you will learn how to:

  • record the important steps in data collection.
  • describe study protocols.
  • ask survey questions.

10.1 Protocols

If the RQ is well-constructed, terms are clearly defined, and the study is well designed and explained, then the process for collecting the data should be easy to describe. Data collection is often time-consuming, tedious and expensive, so collecting the data correctly the first time is important.

Before collecting the data, a plan should be established and documented that explains exactly how the data will be obtained, which will include operational definitions (Sect. 2.11). This plan is called a protocol.

Definition 10.1 (Protocol) A protocol is a procedure documenting the details of the design and implementation of a study, and of the data collection.

Unforeseen complications are not unusual, so a pilot study (or a practice run) is often conducted before the real data collection, to identify problems with, or ways to improve, the study design or data collection. The pilot study may suggest changes to the protocol. (Pilot studies may also be useful for determining the size of the sample; see Sect. 30.4.)

Definition 10.2 (Pilot study) A pilot study is a small test run of the study protocol used to check that the protocol is appropriate and practical, and to identify (and hence fix) possible problems with the study design or protocol.

A pilot study allows the researcher to:

  • determine the feasibility of the data collection protocol.
  • identify unforeseen challenges.
  • obtain data to determine appropriate sample sizes (Sect. 30).
  • potentially save time and money.

The data can be collected once the protocol has been finalised. Protocols ensure studies are repeatable (Sect. 4.4) so others can confirm or compare results, and others can understand exactly what was done, and how. Protocols should indicate how design aspects (such as blinding the individuals, random allocation of treatments, etc.) will happen. The final protocol, without pedantic detail, should be reported. Diagrams can be useful to support explanations. All studies should have a well-established protocol for describing how the study was done.
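
As an illustration of documenting how a design aspect "will happen", the random allocation of treatments can be written as a short script and included with the protocol. The following is only a minimal sketch in Python: the participant IDs, the two group names and the seed value are hypothetical, and real studies may use other allocation schemes. Fixing and recording the seed means others can reproduce exactly the same allocation.

```python
import random


def allocate_treatments(participant_ids, treatments, seed):
    """Randomly allocate participants to treatments in (near-)equal group sizes."""
    rng = random.Random(seed)  # fixed, recorded seed: the allocation is reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled participants round-robin into the treatment groups.
    return {pid: treatments[i % len(treatments)] for i, pid in enumerate(shuffled)}


# Hypothetical participant IDs (P01 to P12) and hypothetical group names.
participants = [f"P{n:02d}" for n in range(1, 13)]
allocation = allocate_treatments(participants, ["Treatment", "Control"], seed=2024)

for pid in participants:
    print(pid, allocation[pid])
```

Running the function again with the same seed gives the same allocation, which is what makes this step of the protocol repeatable.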

A protocol usually has at least three components that describe:

  1. How individuals are chosen from the population (i.e., external validity);
  2. How information is collected from the individuals (i.e., internal validity); and
  3. The analyses, and what software (and version) was used.

Example 10.1 (Protocol) To increase the nutritional value of cookies, researchers made cookies using pureed green peas in place of margarine (Romanchik-Cerpovicz, Jeffords, and Onyenwoke 2018). The researchers wanted to assess the acceptance of these cookies to college students.

The protocol discussed how the individuals were chosen (p. 4):

...through advertisement across campus from students attending a university in the southeastern United States.

This voluntary sample comprised \(80.6\)% women, a higher percentage of women than in the general population, or the college population. (Other extraneous variables were also recorded.)

Exclusion criteria were also applied, excluding people "with an allergy or sensitivity to an ingredient used in the preparation of the cookies" (p. 5). The researchers also described how the data was obtained from the individuals (p. 5):

During the testing session, panelists were seated at individual tables. Each cookie was presented one at a time on a disposable white plate. Samples were previously coded and randomized. The presentation order for all samples was \(25\)%, \(0\)%, \(50\)%, \(100\)% and \(75\)% substitution of fat with puree of canned green peas. To maintain standard procedures for sensory analysis [...], panelists cleansed their palates between cookie samples with distilled water (\(25^\circ\)C) [...] characteristics of color, smell, moistness, flavor, aftertaste, and overall acceptability, for each sample of cookies [was recorded]...

Thus, internal validity was managed using random allocation, blinding individuals, and washouts. Details are also given of how the cookies were prepared, and how objective measurements (such as moisture content) were determined.

The analyses and software used were also given.

Consider this partial protocol, which shows honesty in describing a protocol:

Fresh cow dung was obtained from free-ranging, grass fed, and antibiotic-free Milking Shorthorn cows (Bos taurus) in the Tilden Regional Park in Berkeley, CA. Resting cows were approached with caution and startled by loud shouting, whereupon the cows rapidly stood up, defecated, and moved away from the source of the annoyance. Dung was collected in ZipLoc bags (\(1\) gallon), snap-frozen and stored at \(-80^\circ\)C.

--- Hare et al. (2008), p. 10

10.2 Collecting data using questionnaires

10.2.1 Writing questions

Collecting data using questionnaires is common for both observational and experimental studies. Questionnaires are very difficult to do well: question wording is crucial, and surprisingly difficult to get right (Fink 1995). Pilot testing questionnaires is crucial!

Definition 10.3 (Questionnaire) A questionnaire is a set of questions for respondents to answer.

A questionnaire is not the same as a survey. A questionnaire is a set of questions used to obtain information from individuals, whereas a survey is an entire methodology that includes gathering data using a questionnaire, finding a sample, and other components.

Questions in a questionnaire may be open-ended (respondents can write their own answers) or closed (respondents select from a small number of possible answers, as in multiple-choice questions). Open and closed questions both have advantages and disadvantages. Answers to open questions more easily lend themselves to qualitative analysis.

This section briefly discusses writing questions; Sect. 10.2.2 discusses challenges using questionnaires.

Example 10.2 (Open and closed questions) German students were asked a series of questions about microplastics (Raab and Bogner 2021), including:

  1. Name sources of microplastics in the household.
  2. In which ecosystems are microplastics in Germany? Tick the answer (multiple ticks are possible). Options: (a) sea; (b) rivers; (c) lakes; (d) groundwater.
  3. Assess the potential danger posed by microplastics. Options: (a) very dangerous; (b) dangerous; (c) hardly dangerous; (d) not dangerous.

The first question is open: respondents could provide their own answers. The second question is closed, where multiple options can be selected. The third question is closed, where only one option can be selected.

When framing questionnaire questions, remember:

  • Avoid leading questions, which may lead respondents to answer a certain way. Question wording is the usual reason for leading questions.
  • Avoid ambiguity: avoid unfamiliar terms and unclear questions.
  • Avoid asking the uninformed: avoid asking respondents about issues they don't know about. Many people will give a response even if they do not understand (such responses are worthless). For example, people may give directions to places that do not even exist (Collett and O’Shea 1976).
  • Avoid complex and double-barrelled questions, which can be hard to understand.
  • Avoid problems with ethics: avoid questions about people breaking laws, or revealing confidential or private information. In special cases and with justification, ethics committees may allow such questions.
  • Ensure clarity in question wording.
  • Ensure options are mutually exclusive, so that each answer fits into only one category.
  • Ensure options are exhaustive, so that the categories cover all possible answers.

Example 10.3 (Poor question wording) Consider a questionnaire asking these questions:

  1. Because bottles from bottled water create enormous amounts of non-biodegradable landfill and hence threaten native wildlife, do you support banning bottled water?
  2. Do you drink more water now?
  3. Are you more concerned about Coagulase-negative Staphylococcus or Neisseria pharyngis in bottled water?
  4. Do you drink water in plastic and glass bottles?
  5. Do you have a water tank installed illegally, without permission?
  6. Do you avoid purchasing water in plastic bottles unless it is carbonated, unless the bottles are plastic but not necessarily if the lid is recyclable?

Question 1 is leading because the expected response is obvious.

Question 2 is ambiguous: it is unclear what 'more water now' is being compared to.

Question 3 is unlikely to be answerable, as most people will be uninformed. Nonetheless, many people will still give an opinion. This data will be effectively useless, but the researcher may not realise this.

Question 4 is double-barrelled, and would be better asked as two separate questions (one asking about plastic bottles, and one about glass bottles).

Question 5 is unlikely to be given ethical approval or to obtain truthful answers, as respondents are unlikely to admit to breaking rules.

Question 6 is unclear, since working out what a yes or no answer would mean is difficult.

Example 10.4 (Question wording) Question wording can be important (Jardina 2018).

In the 2014 General Social Survey (https://gss.norc.org), when white Americans were asked for their opinion of the amount America spends on welfare, \(58\)% of respondents answered 'Too much'.

However, when white Americans were asked for their opinion of the amount America spends on assistance to the poor, only \(16\)% of respondents answered 'Too much'.

Example 10.5 (Leading question) Consider this question:

Do you like this new orthotic?

This question is leading, since liking is the only option presented. Better would be:

Do you like or dislike this new orthotic?

Example 10.6 (Mutually exclusive options) In a study to determine the time doctors spent on patients (from Chan et al. (2008)), doctors were given the options:

  • \(0\)--\(5\) mins;
  • \(5\)--\(10\) mins; or
  • more than \(10\) mins.

This is a poor question, because a respondent does not know which option to select for an answer of '\(5\) minutes'. The options are not mutually exclusive.
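
Checks like this can be made mechanical before a questionnaire is piloted. The sketch below (in Python) reads the three options literally, with both end points of each range included, and counts how many options each trial answer matches. The trial answers and the revised wording are hypothetical illustrations, not taken from Chan et al. (2008). An answer matching more than one option shows that the options are not mutually exclusive; an answer matching none shows that they are not exhaustive.

```python
def matching_options(answer, options):
    """Return the labels of all options that contain the given answer."""
    return [label for label, contains in options.items() if contains(answer)]


def check_options(options, trial_answers):
    """Return the trial answers that match no option, or more than one option."""
    problems = {}
    for answer in trial_answers:
        matches = matching_options(answer, options)
        if len(matches) != 1:
            problems[answer] = matches
    return problems


# The options from Example 10.6, read literally (end points included).
original = {
    "0-5 mins": lambda t: 0 <= t <= 5,
    "5-10 mins": lambda t: 5 <= t <= 10,
    "more than 10 mins": lambda t: t > 10,
}

# One possible rewording (hypothetical) with non-overlapping ranges.
revised = {
    "less than 5 mins": lambda t: 0 <= t < 5,
    "5 to less than 10 mins": lambda t: 5 <= t < 10,
    "10 mins or more": lambda t: t >= 10,
}

trial_answers = [0, 2.5, 5, 7, 10, 12]  # hypothetical answers, in minutes

print(check_options(original, trial_answers))  # {5: ['0-5 mins', '5-10 mins']}
print(check_options(revised, trial_answers))   # {}
```

The check is only as good as the trial answers supplied, so boundary values (here, \(5\) and \(10\)) should always be among them.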

10.2.2 Challenges using questionnaires

Using questionnaires presents myriad challenges.

  • Non-response bias (Sect. 5.10): Non-response bias is common with questionnaires, as they are often used with voluntary-response samples. The people who do not respond to the survey may be different from those who do respond.
  • Response bias (Sect. 5.10): People do not always answer truthfully; for example, what people say may not correspond with what people do (Sect. 9.4). Sometimes this is unintentional (e.g., due to poor question wording); sometimes it is due to embarrassment, or because questions are controversial. Sometimes, respondents repeatedly provide the same answer when a series of multiple-choice questions is presented (perhaps due to boredom).
  • Recall bias: People may not be able to recall past events accurately, or recall when they happened.
  • Question order: The order of the questions can influence the responses.
  • Interpretation: Phrases and words such as "Sometimes" and "Somewhat disagree" may mean different things to different people.

Many of these can be managed with careful questionnaire design, but discussing the methods is beyond the scope of this book.
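
Some of these problems can at least be looked for once the data are collected. As a minimal sketch (in Python, using made-up respondent IDs and made-up coded answers), the following flags respondents who gave the identical answer to every question in a block of multiple-choice questions. A flag is only a prompt to look more closely: identical answers may be genuine, so this does not by itself show that the responses are untruthful.

```python
def is_straight_lined(answers):
    """True if the respondent gave the identical answer to every question in a block."""
    return len(answers) > 1 and len(set(answers)) == 1


# Hypothetical responses to five multiple-choice questions (answers coded 1 to 5).
responses = {
    "R1": [4, 2, 5, 3, 4],
    "R2": [3, 3, 3, 3, 3],
    "R3": [1, 1, 2, 1, 1],
}

flagged = [rid for rid, answers in responses.items() if is_straight_lined(answers)]
print(flagged)  # ['R2']
```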

10.3 Chapter summary

Having a detailed procedure for collecting the data (the protocol) is important. Using a pilot study to trial the protocol can often reveal unexpected changes necessary for a good protocol. Creating good questionnaire questions is difficult, but important.

10.4 Quick review questions

  1. What is the biggest problem with this question: 'Do you have bromodosis?'

  2. What is the biggest problem with this question: 'Do you spend too much time connected to the internet?'

  3. What is the biggest problem with this question: 'Do you eat fruits and vegetables?'

  4. Which of these are reasons for producing a well-defined protocol?

    • It allows the researchers to make the study externally valid.
    • It ensures that others know exactly what was done.
    • It ensures that the study is repeatable for others.

  5. Which of the following questionnaire questions are likely to be leading questions?

    • Do you, or do you not, believe that permeable pavements are a viable alternative to traditional pavements?
    • Do you support a ban on bottled water?
    • Do you believe that double-gloving by paramedics reduces the risk of infection, increases the risk of infection, or makes no difference to the risk of infection?
    • Should Ireland ban breakfast cereals with unhealthy sugar levels?

10.5 Exercises

Selected answers are available in App. E.

Exercise 10.1 What is the problem with this question?

What is your age? (Select one option)

  • Under \(18\)
  • Over \(18\)

Exercise 10.2 What is the problem with this question?

How many children do you have? (Select one option)

  • None
  • 1 or 2
  • 2 or 3
  • More than 4

Exercise 10.3 Which of these questionnaire questions is best? Why?

  1. Should concerned cat owners vaccinate their pets?
  2. Should domestic cats be required to be vaccinated or not?
  3. Do you agree that pet-owners should have their cats vaccinated?

Exercise 10.4 Which of these questionnaire questions is best? Why?

  1. Do you own an environmentally-friendly electric vehicle?
  2. Do you own an electric vehicle?
  3. Do you own or do you not own an electric vehicle?

Exercise 10.5 In a study of sunscreen use (Falk and Anderson 2013), participants were asked questions, including these:

  • How often do you sun bathe with the intention to tan during the summer in Sweden?
    (Possible answers: never, seldom, sometimes, often, always).
  • How long do you usually stay in the sun between \(11\)am and \(3\)pm, during a typical day-off in the summer (June--August)?
    (Possible answers: \(<30\) min, \(30\) min--\(1\) h, \(1\)--\(2\) h, \(2\)--\(3\) h, \(>3\) h).

Critique these questions. What biases may be present?

Exercise 10.6 In a study of children's knowledge of their natural environment (Morón-Monge, Hamed, and Morón Monge 2021), primary school children (from Andalusia, Spain) were asked three questions:

  1. Do you usually visit Guadaira Park?
    • No, I don’t like parks.
    • No, I don’t usually visit it.
    • Yes, once per week.
    • Yes, more than once a week.
  2. How many times have you visited nature (the beach, countryside, mountains, etc.) in the last month?
    • Never
    • Once
    • Two to three times
    • More than three times
  3. Which is your favorite natural place?
    • Write a story
    • Draw a picture

Which questions are open and which are closed? Critique the questions.