5 External validity: sampling

You have learnt to ask a RQ, and identify a study design. In this chapter, you will learn to:

  • distinguish and explain precision and accuracy.
  • distinguish random and non-random sampling.
  • explain why random samples are preferred over non-random samples.
  • identify, describe and use different sampling methods.
  • identify ways to obtain samples more likely to be representative.

5.1 Introducing external validity

In a well-designed study, the researchers learn about the population by studying just one of the countless possible samples. That is, ideally the sample that is studied is representative of the population, so that the results from the sample generalise to the population. This is called external validity.

Definition 5.1 (External validity) External validity refers to the ability to generalise the results to the rest of the population, beyond just those in the sample studied.

External validity does not mean that the results apply more widely than the intended population.

Example 5.1 (External validity) Suppose the population in a study is Californian university students. The sample comprises the Californian university students actually studied by the researchers. The study is externally valid if the sample is representative of all Californian university students.

The results will not necessarily apply to university students outside of California (though they may), or to all Californian residents. However, this is irrelevant for external validity. External validity concerns how the sample represents the intended population in the RQ, which is Californian university students.

5.2 The idea of sampling

Studying every member of a population is very rare due to cost, time, ethics, logistics and/or practicality. Usually a subset of the population (a sample) is studied, comprising some individuals from the population. Many different samples are possible.

The challenge of research is learning about a population from studying just one of the countless possible samples.

Example 5.2 (Samples) A study of the effectiveness of aspirin in treating headaches cannot possibly study every single human who may one day take aspirin. Not only would this be prohibitively expensive, time-consuming, and impractical, but the study would not even study those not yet born who might use aspirin.

Using the whole target population is impossible, and a sample must be used.

Studying one of many possible samples raises questions:

  • Which individuals should be included in the sample?
  • How many individuals should be included in the sample?

The first issue is studied in this chapter. The second issue is studied later (Chap. 30), after learning about the implications of studying samples rather than populations.

Many samples are possible, and every sample is likely to be different. Hence, the results of studying a sample depend on which individuals are in the studied sample. The differences in the samples, and the results from each sample, are called sampling variation. That is, each sample has different individuals, produces different data, and may lead to different answers to the RQ.

Example 5.3 (Number of samples possible) In a 'population' as small as \(100\), the number of possible samples of size \(25\) is about \(2.4\times 10^{23}\): many trillions of times the number of people living on Earth.
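The count in Example 5.3 can be checked directly. The following is a minimal sketch in Python (our choice of language, not part of the original text) using the standard-library function `math.comb`. The world population of \(8\) billion is an assumed round figure, used only for comparison.

```python
import math

# Number of distinct samples of size 25 from a population of 100
# (order does not matter, and no individual is chosen twice).
n_samples = math.comb(100, 25)

# Assumed round figure for the world population, for comparison only.
world_population = 8_000_000_000

print(f"Possible samples: {n_samples:.3e}")
print(f"Possible samples per person on Earth: {n_samples / world_population:.1e}")
```

Even for this tiny population, only one of these samples is ever studied.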

This is the challenge of research: making decisions about populations, using just one of the many possible samples. A lot can be learnt about the population if the task of selecting a sample is approached correctly.

Almost always, samples are studied, not populations. Many samples are possible, and every sample is likely to be different, and hence the results from every sample are likely to be different. This is called sampling variation.

As a result, we can never be certain about the conclusions from the sample, though special techniques allow us to make decisions about the population from a sample.

The animation below shows how the estimates calculated from a sample vary from sample to sample. We know that \(50\)% of cards in a fair, shuffled pack are red, but each hand of ten cards can produce a different percentage of red cards (and not always \(50\)%). This is a simple example of sampling variation.
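The card example can also be imitated in a few lines of code. This is a sketch only: it assumes a standard pack of \(52\) cards (\(26\) red and \(26\) black), uses Python's pseudo-random number generator, and the names `pack` and `percent_red` are ours.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable; any seed would do

# A standard pack: 26 red cards and 26 black cards.
pack = ["red"] * 26 + ["black"] * 26

def percent_red(hand):
    """Percentage of red cards in a hand."""
    return 100 * hand.count("red") / len(hand)

# Deal ten separate hands of ten cards, each from a full shuffled pack.
percentages = [percent_red(random.sample(pack, 10)) for _ in range(10)]
print(percentages)
```

Each value printed is one sample estimate of the population value of \(50\)%; the values differ from hand to hand, which is sampling variation.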

5.3 Precision and accuracy

Two questions concerning sampling in Sect. 5.2 were: which individuals should be in the sample, and how many individuals should be in the sample. The first question addresses the accuracy of a sample value used to estimate a population value. The second addresses the precision with which the population value is estimated using a sample. An estimate that is not accurate is called biased (Sect. 5.11; Def. 6.3).

Definition 5.2 (Accuracy) Accuracy refers to how close a sample estimate is likely to be to the population value, on average.

Definition 5.3 (Precision) Precision refers to how similar the sample estimates from different samples are likely to be to each other (that is, how much variation is likely in the sample estimates).

Using this language:

  • The sampling method (i.e., how the sample is selected) impacts the accuracy of the sample estimate (i.e., the external validity of the study).
  • The size of the sample impacts the precision of the sample estimate.

Large samples are more likely to produce precise estimates, but they may or may not be accurate estimates. Similarly, random samples are likely to produce accurate estimates, but they may or may not be precise. As an analogy, consider an archer aiming at a target. The shots can be accurate, or precise... or, ideally, both (Fig. 5.1).

FIGURE 5.1: Precision and accuracy: Each dot indicates where a shot lands, and is like a sample estimate of the population value (shown by the black central dot)

Example 5.4 (Precision and accuracy) To estimate the average age of all Canadians, \(9000\) Canadian school children could be sampled.

The answer obtained from the sample will be inaccurate because the sample is not representative of all Canadians. Since the sample is large, it will give a precise answer, but to a different question: 'What is the average age of Canadian school children?'
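The difference between accuracy and precision can be explored with a small simulation. The sketch below does not use real Canadian data: it assumes a synthetic population whose ages are spread evenly between \(0\) and \(90\), treats ages \(5\) to \(17\) as 'school children', and uses samples of \(2000\) rather than \(9000\) to keep the run short. All names are ours.

```python
import random
import statistics

random.seed(2)

# Synthetic 'population' of ages (an assumption for illustration only).
population = [random.uniform(0, 90) for _ in range(50_000)]
school_children = [age for age in population if 5 <= age <= 17]
true_mean = statistics.fmean(population)

def repeated_estimates(source, n, repeats=200):
    """Sample means from many random samples of size n drawn from source."""
    return [statistics.fmean(random.sample(source, n)) for _ in range(repeats)]

small_random = repeated_estimates(population, 20)
large_random = repeated_estimates(population, 2000)
large_children = repeated_estimates(school_children, 2000)

print(f"Population mean age: {true_mean:.1f}")
for label, estimates in [
    ("Small random samples", small_random),
    ("Large random samples", large_random),
    ("Large samples of school children", large_children),
]:
    centre = statistics.fmean(estimates)  # far from the population mean: inaccurate
    spread = statistics.stdev(estimates)  # large spread: imprecise
    print(f"{label}: average estimate {centre:.1f}, spread {spread:.2f}")
```

Under these assumptions, the small random samples are accurate but imprecise, the large random samples are accurate and precise, and the large samples of school children are precise but inaccurate.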

5.4 Types of sampling

One key to obtaining accurate estimates about the population from the sample (maximising external validity) is to ensure that the sample faithfully represents the population. So, how is a representative sample selected from the population?

The individuals selected for the sample can be chosen using either random sampling or non-random sampling. The word random here has a specific meaning that differs from its everyday use.

Definition 5.4 (Random) In research and statistics, random means determined completely by impersonal chance.

5.4.1 Random sampling methods

In a random sample, each individual in the population can be selected, and is chosen on the basis of impersonal chance (such as using a random number generator, or a table of random numbers). Some examples of random sampling methods appear in Sects. 5.5 to 5.9, and are summarised in Table 5.1.

The results obtained from a random sample are likely to generalise to the population from which the sample is drawn; that is, random samples are likely to produce externally valid and accurate studies.

TABLE 5.1: Comparing five types of random sampling

Type          | Stage 1                                                                             | Stage 2                                                           | Ref.
Simple random | Individuals chosen at random                                                        | -                                                                 | Sect. 5.5
Systematic    | Start at a random location                                                          | Take every \(n\)th element thereafter                             | Sect. 5.6
Stratified    | Split into a few large groups ('strata') of similar individuals                     | Select a simple random sample from every stratum                  | Sect. 5.7
Cluster       | Split into many small groups ('clusters'); select simple random sample of clusters | Select all individuals in the chosen clusters                     | Sect. 5.8
Multistage    | Select simple random sample from the larger collection of units                    | Select simple random sample from those chosen in Stage 1; etc.    | Sect. 5.9

Testing a pot of soup is similar. If the soup is stirred (randomised), the whole pot need not be tasted. An overall impression of the population (or the soup) is not obtained from a non-random sample (sampling from non-stirred soup).

5.4.2 Non-random sampling methods

A non-random sample is selected using some personal input from the researchers. Examples of non-random samples include:

  • Judgement sample: Individuals are selected based on the researchers' judgement (possibly unconsciously), perhaps because the individuals may appear agreeable, supportive, accessible, or helpful. For example, researchers may select rats that are less aggressive, or plants that are accessible.
  • Convenience sample: Individuals are selected because they are convenient for the researcher. For example, researchers may study beaches that are nearby, or use their friends for a study.
  • Voluntary response (self-selecting) sample: Individuals participate if they wish to. For example, researchers may ask people to volunteer to take a survey.

In non-random sampling, the individuals in the study may be different than those not in the study. That is, non-random samples are not likely to be externally valid.

Using a non-random sample means that the results probably do not generalise to the intended population: they probably do not produce externally valid or accurate studies.

5.5 Simple random sampling

The most straightforward idea for a random sample is a simple random sample.

Definition 5.5 In a simple random sample, every possible sample of a given size has the same chance of being selected.

Selecting a simple random sample requires a list of all members of the population, called the sampling frame, from which to select a sample. Often, obtaining the sampling frame is difficult or impossible, and so finding a simple random sample is also difficult. For example, finding a simple random sample of wombats would require having a list and location of all wombats. This is absurd; other random sampling methods, like special ecological sampling methods, would be used instead (e.g., Manly and Alberto (2014)).

Definition 5.6 (Sampling frame) The sampling frame is a list of all the individuals in the population.

Selecting a simple random sample from the sampling frame can be performed using random numbers (e.g., using random number tables, or websites like https://www.random.org, which can generate as many numbers as needed at once). Other random sampling methods use a system to select at random, rather than by human choice, and some avoid the need for a sampling frame.

This book assumes simple random samples, unless otherwise noted.

Example 5.5 (Simple random sampling) Consider this RQ:

For students at a large course at a university, is the average typing speed (in words per minute) the same for those aged under \(25\) ('younger') and \(25\) or over ('older')?

Suppose budget and time constraints mean only \(40\) students (out of \(441\)) can be selected for the study above. The sampling frame is the list of all students enrolled in the course. Obtaining the sampling frame is feasible here; instructors have access to this information for grading.

A simple random sample could be found using the course enrolment list, placing all \(441\) student names into rows of a spreadsheet (ordered by name, student ID, or any way). Then, use random numbers to select \(40\) rows at random (without repeating numbers) between \(1\) and \(441\) inclusive. For instance, when I used random.org, the first few random numbers were: 410, 215, 384, 158, 296, ...

Every student chosen using this method becomes part of the study. If a student could not be contacted, more students could be chosen at random to ensure \(40\) students participated (see animation below). The sample comprises \(25\) older students and \(15\) younger students.
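The selection of rows in Example 5.5 can be sketched in Python. This is an illustration under assumptions, not the method used for the example: it relies on Python's pseudo-random number generator rather than the random numbers from random.org, and the row numbers simply stand in for the rows of the enrolment spreadsheet.

```python
import random

random.seed(3)  # fixed seed so the sketch is repeatable

# Rows 1 to 441 stand in for the rows of the enrolment spreadsheet.
sampling_frame = range(1, 442)

# random.sample selects without replacement, so no row is repeated.
chosen_rows = sorted(random.sample(sampling_frame, 40))
print(chosen_rows)
```

If some selected students could not be contacted, further rows could be drawn at random from those not yet chosen.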

5.6 Systematic sampling

In systematic sampling, the first case is randomly selected; then, more individuals are selected at regular intervals thereafter. In general, we say that every \(n\)th individual is selected after the initial random selection.

Example 5.6 (Systematic sampling) For the study in Example 5.5, a sample of \(40\) students in a course of \(441\) is needed. To find a systematic random sample, select a random number between \(1\) and \(441/40\) (approximately \(11\)) as a starting point; suppose the random number selected is \(9\).

The first student selected is the \(9\)th person in the student list (which may be ordered alphabetically, by student ID, or other means). Thereafter, every \(441/40\)th person, or \(11\)th person, in the list is selected: people labelled as \(9\), \(20\), \(31\), \(42\),... (see animation below). The sample comprises \(23\) older students and \(17\) younger students.

Care needs to be taken when using systematic samples to ensure a pattern is not hidden. Consider taking a systematic sample of every \(10\)th residence on a long street. In many countries, odd numbers are usually on one side of the street, and even numbers usually on the other side. Selecting every \(10\)th house (for example) would include houses all on the same side of the street, and hence with similar exposure to the sun, traffic, etc.
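The selection in Example 5.6, and the hidden-pattern problem with house numbers, can both be sketched in Python. The function name `systematic_sample` and the street of \(200\) houses are assumptions made for illustration.

```python
import random

random.seed(4)  # fixed seed so the sketch is repeatable

population_size = 441
sample_size = 40
step = round(population_size / sample_size)  # 441/40 is about 11

def systematic_sample(start, step, population_size):
    """Positions start, start + step, start + 2*step, ... within the list."""
    return list(range(start, population_size + 1, step))

# Example 5.6 uses a starting point of 9.
rows = systematic_sample(9, step, population_size)
print(rows[:4], "...", len(rows), "rows selected")

# In general, the starting point is itself chosen at random.
random_start = random.randint(1, step)
print(systematic_sample(random_start, step, population_size)[:4])

# Hidden pattern: every 10th house number has the same parity, so (where
# odd and even numbers are on opposite sides) all houses are on one side.
houses = systematic_sample(random.randint(1, 10), 10, 200)
print({"odd" if h % 2 else "even" for h in houses})
```

Choosing an odd step (say, every \(9\)th or \(11\)th house) would be one way to include houses from both sides of such a street.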

5.7 Stratified sampling

In stratified sampling, the population is split into a small number of large (usually homogeneous) groups called strata, then cases are selected using a simple random sample from each stratum. Every individual in the population must be in one, and only one, stratum.

Example 5.7 (Stratified sampling) For the typing study in Example 5.5, \(20\) younger and \(20\) older students could be selected to obtain a sample of size \(40\). The sample is stratified by age group of the person (see animation below).

Assume that about \(67\)% of the students are younger in the population. To ensure that two-thirds of the sample of size \(40\) comprised younger students, \(27\) younger students would be selected in the sample (see animation below).

Similarly, the second animation below shows how a stratified random sample of size \(40\) might be selected, by randomly selecting \(27\) younger and \(13\) older students.
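The proportional allocation in Example 5.7 can be sketched as follows. The sampling frame is hypothetical: the split of \(295\) younger and \(146\) older students is an assumption chosen so that about \(67\)% of the \(441\) students are younger.

```python
import random

random.seed(5)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: IDs 1-295 are 'younger', 296-441 are 'older'
# (about 67% younger; the exact split is an assumption).
strata = {
    "younger": list(range(1, 296)),
    "older": list(range(296, 442)),
}
population_size = sum(len(ids) for ids in strata.values())
sample_size = 40

# Proportional allocation: each stratum's share of the sample matches its
# share of the population. In general, rounding may not sum to sample_size.
allocation = {
    name: round(sample_size * len(ids) / population_size)
    for name, ids in strata.items()
}

# A simple random sample is taken within every stratum.
sample = {name: random.sample(ids, allocation[name]) for name, ids in strata.items()}
print(allocation)
```

Replacing the allocation with \(20\) students from each stratum would give the equal-sized groups described at the start of the example.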

5.8 Cluster sampling

In cluster sampling, the population is split into a large number of small groups called clusters. Then, a simple random sample of clusters is selected, and every member of the chosen clusters becomes part of the sample. Every individual in the population must be in one, and only one, cluster.

Example 5.8 (Cluster sampling) For the study in Example 5.5, a simple random sample of (say) three of the many small-group classes for the course could be selected, and all students enrolled in those selected classes constitute the sample (see animation below). Due to the classes chosen, the sample size is \(n = 43\) (\(30\) older; \(13\) younger).
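A cluster sample like the one in Example 5.8 can be sketched in Python. The class structure is hypothetical: the text does not say how many small-group classes exist, so the sketch assumes \(30\) classes with students allocated to them in turn.

```python
import random

random.seed(6)  # fixed seed so the sketch is repeatable

# Hypothetical clusters: 441 students spread over 30 small-group classes.
# The number of classes and the allocation rule are assumptions.
n_classes = 30
classes = {c: [] for c in range(1, n_classes + 1)}
for student_id in range(1, 442):
    classes[student_id % n_classes + 1].append(student_id)

# First, take a simple random sample of clusters (classes).
chosen_classes = random.sample(sorted(classes), 3)

# Then, every student in each chosen class joins the sample.
sample = [s for c in chosen_classes for s in classes[c]]
print(chosen_classes, len(sample))
```

The sample size depends on which classes happen to be chosen, so it is not fixed in advance.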

5.9 Multistage sampling

In multistage sampling, larger collections of individuals are selected using a simple random sample, then smaller collections of individuals within those large collections are selected using a simple random sample. The simple random sampling continues for as many levels as necessary, until individuals are being selected (at random).

Example 5.9 (Multistage sampling) For the study in Example 5.5, a simple random sample of ten of the many small-group classes could be selected (Stage 1), and then four students are randomly selected from each of these \(10\) selected classes (Stage 2) (see animation below). The sample size is \(10\times 4 = 40\), comprising \(24\) older students and \(16\) younger students.
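The two stages of Example 5.9 can be sketched in Python. As the number of small-group classes is not given, the sketch assumes \(30\) hypothetical classes, with the \(441\) students allocated to them in turn.

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical structure: 441 students in 30 small-group classes
# (the number of classes and the allocation rule are assumptions).
n_classes = 30
classes = {c: [] for c in range(1, n_classes + 1)}
for student_id in range(1, 442):
    classes[student_id % n_classes + 1].append(student_id)

# Stage 1: a simple random sample of ten classes.
chosen_classes = random.sample(sorted(classes), 10)

# Stage 2: a simple random sample of four students from each chosen class.
sample = [s for c in chosen_classes for s in random.sample(classes[c], 4)]
print(len(sample))
```

Unlike cluster sampling, only some students in each chosen class are selected, so the sample size is known in advance.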

Example 5.10 (Multistage sampling) Multistage sampling is often used by national statistical agencies. For example, to obtain a multistage random sample from a country:

  • Stage 1: Randomly select some cities in the nation;
  • Stage 2: Randomly select some suburbs in these chosen cities;
  • Stage 3: Randomly select some streets in these chosen suburbs;
  • Stage 4: Randomly select some houses in these chosen streets.

This is cheaper than simple random sampling, as data collectors can be deployed in a small number of cities (only those chosen in Stage 1).

5.10 Representative sampling

Obtaining a truly random sample is usually hard or impossible. In practice, sometimes the best compromise is to select a sample diverse enough to be somewhat representative of the diversity in the population: where those in the sample are not likely to be different than those not in the sample (in any obvious way), at least for the variables of interest.

As always, the results from any non-random sample may not generalise to the intended population (though they may generalise to the population that the sample does represent).

Example 5.11 (Representative sample) Suppose we wish to evaluate the functionality of two types of hand prosthetics.

If a randomly-chosen group of Alaskan and Texan residents is asked for their feedback, their views would probably (but not certainly) be similar to those of all Americans. No obvious reason exists for why residents of Alaska and Texas would be very different from residents in the rest of the United States, regarding their view of hand prosthetic functionality.

Even though the sample is not a random sample of all Americans, the results may generalise to all Americans (though we cannot be sure).

Example 5.12 (Non-representative samples) Suppose we wish to determine the average time per day that American households use their air-conditioners for cooling in summer.

If a group of Texas residents is asked, this sample would not be expected to represent all Americans: it would likely over-estimate the average number of hours air-conditioners are used for cooling in summer. In this case, those in the sample are very different to those not in the sample, regarding their air-conditioner usage for cooling in summer.

In contrast, suppose a group of Alaskans was asked the same question. This sample would not represent all Americans either (it would likely under-estimate the average). Again, those in the sample are likely to be very different to those not in the sample, regarding their air-conditioner usage for cooling in summer.

Sometimes, a combination of sampling methods is used.

Example 5.13 (A combination of sampling methods) In a study of pathogens present on magazines in doctors' surgeries in Melbourne, some suburbs can be selected at random, and then (within each suburb) all surgeries are contacted, and some surgeries volunteer to be part of the study.

In a study of diets of children at child-care centres, researchers used samples in 2010 and 2016, described as follows (N. Larson, Loth, and Nanney 2019, 336):

In 2010, a stratified random sampling procedure was used to select representative cross-sections of providers working in licensed center-based programs and licensed providers of family home-based care from publicly available lists. [...] Additional participants were also recruited in 2016 using a combination of stratified random and open, convenience-based sampling.

Sometimes, practicalities dictate how the sample is obtained, which may result in a non-random sample. Even so, the impact of using a non-random sample on the conclusions should be discussed (Chap. 9). Sometimes, ways exist to obtain a sample that is more likely to be representative.

Random samples are often difficult to obtain, and sometimes representative samples are the best that can be done. In a representative sample, those in the sample are not obviously different than those not in the sample. Try to ensure that a broad cross-section of the target population appears in the sample.

Example 5.14 (Representative sample) For the typing study in Example 5.5, only selecting students who are attending the gym, or only students who are at a certain cafe, is unlikely to produce a sample that is representative of the whole student population.

Instead, the researchers could approach:

  • Students at the cafe on Monday at \(8\)am;
  • Students at the gym on Tuesday at \(11\):\(30\)am; and
  • Students entering the library on Thursday at \(2\)pm.

This is still not a random sample, but the sample now is likely to comprise a variety of student types. Ideally, students would not be included more than once in our sample, though this is often difficult to ensure.


To determine if the sample is somewhat representative of the population, sometimes information about the sample and population can be compared. The researchers may then be able to make some comment about whether the sample seems reasonably representative. For example, the sex and age of a sample of university students may be recorded; if the proportion of females in the sample, and the average age of students in the sample, are similar to those of the whole university population, then the sample may be somewhat representative of the population (though we cannot be sure).

Example 5.15 (Comparing samples and populations) Egbue, Long, and Samaranayake (2017) studied the adoption of electric vehicles (EVs) by Americans, using a sample of \(121\) people found through social media (such as Facebook) and professional engineering channels. This is not a random sample.

The authors compared some characteristics of the sample with the American population from the 2010 census. Compared to the US population, the sample contained a higher percentage of males, a higher percentage of people aged \(18\)--\(44\), and a higher percentage of wealthy individuals.

5.11 Sampling bias

The sample may not be representative of the population for many reasons, all of which compromise how well the sample represents the population (i.e., compromise external validity and accuracy). This is called selection bias.

Definition 5.7 (Selection bias) Selection bias is the tendency of a sample to over- or under-estimate a population quantity.

Selection bias is less common in studies with forward directionality, compared to studies that are non-directional or have backward directionality (Sect. 3.7). Selection bias may occur if the wrong sampling frame is used, or non-random sampling is used. The sample is biased because those in the sample may be different than those not in the sample (and this may not always be obvious). Biased samples are less likely to produce externally valid studies.

Example 5.16 (Selection bias) Consider Example 5.12, about estimating the average time per day that air conditioners are used for cooling in summer. Using people only from Alaska in the sample is using the wrong sampling frame: the sampling frame does not represent the target population ('Americans'). This is selection bias.

Non-response bias occurs when chosen participants do not respond. The problem is that those who do not respond may be different than those who do respond. Non-response bias can occur because of a poorly-designed survey, using voluntary-response sampling, chosen participants refusing to participate, participants forgetting to return completed surveys, etc.

Example 5.17 (Non-response bias) Consider a study to determine the average number of hours of overtime worked by various professions. People who work a large amount of overtime may be too busy to answer the survey. Those who answer the survey may be likely to work less overtime than those who do not answer the survey. This is an (extreme) example of non-response bias.
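The effect described in Example 5.17 can be imitated with a small simulation. Everything here is assumed for illustration: the overtime hours are synthetic, and the response model (the more overtime worked, the lower the chance of replying) is a made-up rule in the hypothetical function `responds`.

```python
import random
import statistics

random.seed(8)  # fixed seed so the sketch is repeatable

# Synthetic weekly overtime hours for 10,000 workers (not real data);
# the average is about 5 hours.
overtime = [random.expovariate(1 / 5) for _ in range(10_000)]

def responds(hours):
    """Assumed response model: more overtime means a lower chance of replying."""
    return random.random() < 1 / (1 + hours / 5)

# A simple random sample is selected, but only some of those selected reply.
selected = random.sample(overtime, 1_000)
respondents = [h for h in selected if responds(h)]

print(f"Population mean:  {statistics.fmean(overtime):.2f} hours")
print(f"Selected mean:    {statistics.fmean(selected):.2f} hours")
print(f"Respondents mean: {statistics.fmean(respondents):.2f} hours")
```

Under this assumed model, the sample was selected at random, yet the respondents under-estimate the average overtime because those who reply differ from those who do not.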

Response bias occurs when participants provide incorrect information: the answers provided by the participants may not reflect the truth. This may be intentional (for example, when respondents lie) or non-intentional (for example, if the question is poorly written, is personal, or is misunderstood).

Consider using these samples:

  1. Obtaining data using a telephone survey.
  2. Obtaining data using a TV station's call-in at about \(6\):\(15\)pm.
  3. Sampling students at your university, because it is easier than finding a random sample of all people in your country.

For each of the above samples, give an example of an outcome for which the sample would likely give an over-estimate of the population value.

There are many correct answers; here are some:

  1. The percentage of people that own a telephone.
  2. The percentage of people that are shift-workers.
  3. The percentage of people studying at university.

5.12 Chapter summary

Almost always, the entire population of interest cannot be studied, so a sample (a subset of the population) must be studied. Many samples are possible; we only study one sample. Samples can be random or non-random. Conclusions made from random samples can usually be generalised to the population (that is, they are externally valid and accurate).

Random sampling methods include simple random samples, systematic samples, stratified samples, cluster samples, and multistage samples. Random samples are likely to be externally valid and accurate.

Non-random sampling methods include convenience samples, judgement samples, and self-selecting samples. Random samples are often very difficult to obtain, so the best we can do is to aim for reasonably representative samples, where those in the sample are unlikely to be very different than those not in the sample. Non-random samples may not be externally valid or accurate.

5.13 Quick review questions

  1. Suppose students are randomly selected and sent postal surveys from their university, but some students have moved and so never receive the survey. What type of bias will this result in?
  2. A large sample is always better than a random sample: True or false?
  3. Select all the sampling methods that are random sampling methods.
       1. Judgement sampling
       2. Stratified sampling
       3. Simple random sampling
       4. Voluntary sampling
       5. Cluster sampling
       6. Multi-stage sampling
       7. Self-selected sampling

5.14 Exercises

Answers to odd-numbered exercises are available in App. E.

Exercise 5.1

What is the main advantage of using a random sample?

  a. It is easier.
  b. It is more likely to produce an experimental study.
  c. It is more likely to produce an externally-valid study.
  d. It is more likely to produce precise estimates.

Exercise 5.2

What is the main advantage of using a large sample?

  a. It is easier.
  b. It is more likely to produce an experimental study.
  c. It is more likely to produce an externally-valid study.
  d. It is more likely to produce precise estimates.

Exercise 5.3 A researcher has three months in which to collect the data for a study on car park usage. Suppose the researcher wants to take a systematic sample of days, and on each of the selected days records the number of cars in the car park.

To select the days in which to collect data, she decides (by using random numbers) to start data collection on a Tuesday, and then every \(7\)th day thereafter.

  1. What problem is evident in this sampling scheme?
  2. What suggestions would you give to improve the sampling?

Exercise 5.4 Suppose you need to estimate the average number of pages in a book in a university library (with five campuses), using a sample of \(200\) books. Describe how to select a sample of books using:

  1. a simple random sample of books.
  2. a stratified sample of books.
  3. a cluster sample of books.
  4. a convenience sample of books.
  5. a multi-stage sample of books.

Which sampling scheme would be most practical?

Exercise 5.5 Suppose you need a sample of residents from apartments in a large residential complex, comprising \(30\) floors with \(15\) apartments on each floor. You plan to survey the residents of these apartments. For each of the possible sampling schemes given below, first describe the sampling scheme, and then determine which methods are likely to give a random (or representative) sample (explaining your answers).

The four possible sampling schemes are:

  1. Randomly select five floors, then randomly select four apartments from each of those five floors, and interview the oldest resident of each selected apartment.
  2. Randomly select one floor, and select the \(15\) apartments on that floor, then interview the oldest resident of each apartment.
  3. Wait at the ground-floor elevator, and ask people who emerge to complete the survey.
  4. Randomly select five floors, then wait by the elevator on those floors and survey residents as they arrive at the elevator.

Exercise 5.6 Suppose a researcher needs a sample of customers at a large, local shopping centre to complete a questionnaire. Four sampling schemes are listed below.

For each, describe the type of sampling. Then, determine which would be the best method (explain why), and determine which (if any) produce a random sample.

The four possible sampling schemes are:

  1. The researcher locates themselves outside the supermarket at the shopping centre one morning, and approaches every \(10\)th person who walks past.
  2. The researcher waits at the main entrance for \(30\) minutes at \(8\)am every morning for a week, and approaches every \(5\)th person.
  3. The researcher leaves a pile of survey forms at an unattended booth in the shopping centre, and a locked barrel in which to place completed surveys.
  4. The researcher goes to the shopping centre every day for two weeks, at a different time and location each day, and approaches someone every \(15\) mins.

Exercise 5.7 A study (Ridgewell, Sipe, and Buchanan 2009) investigated how children in Brisbane travel to state schools. Researchers randomly sampled four schools from a list of all Brisbane state schools, and invited every family at each of those four schools to complete a survey.

What type of sampling method is this? How could the researchers determine if the resulting sample was approximately representative?

Exercise 5.8 A study comparing two new malaria vaccines recruited \(200\) Kenyans who had contracted malaria. These recruits were obtained by approaching all patients with a confirmed malaria diagnosis who were admitted to hospitals. Patients could volunteer for the study or not. The study was then conducted to a high standard. Which of the following statements are true?

  1. This is a voluntary response sample.
  2. The study is likely to have high external validity.
  3. The sample size is too small for the study results to provide useful information.

Exercise 5.9 Suppose a natural forest region is classified into two quite different zones. Zone A is mostly dunes and lightly vegetated, and on the coastal side of a ridge; Zone B is more densely vegetated and on the inland side of the ridge.

A random sample of sugar ants (Camponotus spp.) is taken from Zone A, and another random sample of sugar ants from Zone B, to study the average size of the ants. What is the best description of the type of sampling method being used?

Exercise 5.10 One (actual) survey in 2001 concluded (Hieger (2001), cited in Bock, Velleman, and De Veaux (2010), p. 283):

All but \(2\)% of the home buyers have at least one computer at home, and \(62\)% have two or more. Of those with a computer, \(99\)% are connected to the internet.

The article later reveals the survey was conducted online (recall the survey was conducted in 2001). The target population is home buyers; however, home buyers with internet access were far more likely to complete the survey than home buyers without internet access.

What type of bias is this?

Exercise 5.11 Researchers are studying the percentage of farms that use a specific management technique. The researchers randomly select \(20\) regions around the country, then select farms within each region by asking for farmers to volunteer to be in the study.

Explain why this is not a multistage sample, and what changes are necessary for the researchers to have a multistage sample.

Exercise 5.12 Researchers are comparing the average time that experienced and first-year school teachers spend in the sun. The researchers select some schools by asking school principals to volunteer their schools, then record information for every teacher in those schools.

Explain why this is not a cluster sample, and what changes are necessary for the researchers to have a cluster sample.