5.10 Bias in selecting samples

A sample may fail to be representative of the population for many reasons. Any systematic failure of this kind is called bias, and it compromises external validity: results from a biased sample are less likely to generalise to the population.

Definition 5.6 (Bias) Bias is the tendency of a sample to over- or under-estimate a population quantity.

More formally:

…bias is the introduction of systematic error, subconsciously or otherwise, in the design, data collection, data analysis, or publication of a study.

— Sedgwick (2014)

Selection bias occurs when the wrong sampling frame is used, or when non-random sampling is used. The sample is biased because those in the sample may differ from those not in the sample.

Example 5.15 (Selection bias) Consider Example 5.11, about estimating the average time per day that air conditioners are used for cooling in summer.

Using only people from Queensland and the Northern Territory in the sample means using the wrong sampling frame: the sampling frame does not represent the target population (‘Australians’). This is selection bias.
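The effect of a wrong sampling frame can be illustrated with a small simulation. The following Python sketch is only an illustration: the regions, household counts and mean hours of cooling are invented, not real data. It compares a sample drawn only from Queensland and the Northern Territory with a simple random sample drawn from the whole (hypothetical) population.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: (region, assumed mean daily hours of cooling, households).
# All of these figures are invented for illustration only.
regions = [
    ("QLD", 6.0, 2000),
    ("NT", 8.0, 200),
    ("NSW", 3.0, 3000),
    ("VIC", 2.0, 2500),
    ("TAS", 0.5, 300),
]

population = []
for name, mean_hours, size in regions:
    for _ in range(size):
        # Hours cannot be negative, so truncate at zero.
        hours = max(0.0, random.gauss(mean_hours, 1.0))
        population.append((name, hours))

true_mean = statistics.mean(h for _, h in population)

# Wrong sampling frame: only households in QLD and the NT.
frame = [h for name, h in population if name in ("QLD", "NT")]
biased_mean = statistics.mean(random.sample(frame, 200))

# Simple random sample from the whole population.
random_mean = statistics.mean(h for _, h in random.sample(population, 200))

print(f"Population mean:         {true_mean:.2f} hours")
print(f"QLD/NT-only sample mean: {biased_mean:.2f} hours")
print(f"Random sample mean:      {random_mean:.2f} hours")
```

Under these assumptions, the sample restricted to the warmer regions over-estimates the population mean, while the random sample lands close to it. Taking a larger sample from the wrong frame would not remove the over-estimate.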

Non-response bias occurs when chosen participants do not respond for some reason. The problem is that the responses of those who do not respond may have differed from the responses of those who do respond. Non-response bias can occur because of a poorly designed survey, the use of voluntary-response sampling, chosen participants refusing to participate, participants forgetting to return completed surveys, etc.

Example 5.16 (Non-response bias) Consider a study to determine the average number of hours of overtime worked by various professions. People who work a large amount of overtime are likely to be too busy to answer the survey.

Those who answer the survey are likely to work less overtime than those who do not answer, so the sample is likely to under-estimate the average amount of overtime worked. This is an example of non-response bias.
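The overtime example can also be sketched as a simulation. In the Python code below, the overtime hours and the response model (the chance of replying falls as overtime rises) are assumptions made purely for illustration.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable

# Hypothetical weekly overtime hours for 5000 workers (invented for illustration).
overtime = [random.uniform(0, 20) for _ in range(5000)]


def responds(hours):
    """Assumed response model: busier workers are less likely to reply.

    The probability of responding falls linearly from 0.9 (no overtime)
    to 0.1 (20 hours of overtime).
    """
    return random.random() < 0.9 - 0.04 * hours


# Every worker is sent the survey, but only some return it.
respondents = [h for h in overtime if responds(h)]

response_rate = len(respondents) / len(overtime)
print(f"Response rate:              {response_rate:.0%}")
print(f"Mean overtime, all workers: {statistics.mean(overtime):.1f} hours")
print(f"Mean overtime, respondents: {statistics.mean(respondents):.1f} hours")
```

Even though every worker was invited, the mean computed from the respondents is lower than the mean for all workers, because those working the most overtime are under-represented among the replies.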

Response bias occurs when participants provide incorrect information: the answers provided by the participants may not reflect the truth. This may be intentional (for example, if the survey questions are very personal or controversial in nature) or unintentional (for example, if the question is poorly written or is misunderstood).
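Unlike sampling variation, response bias does not shrink as the sample grows. The Python sketch below illustrates this under invented assumptions: the true values are simulated, and 40% of participants are assumed to understate their answer by 30%. The function name reported_mean is hypothetical.

```python
import random
import statistics

random.seed(3)  # fixed seed so the sketch is repeatable

# Hypothetical true values of some sensitive quantity for 10,000 people.
true_values = [random.uniform(0, 20) for _ in range(10_000)]
true_mean = statistics.mean(true_values)


def reported(value):
    """Assumed reporting model: 40% of participants understate by 30%."""
    if random.random() < 0.4:
        return 0.7 * value
    return value


def reported_mean(n):
    """Mean of the reported answers from a simple random sample of size n."""
    sample = random.sample(true_values, n)
    return statistics.mean(reported(v) for v in sample)


print(f"True population mean: {true_mean:.2f}")
for n in (100, 1000, 10_000):
    print(f"n = {n:>6}: mean of reported answers = {reported_mean(n):.2f}")
```

Under these assumptions, the reported means settle below the true mean however large the sample is, because the error comes from the answers themselves rather than from who was selected.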

Think 5.4 (Sampling) One (true) survey concluded (Hieger (2001), cited in Bock et al. (2010), p. 283):

All but 2% of the home buyers have at least one computer at home, and 62% have two or more. Of those with a computer, 99% are connected to the internet.

The article later reveals the survey was conducted online (and recall the survey was done in 2001…). What type of bias is apparent?

Think 5.5 (Bias) For these samples, to what populations will results generalise?

Obtaining data using a telephone survey.
Obtaining data using a TV station's call-in.
Asking your friends to participate because it is easier than finding a random sample.

For each of the above samples, give an example of an outcome which would be likely to over-estimate the true (population) value.

References

Bock DE, Velleman PF, De Veaux RD. Stats: Modeling the world [Internet]. Addison-Wesley; 2010. Available from: https://books.google.com.au/books?id=zHEJPwAACAAJ.

Hieger J. Portrait of a homebuyer household: 2 kids and a PC. Orange County Register; 2001.

Sedgwick P. Non-response bias versus response bias. BMJ. British Medical Journal Publishing Group; 2014;348:g2573.