1.5 Sampling techniques

  • The idea of sampling is to select a portion (or subset) of the larger population and study that portion (the sample) to gain information about the population

  • Because it takes a lot of time and money to examine an entire population, sampling is a very practical technique

Simple random sampling

  • In a simple random sample (SRS), each unit in the population is equally likely to be chosen

  • The most common approach is to assign a number to each population unit and then use a random number generator (RNG) to choose the numbers of the units that enter the sample

  • The Excel function =RANDBETWEEN() generates random integers between any two values you specify. For example, to choose a sample of size \(n=20\) from a population of size \(N=100\), repeat the function =RANDBETWEEN(1;100) until \(20\) unique numbers are generated (when sampling without replacement, repeated numbers should be discarded)
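The same draw can be sketched outside Excel. The snippet below is a minimal illustration in Python, not part of the original notes; the function name `simple_random_sample` and the `seed` parameter are assumptions made for the example. It relies on the standard-library call `random.sample`, which samples without replacement, so no duplicates have to be discarded by hand.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct unit numbers from 1..population_size."""
    rng = random.Random(seed)  # seed is optional; it only makes the draw repeatable
    # random.sample draws without replacement: every unit number appears at most once
    return rng.sample(range(1, population_size + 1), sample_size)

# n = 20 units out of a population of N = 100, as in the Excel example
chosen = simple_random_sample(100, 20, seed=1)
print(sorted(chosen))
```

Each run with a different seed gives a different set of 20 unit numbers, all between 1 and 100.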

Systematic sampling

  • Instead of an RNG, you use a fixed sequence of numbers: every \(k^{th}\) population unit is selected from the list, starting with a randomly chosen number between \(1\) and \(k\), where \(k=\frac{N}{n}\) (population size divided by sample size)
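A short sketch of this rule in Python may help. It assumes the sampling interval is the population size divided by the sample size, rounded down to a whole number when the division is not exact; the function name `systematic_sample` and the `seed` parameter are hypothetical and serve only the illustration.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Select every k-th unit number, starting from a random point in 1..k."""
    rng = random.Random(seed)
    k = population_size // sample_size  # sampling interval (rounded down)
    start = rng.randint(1, k)           # random start between 1 and k, inclusive
    return [start + i * k for i in range(sample_size)]

# N = 100, n = 20 gives k = 5: for example units 3, 8, 13, ... if the start is 3
units = systematic_sample(100, 20, seed=1)
print(units)
```

Only the starting point is random; once it is drawn, the rest of the sample is fully determined.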

Stratified sampling

  • In stratified sampling, the population is divided into groups called strata

  • The strata may reflect any characteristic of the population units, such as age, nationality, level of education, and so on

  • A stratified sample is chosen by ensuring that the proportion of sample units in each stratum matches the distribution found in the population
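The proportional allocation described above can be sketched as follows. This is an illustrative Python example under stated assumptions: the population of 100 students and its split into 50 freshmen, 30 juniors, and 20 seniors are made up, and the names `stratified_sample` and `stratum_of` are hypothetical. Units within each stratum are drawn by simple random sampling.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(units, stratum_of, sample_size, seed=None):
    """Proportional stratified sample: each stratum keeps its population share."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        # Stratum sample size proportional to stratum size; with awkward
        # proportions, rounding can make the total differ slightly from sample_size
        n_h = round(sample_size * len(members) / len(units))
        sample.extend(rng.sample(members, n_h))
    return sample

# Made-up population: (stratum, student id) pairs
population = ([("freshman", i) for i in range(50)]
              + [("junior", i) for i in range(30)]
              + [("senior", i) for i in range(20)])

sample = stratified_sample(population, lambda u: u[0], 20, seed=1)
counts = Counter(stratum for stratum, _ in sample)
print(counts)  # 10 freshmen, 6 juniors, 4 seniors
```

The sample shares (50%, 30%, 20%) match the population shares, which is the property the definition asks for.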

Cluster sampling

  • Cluster sampling is a two-step procedure. It also divides the entire population into groups (clusters), but in the first step some of the groups are chosen randomly, and in the second step units from the chosen groups are selected by simple random sampling

  • The advantage of using cluster sampling is that it can be implemented more quickly and cheaply than stratified sampling
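The two steps of cluster sampling can be sketched in Python as follows. The example is only illustrative: the five classrooms of ten students are made-up data, and the function name `two_stage_cluster_sample` and its parameters are hypothetical.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Step 1: pick clusters at random. Step 2: SRS of units inside each one."""
    rng = random.Random(seed)
    # Step 1: choose which groups enter the study
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen_names:
        members = clusters[name]
        # Step 2: simple random sample within the chosen group
        # (take the whole group if it has fewer units than requested)
        sample[name] = rng.sample(members, min(n_per_cluster, len(members)))
    return sample

# Made-up population: 5 classrooms with 10 students each
classrooms = {f"room_{r}": [f"room_{r}_student_{s}" for s in range(10)]
              for r in range(1, 6)}

picked = two_stage_cluster_sample(classrooms, n_clusters=2, n_per_cluster=3, seed=1)
print(picked)
```

Only the two chosen classrooms have to be visited, which is where the savings in time and cost come from.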

Non-random sampling

  • Unlike random samples, non-random samples give results that are not reliable for drawing conclusions about the overall population. Non-random sampling does not guarantee that each population member has a chance of being chosen, and the conclusions rest primarily on subjective judgment

  • The most common form of non-random sampling is convenience sampling, i.e. choosing population units primarily because they are accessible

Example 1.7 A study is done to find the average amount that undergraduate students pay for private instruction per semester. Each student in the following samples is asked how much he or she paid for private instruction in the Fall semester. What is the type of sampling in each case?

  1. A sample of 75 undergraduate students is taken by organizing the students’ names into groups (freshman, junior, or senior), and then selecting 25 students from each group

  2. A random number generator is used to select a student from the alphabetical listing of all undergraduate students in the Fall semester. Starting with that student, every 50th student is chosen until 75 students are included in the sample

  3. A completely random method is used to select 75 students. Each undergraduate student in the Fall semester has the same probability of being chosen at any stage of the sampling process

  4. A random number generator is used to pick two of the three groups of students (freshman, junior, and senior). All students in those two groups are in the sample

  5. An administrative assistant is asked to stand in front of the library one Wednesday and to ask the first 75 undergraduate students he encounters what they paid for private instruction in the Fall semester

Example 1.8 Considering Example 1.7, which sample is non-random?

Example 1.9 A teacher wants to know whether his students are doing their homework, so he randomly selects rows two and five and then calls on all students in row two and all students in row five to present their solutions to the homework problems to the class. What sampling method is used?

Example 1.10 The marketing manager for an electronics chain store wants information about the ages of its customers. Over the next two weeks, at each store location, 100 randomly selected customers are given questionnaires to fill out asking for information about age, as well as about other variables of interest. What sampling method is used?