7 Introduction to Experiment Design

In a perfect world, experiment design starts with an open mind and a question someone wants to investigate. For example, someone might ask “does this medicine work?” rather than task you to “prove that this medicine works”. You would then, in collaboration with other researchers, design the experiment, collect the data, examine the data, and come to a conclusion. In reality, more often than not, the data already exists and you, as the data analyst, are told to “do something” with it. Keep that in mind as we go through this intro to experiment design.
There are numerous considerations to keep in mind when designing an experiment. Some are obvious, others require some thought.

7.1 Who pays?

In an ideal world, you are independently researching a topic of interest to you, free from any constraints and obligations. You are searching for the truth, trying to find an unbiased answer to an unbiased question. In reality though, someone will pay for your work, and that someone may have an agenda. Unfortunately, you have to consider this from the start when setting up your research.
Ex.: You have heard about drilling in ANWR, and you are supposed to investigate its impact on wildlife. Depending on who you work for, you may ask different questions:

Neutral: “Describe the impact of drilling on wildlife.”
Pro drilling: “Show there is minimal impact of drilling on wildlife.”
Against drilling: “Show there is negative impact of drilling on wildlife.”

So remember to ask yourself: “Who pays?” The answer influences how much money, time, and manpower you may have available. A researcher may face pressure to produce the “right” results. Companies, government agencies, and even non-profits are not likely to sponsor and publish research that goes against their interests. See for example https://www.businessinsider.com/argentina-inflation-number-prosectutions-2013-1

7.2 Define your question in detail

Exactly what is the question or topic you are trying to investigate? Be as precise as possible. It is important to define exactly what you will measure and what type of data you propose to collect. If you are working for someone else, make sure you understand completely what is expected of you. Make sure the other party and you agree on what you can and cannot deliver.
Ex.: You may start with a hypothesis like “road salt harms plants”. This seems like a simple theory, but you will also have to define:

What is “road salt”? Road salt can be plain salt, a salt and sand mix, or even a liquid.
Do you plan to sprinkle salt on your plants or on the dirt or do you use salt water to simulate run-off? How much salt? How often? For how long? Before or after plants emerge?
What does “harm” mean? Will you measure growth height, number of leaves, color of the plants, mass of the plants, delayed emergence in spring, earlier die-off in fall, susceptibility to other stresses such as drought, number of viable seeds, plant diversity?
What is a “plant”? Do you consider anything growing near a road, or also aquatic species? Do you make a distinction between native and invasive plants? What about crops and garden plants?

7.3 Determine your population

The population is a set of similar items, people, or events which is of interest for your question or experiment. For example, if you want to know the average household income in Westfield, then the population is the set of all households in Westfield. If you want to study the average age of cars on the road in the US, then the population would be all cars on the road in the US.

7.4 Do a literature search

Chances are you do not know everything about the subject you are going to study. You might get helpful hints and ideas from other people’s work and avoid “re-inventing the wheel”. Also, it may be helpful to build on and learn from existing studies. This could save you time and money (and it also makes you look educated). The library and internet are great places to start. Our research librarians are happy to help.

7.5 Types of studies

Depending on time and money constraints, you may choose from different types of studies. Ideally, you will collaborate with experts in the field the study is conducted in to do so.

7.5.1 Experiments

In an experiment, you apply some treatment to your test subjects and then record any changes or effects that treatment might have. Some important elements of good experiment design are:

You select a meaningful, representative sample.
You are able to distinguish the effects of different factors.
Your results can be reproduced by others.
Ex.: You split a group of test subjects randomly into two. You administer a drug to one group, a placebo to the other. What is the effect of the placebo?

7.5.2 Observational Study

You observe and measure some characteristics of your test subjects, but you try not to modify or influence your subjects at all.
Ex.: Observe wolves in the wild.
There are three major types of observational studies:

For a cross-sectional study, data are collected at one point in time.
A retrospective study uses data from the past, collected from records, through interviews, etc.
In a prospective study, data are collected in the future from groups sharing common factors.
Ex.: Collecting data from new freshmen across the country

Whatever you do, results need to be reproducible!! This means that someone else should be able to replicate your study with very similar results.

7.6 Sampling

Sample selection is the choice of subjects to include in your study. It is important that your sample is representative of the population (remember: population – everyone or everything relevant to your study).

7.6.1 Sample types

A simple random sample is a sample where every set of items of the same size has the same probability of being chosen. Think pulling numbers out of a hat.
A convenience sample simply consists of those subjects that are easy to get, for example your friends or classmates. Convenience samples are easy to find, but may not be representative.
To find a self-selected sample or voluntary response sample, simply ask for volunteers. You can be sure that your subjects want to participate, but again your sample may not be representative of the whole population.
Ex.: Giving out free samples at the mall and asking for people’s opinions. Ex.: Radio call-in show or internet poll. People have a choice whether they want to voice their opinions or not.
A systematic sample uses some starting point and then selects every kth (for example every 10th, 50th, 100th) element.
Ex.: Use a phone book and call every 20th person, starting with the first entry. You trust that your phone book is random enough, so that every 20th person isn’t too similar. (Not all are middle aged, republican, middle income, 1 child, divorced…)
Ex.: Quality control on every 100th item produced
A cluster sample first divides the population into clusters. Complete clusters are chosen and everyone in those clusters is included.
Ex.: Clusters could be each math class. We randomly choose a few math classes and include all students in those classes.
Ex.: Clusters might be the different cities in the US. We randomly choose a few cities and include all people in those cities.
For stratified sampling, you first decide which characteristics might be important. For example, you might want to include some people from each ethnic background, or some people from each age group. After you define these groups (also called strata), you draw a sample from each group.
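The four probability-based sample types above can be sketched in a few lines. This is only an illustration in Python using the standard library; the population of 1,000 student IDs, the 20 classes of 50 students each, and all sample sizes are made-up assumptions, not data from this chapter.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1,000 student IDs in 20 classes of 50 students.
population = list(range(1000))
class_of = {student: student // 50 for student in population}

# Simple random sample: every set of 50 students is equally likely.
simple_random = rng.sample(population, 50)

# Systematic sample: random starting point, then every k-th element.
k = 20
start = rng.randrange(k)
systematic = population[start::k]

# Cluster sample: choose whole classes at random, keep everyone in them.
chosen_classes = set(rng.sample(range(20), 3))
cluster = [s for s in population if class_of[s] in chosen_classes]

# Stratified sample: draw separately from each stratum (here, each class).
stratified = []
for c in range(20):
    stratum = [s for s in population if class_of[s] == c]
    stratified.extend(rng.sample(stratum, 2))

print(len(simple_random), len(systematic), len(cluster), len(stratified))
```

Notice that the cluster sample only ever contains three classes, while the stratified sample is guaranteed to contain students from every class.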

Note: In practice, sampling methods are mixed. One might use cluster sampling and choose a few cities, but then continue with a self-selected sample. Test marketing uses this approach. A new movie is first shown in a few selected cities (cluster sampling), but people have the choice to go see the movie or not (voluntary response).

7.6.2 Sample size

This will be covered in a future class. Keep in mind the following rule of thumb (loosely based on the Law of Large Numbers): As long as your sample is representative and well collected, and as long as your experiment is well designed, larger samples give better results. Go as large (within reason) as you can afford and handle.
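To see the rule of thumb in action, here is a small simulation sketch in Python. Everything in it is an assumption made for illustration: the population is 100,000 made-up "household incomes" drawn from a lognormal distribution, and the helper `typical_error` is a hypothetical name. It measures how far the sample mean typically lands from the population mean for different sample sizes.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100,000 made-up household incomes.
population = [rng.lognormvariate(11, 0.5) for _ in range(100_000)]
true_mean = statistics.fmean(population)

def typical_error(n, repeats=200):
    """Average absolute gap between the sample mean and the population mean."""
    gaps = []
    for _ in range(repeats):
        sample = rng.sample(population, n)  # simple random sample of size n
        gaps.append(abs(statistics.fmean(sample) - true_mean))
    return statistics.fmean(gaps)

errors = {n: typical_error(n) for n in (10, 100, 1000)}
for n, err in errors.items():
    print(n, round(err))
```

The gap shrinks as the sample grows, but only because every sample here is a simple random sample. A larger convenience sample would not get the same benefit.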

7.6.3 Controlling the effects of different factors

Sometimes it is not possible to distinguish between the effects of different variables (Did the patient get better because of medication, different doctors, or just time?). When different factors may be responsible for the effect you see, and you cannot tell which ones are, you have confounding. These are the main strategies to avoid confounding:

Blinding and double blinding
Sometimes, patients will get better because of the placebo effect. A placebo is a “sugar pill”; it contains no medication. Patients are under the impression that they received treatment, and some will experience real or imagined improvement in symptoms. To account for the placebo effect, blinding is used. Some patients receive the treatment, others a placebo. Patients do not know which they received. One can then study differences between the two groups. To avoid bias on the part of the treating physicians, sometimes double blinding is used. Neither the patient receiving treatment nor the doctor giving treatment knows whether a placebo is being used.
When designing an experiment, you will commonly choose one or more groups (also called blocks) that receive treatment and one control group. It is important that these groups are similar in size and make-up with respect to those factors that might affect the outcome of your experiment.
Completely randomized experiment design
After selecting your subjects, each subject is assigned to a group completely randomly. Think of this as randomly pulling names out of a hat. Each subject is equally likely to end up in any of the groups.
Rigorously controlled design
Subjects are carefully assigned to a group (block) in such a manner that each group is similar with respect to the variables that might affect the experiment. You might, for example, make sure each block contains the same proportions of males and females as the others, a similar age distribution, or similar race composition.
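The two assignment strategies can be compared side by side. The sketch below is in Python and rests on assumptions: the 20 subjects (12 female, 8 male) are invented, and randomizing within each sex is just one possible way to make the groups similar with respect to a single variable.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical subjects as (id, sex) pairs: 12 female, 8 male.
subjects = [(i, "F") for i in range(12)] + [(i, "M") for i in range(12, 20)]

# Completely randomized design: shuffle everyone, then split in half.
shuffled = subjects[:]
rng.shuffle(shuffled)
treatment_cr, control_cr = shuffled[:10], shuffled[10:]

# Controlled on sex: shuffle within each sex and split each half-and-half,
# so both groups end up with the same proportion of males and females.
treatment_b, control_b = [], []
for sex in ("F", "M"):
    same_sex = [s for s in subjects if s[1] == sex]
    rng.shuffle(same_sex)
    half = len(same_sex) // 2
    treatment_b += same_sex[:half]
    control_b += same_sex[half:]

def count_f(group):
    """Number of female subjects in a group."""
    return sum(1 for _, sex in group if sex == "F")

print("randomized:", count_f(treatment_cr), "vs", count_f(control_cr))
print("controlled:", count_f(treatment_b), "vs", count_f(control_b))
```

With the completely randomized design the two groups may end up with different numbers of women purely by chance. The controlled version always gives 6 women and 4 men in each group.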

7.6.4 Error sources

We consider two main error sources.

Sampling errors.
Because you are dealing with a sample, your results may differ from the behavior of the population. For example, you may predict 49% votes for party “A” based on a sample of 1923 people, while the actual result is 47%. This is called the sampling error, and it arises because you used a sample rather than the whole population. Sampling errors happen; they are not your fault (just your bad luck).
Non-sampling errors.
These are caused by human error such as poor sample selection, mistakes in data recording, calculations, test choice, result interpretation, etc. Non-sampling errors are your fault.
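How large is a typical sampling error in the polling example? The Python sketch below reuses the numbers from that example (sample size 1923, actual share 47%) and assumes a simple random sample of voters. The standard error formula for a sample proportion and the 1,000 simulated polls are additions for illustration only.

```python
import math
import random

rng = random.Random(3)  # fixed seed so the sketch is repeatable

true_share = 0.47  # actual share for party "A" in the example
n = 1923           # sample size in the example

# Standard error of a sample proportion under simple random sampling.
standard_error = math.sqrt(true_share * (1 - true_share) / n)

def poll():
    """Simulate one poll of n voters and return the observed share."""
    votes = sum(1 for _ in range(n) if rng.random() < true_share)
    return votes / n

results = [poll() for _ in range(1000)]

# Fraction of simulated polls that miss by 2 percentage points or more.
share_off_by_2_points = sum(
    1 for r in results if abs(r - true_share) >= 0.02
) / len(results)

print(round(standard_error, 4), share_off_by_2_points)
```

The standard error comes out at roughly 1.1 percentage points, so a poll that is off by 2 points is unlucky but not unusual, even when nothing went wrong in the sampling.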

7.6.5 Data types

Make sure you correctly identify the type of data collected. This becomes important when deciding in what format to store the data, and what calculations you can do with your data.

categorical, qualitative, attribute: categories only, not (meaningful) numbers.
quantitative, numerical: numbers representing counts or measurements
Careful: Assigning numbers to qualitative data doesn’t turn that data into numerical data. For example, setting “yes” to 1 and “no” to 0 does not turn yes/no into numerical (quantitative) data.
discrete or continuous data: If your data is discrete, that means that you could make a (possibly never ending) list of all possible values. Think of the difference between countable and uncountable. Oftentimes, continuous data are reported on a discrete scale. For example, a person’s height is usually given in increments of 1 inch (5 ft 7 in). In reality, a person’s height is continuous; any fraction of an inch is also possible. Another example would be guessing an integer from 1 to 10 (discrete, possible values are 1, 2, 3, 4, 5, 6, 7, 8, 9, 10), or guessing any real number from 1 to 10 (continuous).

7.6.6 Levels of measurement (lom)

Levels of measurement further categorize your data.

nominal lom: categories only, no order, example: colors
ordinal lom: categories, order exists, example: days of the week, letter grades
interval lom: numbers, no natural zero, multiples don’t make sense, example: temperatures, calendar years
ratio lom: numbers, natural zero, multiples make sense, example: age, score on exam
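A quick way to check whether “multiples make sense” is to change the unit and see whether the multiple survives. The Python sketch below uses made-up values (10 and 20 degrees Celsius, ages 20 and 40), and the helper `c_to_f` is a hypothetical name.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

# Interval level: the "multiple" depends on the scale you happen to use.
ratio_celsius = 20 / 10                     # looks like "twice as warm"
ratio_fahrenheit = c_to_f(20) / c_to_f(10)  # same two temperatures, 68 / 50

# Ratio level: the multiple is the same in any unit.
ratio_years = 40 / 20
ratio_months = (40 * 12) / (20 * 12)

print(ratio_celsius, ratio_fahrenheit)  # the two ratios disagree
print(ratio_years, ratio_months)        # the two ratios agree
```

Because 0 degrees is an arbitrary point on both temperature scales, “twice as warm” is not a meaningful statement, while “twice as old” is.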

7.7 Clean your data

Mistakes happen. If a value looks extreme, check and make sure it is correct. Check all outliers. As a rule of thumb, about 80% of your time will be spent cleaning the data, only about 20% interpreting it. Yes, it is that time consuming and painful. We will devote a whole chapter to data cleaning.
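One common convention for deciding that a value “looks extreme” is to flag anything more than 1.5 interquartile ranges outside the quartiles. The Python sketch below assumes that convention (this chapter does not prescribe one). The function name `flag_extreme` and the height data are made up.

```python
import statistics

def flag_extreme(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical heights in inches; 680 is probably a typo for 68.0.
heights_in = [64, 66, 67, 68, 69, 70, 71, 72, 680]

print(flag_extreme(heights_in))
```

A flagged value is only a candidate for checking. Whether it is a recording mistake or a genuine extreme observation still has to be decided by going back to the source.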

7.8 Exploratory data analysis

Once you have your data collected and cleaned, you need to describe what you actually have, find out what information you have and summarize it in a meaningful way. This includes graphs, summary statistics such as mean and variance, tables, etc. We will address this in a future chapter.
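As a small preview, the summary statistics mentioned above take only a few lines to compute. The Python sketch below uses an invented data set of six values and the standard library only.

```python
import statistics
from collections import Counter

# Hypothetical sample: number of books read last year by six students.
books = [2, 4, 4, 5, 7, 8]

mean_books = statistics.fmean(books)         # arithmetic mean
variance_books = statistics.variance(books)  # sample variance (divides by n - 1)
frequency_table = Counter(books)             # value -> how often it occurs

print(mean_books, variance_books)
for value, count in sorted(frequency_table.items()):
    print(value, count)
```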

7.9 Inferential data analysis

Finally, you are ready to interpret your data. This includes all testing, making conclusions and decisions, and drawing inferences. The goal of data analysis is to make an inference about some characteristic of the population based on the sample. This is the most interesting part of data analysis mathematically speaking, and what most of this book is about.

7.10 Report your results

While you need to include your calculations in any report or paper you may produce, you also need to state your results in plain English. Even a person with limited statistics background should be able to understand your conclusions.

Ex.: Imagine your boss asks you to look at some samples and find out if the new machine is more consistent than the old one. If your answer is “With a p-value of 1.2E-3 I reject H0, where the null hypothesis is that the new machine has lower standard deviation”, your boss is probably not going to be happy. Phrase your answer in the context of the problem: “I am 99% sure that the new machine is no better than the old one”.

7.11 Assignment

Design a survey to predict the next election (skip the literature search, data collection and cleaning, and analysis). Use as much of the vocabulary from this chapter as you can. Hand in a rendered RMD file.
