4 Week 2 lecture: Study design
4.1 Introduction to the study by Fan, Tybur and Jones (2022)
This week we will be looking at the data from a recent study (Fan, Tybur, and Jones 2022). The study looked at how comfortable people would be doing seven different behaviours that could spread infection (e.g. sharing a water bottle, sitting next to someone, using their towel) with a stranger whose face they had seen on the screen. The participants were either from the UK or from China, and the face that they were looking at was either Chinese or British. The face was shown in one of three versions: unmanipulated; manipulated to make the person look as though they had an infection; or manipulated to make the person look as though they were wearing a mask, as shown in the figure below.
For simplicity, we are just going to analyse the data from the UK participants, not those from China. If you look on the data repository of the paper, https://osf.io/t476f/, you will find two data files, one called GB and one called CH. The latter is the data from China, and we are looking only at the GB file here. (Side note: the UK and GB are not quite the same. Northern Ireland is in the United Kingdom, but not in Great Britain. The authors of this study were clearly a bit vague about whether their study population was the UK or GB, since they say UK in the paper but call the data file GB!)
4.2 Experimental design
When faced with an experiment whose data you want to analyse, you first want to think about what its design is. The data from an experiment consists of a number of variables. A variable is simply something that can take different values within the experiment. It is good practice to give each variable an unambiguous name by which you refer to it consistently.
For example, whether the displayed face is British or Chinese is a variable. We will refer to this as Agent Nationality. I don’t love the name, but it is what the researchers called it in their data. Specifically, Agent Nationality is a categorical variable. This means that its value must be one of a limited number of discrete options (in this case, just two options, either ‘British’ or ‘Chinese’).
Another variable in the experiment is how comfortable the respondent would be having the various forms of contact with the person whose face is shown. We will call this Contact Comfort. Contact Comfort is calculated by measuring the person’s degree of comfort with each of the seven behaviours (sharing a water bottle, sitting next to someone, using their towel, and so on) on 7-point scales from ‘1 - very uncomfortable’ to ‘7 - very comfortable’. The average of the seven responses is then calculated. This is what is known as a Likert scale, after psychologist Rensis Likert. Although each of the individual items is measured on a scale with only a small number of response possibilities (7), the average of the seven behaviours can of course take many more values. Thus, we treat the average score as a continuous variable, that is, a variable akin to height, weight, speed or distance that can in principle take any value within a range of the real number line.
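To make the averaging step concrete, here is a minimal Python sketch. The seven responses are made-up numbers for one imaginary participant, not values from the study's data file, and the variable names are purely illustrative. The second part counts how many distinct values the average of seven 7-point items can take, which is why it is reasonable to treat the average as continuous.

```python
from statistics import mean

# Hypothetical responses from one participant to the seven behaviours,
# each on the 1-7 comfort scale (invented numbers, not study data).
item_responses = [2, 5, 3, 4, 6, 1, 4]

# Contact Comfort is the average of the seven item responses.
contact_comfort = mean(item_responses)
print(round(contact_comfort, 2))

# Each item can take only 7 values, but the sum of seven items can be
# any whole number from 7 to 49, so the average can take many more values.
possible_averages = {total / 7 for total in range(7, 50)}
print(len(possible_averages))
```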
When examining the design of an experiment, identifying which variables are categorical and which are continuous is one important thing to do. The other is to divide the variables into independent variables, dependent variables, and covariates.
###Independent variables###
Independent variables (IVs) are those that are manipulated by the experimenter. The different possible values of the variable are then usually randomly assigned to participants. For example, in a vaccine trial, the independent variable is whether the patient gets the real vaccine (treatment group) or just saline solution (control group). Usually, the goal of an experiment is to discover whether the independent variable has an effect on some outcome (for example, does having the vaccine affect the likelihood of contracting a disease). The researcher manipulates the independent variable precisely in order to identify whether the outcome is different on average, according to which value of the independent variable has been given.
In this experiment, there are two independent variables. First there is the Agent Nationality, as discussed above. Second, there is Manipulation Condition, namely whether the face is unmanipulated, masked or made to look infected. So, there are two independent variables, Agent Nationality with two possible values, and Manipulation Condition with three possible values. All of the possible combinations of the values of these two variables are generated in the course of the experiment. That is, the participants could have six different experiences: British unmanipulated, British infectious, British masked, Chinese unmanipulated, Chinese infectious, Chinese masked. We call this a full factorial design, i.e. one where all of the combinations of all the independent variables are presented. In particular, this experiment is a 2 x 3 factorial design, because the independent variables take 2 and 3 levels respectively, and the total number of possible trial types is 2*3 = 6.
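The six trial types of a 2 x 3 full factorial design can be listed mechanically by crossing the levels of the two independent variables. The sketch below does this in Python. The level labels follow the text, but the variable names are illustrative choices, not the column names in the authors' data file.

```python
from itertools import product

# Levels of the two independent variables, as described in the text.
agent_nationality = ["British", "Chinese"]
manipulation_condition = ["unmanipulated", "infectious", "masked"]

# A full factorial design contains every combination of levels.
cells = list(product(agent_nationality, manipulation_condition))

for nationality, condition in cells:
    print(nationality, condition)

# Number of trial types = 2 * 3
print(len(cells))
```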
We are getting ahead of ourselves a little here, but if we want to work out the effect of one of the independent variables on the outcome, we have to average across the values of the other one. For example, to work out the effect on Contact Comfort of Agent Nationality, we are going to want to average the Contact Comfort across the unmanipulated faces, the infectious faces, and the masked faces. As long as Chinese and British faces are equally likely to be shown across all the levels of Manipulation Condition, this will be fine.
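As a rough illustration of averaging across the other independent variable, the sketch below starts from one mean Contact Comfort value per cell of the design and averages over Manipulation Condition to compare British and Chinese faces. The cell means are invented for illustration and are not results from the study. Giving each cell equal weight assumes the balanced situation described above, where both nationalities are equally likely at every level of Manipulation Condition.

```python
from statistics import mean

# Hypothetical mean Contact Comfort in each of the six cells
# (invented numbers, not results from the study).
cell_means = {
    ("British", "unmanipulated"): 4.0,
    ("British", "infectious"): 3.0,
    ("British", "masked"): 5.0,
    ("Chinese", "unmanipulated"): 3.5,
    ("Chinese", "infectious"): 2.5,
    ("Chinese", "masked"): 4.5,
}

def mean_for_nationality(nationality):
    """Average the cell means over all levels of Manipulation Condition."""
    return mean(
        value
        for (cell_nationality, _condition), value in cell_means.items()
        if cell_nationality == nationality
    )

british_mean = mean_for_nationality("British")
chinese_mean = mean_for_nationality("Chinese")

# Difference between the two nationalities, averaged over conditions.
nationality_difference = british_mean - chinese_mean
print(british_mean, chinese_mean, nationality_difference)
```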
###Dependent variables###
Dependent variables (DVs) are variables measuring the outcome(s) of interest. Calling them dependent kind of begs the question; maybe they depend on the independent variable, but maybe they don’t. They should be called ‘potentially dependent’ variables. For example, whether you get the disease potentially depends on whether you had the vaccine; it only actually depends on it if the vaccine is effective. Otherwise the vaccine makes no difference. If the dependent variables do depend on the independent variables, their average will be different in the groups constituted by the different values of the independent variable. In this case you have an experimental effect (in medical trials, a treatment effect). If they don’t, you have a null result or null effect. The average of the dependent variables will be more or less the same across the different groups formed by the values of the independent variables.
Here, the dependent variable is Contact Comfort. There are other related variables in the full paper, but here we will focus only on the Contact Comfort variable.
###Covariates###
A third type of variable in experimental studies is the covariate. A covariate is a variable that is not the outcome, but which the experimenter was not able to manipulate either. For example, the sex of the participant, their personality profile or whether they write with their left hand, might be covariates. The experimenter can measure these things ahead of time, but cannot manipulate them.
Sometimes covariates are just ‘nuisance’ variables, things that can affect the dependent variable but which have nothing to do with the study question. For example, women might be less comfortable than men sharing water bottles with strangers, but this could be irrelevant to the research question that the researchers have in mind. On the other hand, the researcher might actually be interested in the sex difference. Or, the sexes might differ in their response to the infection manipulation. In this case, sex would be described as an effect modifier or moderator.
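To show what it would mean for sex to act as a moderator, the sketch below works out the effect of the infection manipulation separately for each sex and then compares the two effects. All of the numbers are invented for illustration. They are not from the study, and the sketch makes no claim about what the real data show.

```python
# Hypothetical mean Contact Comfort by participant sex and face condition
# (invented numbers for illustration only).
means = {
    ("female", "unmanipulated"): 4.0,
    ("female", "infectious"): 2.0,
    ("male", "unmanipulated"): 4.5,
    ("male", "infectious"): 3.5,
}

def infection_effect(sex):
    """Change in mean comfort from unmanipulated to infectious faces."""
    return means[(sex, "infectious")] - means[(sex, "unmanipulated")]

effect_female = infection_effect("female")
effect_male = infection_effect("male")

# If sex moderates the manipulation, the two effects differ,
# so this difference is not zero.
difference_in_effects = effect_female - effect_male
print(effect_female, effect_male, difference_in_effects)
```

In these made-up numbers the manipulation lowers comfort in both sexes, but by a different amount in each, which is the pattern the term effect modifier describes.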
Sometimes, covariates are