3 Week 2 lecture: Study design
3.1 Phenomena, constructs and variables
In science, we are generally interested in describing and explaining phenomena. Phenomena are observable events or facts. For example, a person might quit their job, or might get married, or might contract an illness. Or, they might become much better or much worse at a task. In general, phenomena can be observed somewhat directly.
Cognitive science is not just concerned with phenomena. It also deals with psychological constructs. Constructs are not in themselves directly observable, but are used to describe and categorize patterns of cognition or behaviour. Common constructs in cognitive science are things like intelligence, self-esteem, personality, and others. Consider the example of intelligence. The actual phenomena that we can observe are solving problems quickly, correctly doing difficult mental rotation tasks, or being effective in certain kinds of jobs. We invoke the construct of intelligence as a way of summarizing this manifold of phenomena, but we never directly see the intelligence itself, just the phenomena assumed to reflect it.
To do quantitative science, phenomena and constructs have to be turned into variables. A variable is, roughly, something that can be entered into a column of a spreadsheet. The variable can take on several or many different values, and these are measured or captured some number of times in the study. The step of turning the phenomena and constructs of interest into variables that we can measure is known as operationalization. This is a very difficult and important stage in cognitive science. Even if we want to study a simple behavior like losing one’s phone, we have to have criteria for when this event has occurred. How long does the phone have to be lost for? Is it enough that the person cannot say where their phone is when challenged (or says the wrong location), or does the person have to actively look for their phone and fail to find it? The question of operationalization becomes much more complex, obviously, if we want to study something subtle such as moral licensing or self-deception. We have to think very hard about what measures we are going to collect, and how we are going to extract a variable from those measures that (we hope) captures the construct about which we want to ask a question.
3.2 Types of variable
We can divide up variables in a number of ways. First is the distinction between categorical variables (sometimes also called qualitative variables) and quantitative (also called numerical) variables. There is also an intermediate class called ordinal variables.
3.2.1 Categorical variables
A categorical variable takes a finite set of possible values that cannot be ordered from smaller to larger. For example, the variable ‘marital status’ could take on the values {single, married, separated, divorced, widowed}. There is no sense in which widowed is greater than separated, which is greater than married; you could put the values in any order. Another example of a categorical variable would be, in a vaccine trial, whether you had had the actual vaccine or a saline solution (in other words, whether you were in the treatment group or the control group). We sometimes speak of variables having a number of levels, which means possible values. So, in the vaccine trial, group would be a variable with two levels: treatment and control.
Slightly confusingly, in data analysis we sometimes represent categorical variables with numbers. For example, in statistical analyses, you might use the value 1 for single, 2 for married, 3 for separated and so on, as a way of representing the variable marital status. The variable marital status is still categorical if we do this, though, because we recognize that the numerical coding is arbitrary. We could equally well have chosen the value 1 for married and 2 for widowed; we are not really saying that separated is bigger than married.
Although the coding of categorical variables is ultimately arbitrary, there are better and worse choices of coding, in terms of interpreting our findings. For example, if your research question is about how divorced people are different from married ones, and married people are more common in your data, you might choose to represent married as 0 and divorced as 1. This means certain statistical parameters will be very easily interpretable as the extent to which divorced people (the rarer state of interest) differ from married ones (who you are implicitly treating as a kind of baseline). We will see examples of this later on.
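As a minimal sketch of the two coding choices described above, the following Python snippet codes the same made-up responses in two ways. The data values, the variable names and the exact numbers assigned to divorced and widowed are assumptions for illustration only.

```python
# Hypothetical responses; the values are invented for illustration.
marital_status = ["married", "divorced", "married", "married", "divorced"]

# An arbitrary numeric coding: the numbers are labels, not magnitudes.
arbitrary_codes = {"single": 1, "married": 2, "separated": 3,
                   "divorced": 4, "widowed": 5}
coded_arbitrary = [arbitrary_codes[s] for s in marital_status]

# A baseline (0/1) coding chosen to suit the research question:
# married is the reference level (0), divorced the level of interest (1).
baseline_codes = {"married": 0, "divorced": 1}
coded_baseline = [baseline_codes[s] for s in marital_status]

print(coded_arbitrary)  # [2, 4, 2, 2, 4]
print(coded_baseline)   # [0, 1, 0, 0, 1]
```

With the 0/1 coding, the mean of the coded variable is simply the proportion of divorced people in the sample, which is one reason this kind of coding tends to be easy to interpret.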
3.2.2 Quantitative variables
Quantitative variables are variables that: (i) are represented by values that can be ordered from less to more; and (ii) the gap between successive values on the scale has a constant meaning. For example, number of children is a quantitative variable. Two children is more than one child. Moreover, the difference between 2 children and 3 children means the same thing as the difference between 9 children and 8 children. It means one additional child in both cases. Likewise, distance is a quantitative variable. 114km is further than 112km, and the difference between 114km and 112km means the same thing as the difference between 9km and 7km; it means 2km further.
Some quantitative variables are discrete. This means that they can only take on integer values. Counts of things are generally discrete. I can either have 2 children or 3 children, but not 2.17 children.
Continuous quantitative variables can take on an in-principle infinite number of different values. Examples of continuous variables are distances, speeds, latencies, forces, heights, weights, and so on. The number of possible values is only infinite in principle, not in practice, because our equipment will always have a precision limit. Perhaps our clock only captures latencies to the nearest second. Nonetheless the variable is continuous because there are still a large number of recordable values the variable will take on. If our clock was so crude that we could only distinguish, say, events that had taken less than a minute from those that had taken more than a minute, then we would perhaps choose to treat latency as a categorical variable with the possible values {slow, fast}, rather than a continuous one.
3.2.3 Ordinal variables
Ordinal variables are variables that have some features of qualitative variables and some features of quantitative ones. An example is the following: highest education qualification, out of the possible set {high school, 2 year college, undergraduate degree, masters, PhD}. This is like a quantitative variable in that we can place the possible values in an order from less education to more education. The order of the levels is therefore not arbitrary. On the other hand, the difference between the levels does not have a constant meaning. The difference between high school and 2 year college might not be the same, in terms of amount of extra education, as the difference between masters and PhD. PhDs take more than 2 years, for one thing.
There is a class of statistical methods for ordinal variables. We will not deal with them in this course but it is possible you will need to investigate them for your research. In practice in cognitive science, ordinal variables are often either converted into two-category ones (like Agree versus Not or Depressed versus Not depressed), or else treated as continuous variables. In particular, you will often meet measures using a so-called Likert scale, after psychologist Rensis Likert. In a Likert scale, there are multiple questions or items measuring the same construct. Each item produces an ordinal variable, with discrete levels Strongly disagree/Disagree/Neutral/Agree/Strongly Agree or something similar. However, when you average together the many items to produce a single scale score, the resulting variable can take on many values, and is treated as quantitative and continuous.
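The averaging step for a Likert scale can be sketched as follows. This is only an illustration: the 1 to 5 coding of the response labels, the function name `scale_score` and the example responses are assumptions, not part of any particular questionnaire.

```python
from statistics import mean

# Assumed ordinal coding of the response labels (1 to 5).
LIKERT_CODES = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly agree": 5,
}

def scale_score(responses):
    """Average the coded item responses into a single scale score."""
    return mean(LIKERT_CODES[r] for r in responses)

# One hypothetical participant answering five items on the same construct.
participant = ["Agree", "Strongly agree", "Neutral", "Agree", "Disagree"]
print(scale_score(participant))  # 3.6
```

Each item can only take five values, but the average over five items can already take many more, which is why the scale score is usually treated as continuous.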
3.2.4 Measured versus manipulated variables
It is worth mentioning another distinction between types of variables. This is the distinction between measured and manipulated variables. With measured variables, the researcher simply observes the value that is there. With manipulated variables, the researcher intervenes in the world to make the variable have the value that it has.
Imagine we are studying physical exercise and depression. We could take a population of people (university students, say), and ask them how many times per week they exercise. We could relate this variable to how many depressive symptoms they have. This would be an example of an observational study, as we will see below. Physical exercise in this study would be a measured variable; the people were exercising a certain amount anyway, and we measured how much that was.
On the other hand, we could take volunteers and request half of them to exercise four times a week, and the other half not to exercise. Thus, it would be our doing, not theirs, that certain among them end up exercising more than others. In this case, physical exercise would be a manipulated variable, not a measured variable. Our study would be an experimental study, or randomized control trial, as we will see.
When you encounter or design a research study, it is a useful exercise to list all the variables in the study, and classify each one as qualitative, ordinal or quantitative; and measured or manipulated.
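One way to carry out this listing exercise is to write the variables down in a small data structure. The sketch below does this for the exercise and depression example; the class name `Variable`, its field names and the helper `is_experimental` are hypothetical choices made for illustration.

```python
from dataclasses import dataclass

VARIABLE_KINDS = {"categorical", "ordinal", "quantitative"}
VARIABLE_ORIGINS = {"measured", "manipulated"}

@dataclass(frozen=True)
class Variable:
    name: str
    kind: str     # categorical, ordinal or quantitative
    origin: str   # measured or manipulated

    def __post_init__(self):
        # Reject labels outside the classification used in this lecture.
        if self.kind not in VARIABLE_KINDS:
            raise ValueError(f"unknown kind: {self.kind}")
        if self.origin not in VARIABLE_ORIGINS:
            raise ValueError(f"unknown origin: {self.origin}")

def is_experimental(variables):
    """A study with at least one manipulated variable is a candidate experiment."""
    return any(v.origin == "manipulated" for v in variables)

observational_study = [
    Variable("exercise sessions per week", "quantitative", "measured"),
    Variable("number of depressive symptoms", "quantitative", "measured"),
]
experimental_study = [
    Variable("exercise condition", "categorical", "manipulated"),
    Variable("number of depressive symptoms", "quantitative", "measured"),
]

print(is_experimental(observational_study))  # False
print(is_experimental(experimental_study))   # True
```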
3.3 Types of study
Having dealt with the main types of variable that we can have, it is time to think about the main types of study we can perform. The most important distinction is between observational studies and experimental ones. There is also an intermediate type called quasi-experimental. The terms ‘experiment’ and ‘experimental’ are sometimes used loosely in cognitive science. Don’t do this. A lot of research in cognitive science is not really experimental. Reserve the term for true experiments. The true experiment is a very distinctive tool in the armory of science, and a central one in cognitive science. It is important to be clear about which evidence is really experimental and which is not.
3.3.1 Observational studies
Observational studies are studies that contain only measured variables. They are sometimes also known as correlational studies. The researcher collects data in some study population about phenomena or constructs that interest them, and establishes how those phenomena or constructs relate to one another. Examples might be studies of personality and job satisfaction; or age and working memory; or gender and spatial navigation.
Observational studies are very important. They are the only practical way of studying some topics. Observational studies can establish associations between variables. They cannot in general identify causal effects, or at least not unambiguously. Consider for example an observational study of income and life satisfaction. We find that people with higher incomes have higher life satisfaction. However, can we be sure that their higher incomes caused their higher life satisfaction? They may have got their high income because they were happier (their happiness made them more effective at work, for example). Or something else (their high level of education or the part of the country they were born in, for example), might have caused both their incomes to become high and their lives to be satisfying.
So, in general, if there is an association between variables X and Y in an observational study, it could be because: (a) X causes Y; (b) Y causes X; or (c) some third variable affects both X and Y. People try to rule out possibilities of type (c) by ‘controlling for’ as many alternative variables as possible (i.e. measuring those alternative variables and adjusting for them in the statistical analysis). We will see how to do this later, but it is never completely conclusive. There could always be an unmeasured third variable you have not thought to include. Plus, the more things you control for, the more danger you have of biasing your estimate of the association you are interested in, or of not being able to interpret what it means.
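Possibility (c) can be illustrated with simulated data. In the sketch below, a third variable Z drives both X and Y, and X has no effect on Y at all, yet X and Y end up clearly correlated. All numbers are simulated under assumed values, and the partial correlation formula is used here only as one simple way of ‘controlling for’ Z; it is not necessarily the method used later in the course.

```python
import random
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

rng = random.Random(1)  # fixed seed so the simulation is repeatable
n = 5000

# Z is the third variable (think: education). X and Y each depend on Z
# plus independent noise; X never influences Y in this simulation.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]  # stands in for income
y = [zi + rng.gauss(0, 1) for zi in z]  # stands in for life satisfaction

r_xy = pearson(x, y)
r_xz = pearson(x, z)
r_yz = pearson(y, z)

# Partial correlation of X and Y after adjusting for Z.
partial_xy = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

print(f"raw correlation between X and Y: {r_xy:.2f}")
print(f"correlation after adjusting for Z: {partial_xy:.2f}")
```

The raw correlation should come out near 0.5 and the adjusted one near zero. The adjustment only works here because Z was measured; an unmeasured third variable would leave the raw association unexplained.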
Observational studies can be roughly divided into cross-sectional ones and longitudinal ones. In a cross-sectional study, all the variables are collected simultaneously. For example, we survey people on their incomes and their life satisfaction, on a particular date in July 2024. The association of interest is whether a person’s income in July 2024 is statistically related to their life satisfaction in July 2024. We are explicitly comparing between people with different incomes.
Longitudinal studies involve measuring things in the same people (or animals) at multiple points in time. The term is used somewhat variably. However, the key point is that you can often identify how a change in one variable relates to the change in the other. For example, you could look at whether people whose incomes increased from 2023 to 2024 also became more satisfied with life from 2023 to 2024. This helps rule out some causal possibilities. For example, people’s gender or childhood background will not normally have changed from 2023 to 2024, so these factors could not explain why the change in income and the change in life satisfaction would be related. However, even longitudinal studies cannot identify causality in the way that experiments can, because there is always the possibility that something else changed that caused the change in both income and life satisfaction. We will examine statistical methods for longitudinal studies in this course.
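The idea of relating a change in one variable to a change in another can be sketched with a few lines of Python. The four people and all their values are invented for illustration, and counting who moved in the same direction is a deliberately crude summary, not a statistical method.

```python
# Hypothetical two-wave data: each tuple is (value in 2023, value in 2024).
waves = {
    "ana":  {"income": (30000, 34000), "satisfaction": (6, 7)},
    "ben":  {"income": (42000, 41000), "satisfaction": (7, 6)},
    "cara": {"income": (25000, 25500), "satisfaction": (5, 5)},
    "dev":  {"income": (51000, 56000), "satisfaction": (8, 9)},
}

def change(pair):
    """Change from the first wave to the second."""
    before, after = pair
    return after - before

# For each person, pair the change in income with the change in satisfaction.
changes = {
    name: (change(data["income"]), change(data["satisfaction"]))
    for name, data in waves.items()
}

# People whose income and satisfaction moved in the same direction.
same_direction = [
    name for name, (d_income, d_sat) in changes.items() if d_income * d_sat > 0
]

print(changes["ana"])   # (4000, 1)
print(same_direction)   # ['ana', 'ben', 'dev']
```

Because each person is compared with their own earlier self, stable characteristics such as childhood background drop out of the comparison.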
Even in an observational study, you generally have some idea of what you are thinking of as having an influence on what. The things you think of as the causes or inputs will be your predictor variables, and the things you think could be influenced or predicted will be called your outcome variables. In our example, income is the predictor variable and life satisfaction the outcome variable, even though we concede that we cannot totally rule out that the causality in the world goes the other way.
You might have variables that are intermediate between your predictors and your outcomes. We call these mediators or intervening variables. For example, people with higher incomes might be more satisfied with life because they travel more. In this case, travel is a potential mediating variable between the predictor income and the outcome life satisfaction.
If you are doing an observational study, it is useful to make a diagram with each variable that you mention in a box. Put the outcome variable(s) on the right hand side, and the predictor(s) on the left. Draw arrows between boxes wherever you think there could be an influence of the variable at the left to the variable on the right. Intervening variables will be in the middle part of the diagram, and have arrows going into them as well as going out. This is a simple version of something called a directed acyclic graph or DAG. Although observational studies cannot unambiguously identify causality, when you make this diagram you are implicitly taking a view of which variables you see as driving which others. This is fine, and indeed good, because it clarifies what you are testing. You just have to be circumspect in your causal interpretation of whatever you find. For example, if people with higher incomes in an observational study have higher life satisfaction, you should not describe this as ‘the effect of income on life satisfaction’, but rather ‘the association between income and life satisfaction’.
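The diagram described above can also be written down as a small data structure. The sketch below encodes the income, travel and life satisfaction example as a dictionary of arrows and reads off the role of each variable. The direct arrow from income to life satisfaction, and the helper name `parents`, are assumptions made for illustration.

```python
# Each key is a variable; its list holds the variables it is assumed to influence.
dag = {
    "income": ["travel", "life satisfaction"],
    "travel": ["life satisfaction"],
    "life satisfaction": [],
}

def parents(graph, node):
    """Variables with an arrow pointing into `node`."""
    return [source for source, targets in graph.items() if node in targets]

# Predictors have no incoming arrows; outcomes have no outgoing arrows;
# intervening variables (mediators) have both.
predictors = [v for v in dag if not parents(dag, v)]
outcomes = [v for v in dag if not dag[v]]
mediators = [v for v in dag if parents(dag, v) and dag[v]]

print(predictors)  # ['income']
print(mediators)   # ['travel']
print(outcomes)    # ['life satisfaction']
```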
3.3.2 Experimental studies
The defining feature of a truly experimental study is that it contains at least one manipulated variable. We call the variable(s) that we manipulate the independent variable(s) or IVs. The purpose of the experiment is to find out whether one or more other variables, the dependent variable(s) or DVs, are causally affected by the change in the independent variables. Thus, the research question of an experimental study can usually be phrased as something like ‘does {independent variable} affect {dependent variable}?’, or ‘how much does {independent variable} affect {dependent variable}?’.
We usually use random assignment to determine which level of the independent variable a participant receives on a particular occasion (experiments in some areas are also called randomized control trials, to emphasize that random assignment is important, and that treatments are always compared to some appropriate control condition). For example, in a vaccine trial, the independent variable is whether the participant gets the true vaccine or an inert placebo. The dependent variable is whether they contract the disease that the vaccine is supposed to protect against. What we can infer is whether (and to what extent) the vaccine truly does protect against the disease.
Because of the random assignment, the participants in the true vaccine group cannot systematically differ from those in the control group, other than in whether they get the true vaccine. Thus, any difference in disease outcome between the vaccine group and the control group cannot be due to anything other than chance (in any two finite-sized groups, there could be more cases of the disease in one than the other just through luck), and the effect of receiving the vaccine. We can estimate how much discrepancy between the rates of disease between the two groups could plausibly be due to chance, and this averages out more and more as the groups in our study become larger. So that leaves only the actual effect of the vaccine. This is why experimental evidence is thought of as the highest form of information about causality that it is possible to gather.
To appreciate how an experimental study is superior to an observational one for inferring causality, imagine that we just do an observational study of the rates of influenza in people who do versus do not receive an influenza vaccine in the course of ordinary life. The people who get an influenza vaccine (in the normal course of life) are probably going to be older, sicker, or work in jobs where they are more likely to be exposed to influenza, compared to people who do not go for a vaccine. That’s why they go and get one! This means that any difference in influenza rate between people who do and do not get an influenza vaccine outside of an experimental trial is a muddle of the effects of the vaccine, and the effects on the incidence of influenza of other systematic differences between people who do and do not go and get one. It’s hard to cleanly disentangle what part of this the vaccine is responsible for.
Experiments often contain variables that are neither independent nor dependent variables. These tend to be referred to as covariates. A covariate is a variable that is not the outcome, but which the experimenter was not able to manipulate either. For example, the sex of the participant, their age, their previous disease history or whether they write with their left hand, might be covariates. The experimenter can measure these things ahead of time, but cannot manipulate them.
Sometimes covariates are just ‘nuisance’ variables, things that can affect the dependent variable but which have nothing to do with the study question. For example, women might get influenza more often than men. For covariates like this, the most important design imperative is that the groups do not differ systematically in terms of the covariate. It would obviously be crazy to have all women in the control group and all men in the vaccine group. Random assignment means the groups should be balanced in expectation, though perhaps not perfectly in practice. You can also consider balancing the groups, making sure that each group contains exactly 50% men and 50% women.
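The difference between simple random assignment and balancing on a covariate can be sketched as follows. Balancing is done here by randomising separately within each sex, an approach often called stratified or blocked randomisation. The participant records, group labels and function names are hypothetical.

```python
import random

rng = random.Random(42)  # fixed seed so the example is repeatable

# Hypothetical participants: 20 women ("F") and 20 men ("M").
participants = [{"id": i, "sex": "F" if i < 20 else "M"} for i in range(40)]

def simple_random_assignment(people, rng):
    """Shuffle everyone and split in half: balanced only in expectation."""
    shuffled = people[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"vaccine": shuffled[:half], "control": shuffled[half:]}

def balanced_assignment(people, rng, covariate="sex"):
    """Randomise separately within each level of the covariate."""
    groups = {"vaccine": [], "control": []}
    for level in sorted({p[covariate] for p in people}):
        stratum = [p for p in people if p[covariate] == level]
        rng.shuffle(stratum)
        half = len(stratum) // 2
        groups["vaccine"] += stratum[:half]
        groups["control"] += stratum[half:]
    return groups

def count_women(group):
    return sum(p["sex"] == "F" for p in group)

simple = simple_random_assignment(participants, rng)
balanced = balanced_assignment(participants, rng)

# Simple assignment may or may not split the women evenly.
print(count_women(simple["vaccine"]), count_women(simple["control"]))
# Balanced assignment always gives 10 women per group here.
print(count_women(balanced["vaccine"]), count_women(balanced["control"]))
```

In both versions, which individual ends up in which group is still decided at random; balancing only constrains how many of each sex go to each group.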
If the researcher cannot balance for a nuisance covariate, the question arises of whether to control statistically for it when analysing the effects of the independent variables on the dependent. The answer to this varies on a case by case basis, but the default should be to not do so. The reason, which we will return to, is that once you have adjusted for nuisance covariates statistically, you are identifying something a bit different from the simple causal effect that your research question states. The ideal analysis of an experimental study looks for differences in the dependent variable(s) by the level of the independent variable(s). For anything else, it is best to leave random assignment and any balancing to do the work, rather than make your statistical analysis more complex.
On some occasions, the researcher might actually be interested in the effects of the covariate; that is, variation in the covariate is not just a nuisance but actually relevant. The research question might be ‘is the influenza vaccine more effective for women than for men?’. In this case, sex would be described as an effect modifier or moderator variable. Instead of trying to estimate the extent to which the vaccine reduces the incidence of influenza on average across the two sexes, for this research question, the researcher would be trying to compare the extent to which it reduces incidence between men and women. Here, the covariate has to be included in the statistical analysis, usually in interaction with the independent variable(s), as we will see.
3.3.3 Quasi-experimental studies
Sometimes a variable varies in a way that was not actually manipulated by the experimenter, but nonetheless, different groups experience different levels of it somewhat by chance. A study of such a situation is called a quasi-experimental study. For example, imagine that our research question is on the effect of television on sleep quality. In a country with poor infrastructure, about half the television transmitters stop working and cannot immediately be repaired. So, some towns and villages go without television. We could use this situation to study the effect of television availability (independent variable) on sleep quality (the dependent variable, which we would measure).
The critical features of a quasi-experimental study are: (i) the groups receiving different levels of the independent variable must not be systematically different from one another; and (ii) the participants must not have sought out the different levels of the independent variable.
In this case, this means the following. First, we would need to be satisfied that it was largely at random which television transmitters failed and which ones did not. If the ones serving the big cities never failed, whereas the ones serving the small villages often failed, then this would be a bad quasi-experiment, since any differences in sleep could be the difference between big cities and small villages. So we would need to do a lot of checking that the areas where transmitters failed were quite similar to the areas where they did not (on average). Second, if people in some places chose to have their TV transmitters switched off, for example for religious or lifestyle reasons, then we know there is at least one thing systematically different about them: the fact they made that choice. Having made that choice could be associated with all kinds of other differences about them, that might relate to their sleep patterns as well. A quasi-experiment is more convincing the more like a true experiment, with random assignment, it is. The researcher has to work quite hard to establish that it resembles this situation closely enough.
In our television and sleep example, the research question would have to concern the effect of television availability on sleep quality, rather than the effect of television watching on sleep quality. This is because, in the places that do have functional television transmitters, not everyone will actually watch it. If we are interested in the effect of actually watching television on sleep quality, all we can really say is that not having a transmitter affects the amount of television watching (by making it zero, compared to some non-zero amount that it would have otherwise been). We cannot say that everyone where there is a functional transmitter watches television. In these cases, transmitter availability would sometimes be described as an instrumental variable. This is a variable that has a strong causal effect on the independent variable of interest, without fully determining it. If a change in the level of the instrumental variable has a measurable effect on the dependent variable, this is indirect evidence that the independent variable does. To put this more concretely, if the transmitter breaking down causes an increase in sleep quality, we infer that this means watching television is bad for sleep quality, because we are sure that when the transmitter breaks down, fewer people are watching television.
3.4 Types of experiment in cognitive science
There are two major types of manipulated variable in cognitive science: variables that are manipulated between subjects and those that are manipulated within subjects. An experiment can have several independent variables. An experiment all of whose independent variables are manipulated between subjects is called, unsurprisingly, a between-subjects experiment. An experiment all of whose independent variables are manipulated within subjects is called a within-subjects experiment. And an experiment with a mixture of between-subjects and within-subjects manipulated variables is called a mixed design experiment.
3.4.1 Between-subjects manipulation
In cases of between-subjects manipulation, a different group of participants experiences each level of the independent variable. For example, imagine a new kind of cognitive therapy for depression. Patients are randomly assigned to get this new therapy, or else to receive more traditional counselling. The variable therapy is manipulated between subjects, because different groups of people experience its two levels, {new cognitive, traditional counselling}. Between-subjects manipulation has a number of advantages. In many cases, if the same people experience all the levels of the independent variable, those levels interfere with one another. For example, if the first therapy they receive is very effective, they will already be in a different state, thinking in a different way, by the time they receive a second one. The order in which they receive them is likely to matter, as well as things like the gap in time between the two. It may be odd or impossible for them to have to complete the same set of dependent variables twice, once after each therapy. And, if during the experiment they receive two different therapies that differ in some obvious ways, they may well guess that the research question involves comparing the two therapies, and this could colour the way they respond, particularly the second time. In cognitive science, we mostly don’t want participants to know what the research question or hypothesis is, or we may be studying their intuitions about the answers to research questions, rather than their normal cognition. Between-subjects manipulation, in which a participant only experiences one level of the variable, is usually the most straightforward from this point of view.
3.4.2 Within-subjects manipulation
In cases of within-subjects manipulation, each participant experiences several different levels of the same independent variable. For example, imagine we are studying whether drinking a sugar drink increases running speed. We could have participants complete a 100m time trial twice, once after drinking a sugar drink, and once after drinking water. We would want to do this with a suitably large time gap to rule out the drink from the first run still having an effect on the second run. We would want to make sure that half the participants did the water run first and the sugar run second, whilst the other half did the sugar run first and the water run second. This is called counterbalancing, and it allows us to make sure that whatever we observe reflects the actual sugar effect, not just being more relaxed the second time or something like that.
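A counterbalanced schedule for the sugar and water runs could be drawn up along the following lines. This is a sketch only: the function name `counterbalance`, the participant numbering and the sample size of 24 are assumptions.

```python
import random

def counterbalance(participant_ids, orders, rng):
    """Randomly allocate participants to orders, using each order equally often."""
    ids = list(participant_ids)
    rng.shuffle(ids)
    # After shuffling, alternate through the available orders.
    return {pid: orders[i % len(orders)] for i, pid in enumerate(ids)}

orders = [("water", "sugar"), ("sugar", "water")]
schedule = counterbalance(range(24), orders, random.Random(7))

n_water_first = sum(order[0] == "water" for order in schedule.values())
print(n_water_first)  # 12
```

Which participants run in which order is random, but exactly half of them get each order, so any general effect of doing the run a second time is spread equally over the sugar and water conditions.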
What is the advantage of within-subjects manipulation? With within-subjects manipulation, we estimate our causal effect by comparing every subject to themselves (their own performance when they had sugar against their own performance when they had water). This is incredibly useful because there are huge differences between individuals in how fast they can run. Some people can run 50% faster than others, and do so consistently. By contrast, our sugar effect might be a 2% increase in performance (this would still be large enough to matter a lot for serious athletes). So, using between-subjects manipulation, the experimental effect we are looking for will be totally swamped by the huge variation that comes from the fact that some people are faster runners than others. We can solve this by drawing up our two groups at random, and making them very large so that the variation in baseline running speed is averaged out between the two groups. But to do this adequately would require hundreds of people!
By contrast, with within-subjects manipulation, two dozen runners might be enough. You would just want to know whether, more often than not, people went faster in the sugar run than they did in the water run. So, within-subjects manipulation allows more powerful inference than between-subjects manipulation about the causal effect for a given sample size (we will return to the notion of statistical power in a later week). Within-subjects manipulation is recommended as long as:
participants can practically complete the dependent variable(s) multiple times with comparable results (does not work for example if people do the task very differently or better on the second attempt);
doing the procedure multiple times will not draw attention to the hypothesis in a way that the researcher wishes to avoid;
the level of the independent variable received the first time does not affect the completion of the dependent variable(s) on subsequent times as well. (This is known as the requirement for ‘wash out’, after drugs trials, which can only be done using within-subjects manipulation by washing the drugs out of the participants’ systems between phases. It does not work for interventions like vaccination, that cause a permanent change in the participant’s immune system; there might be cases like this in cognition too, where once you know something, you cannot unknow it.)
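The power advantage of within-subjects manipulation in the running example can be illustrated by simulation. The sketch below repeats an imaginary experiment many times under each design and compares how much the estimated sugar effect varies from one repetition to the next. Every number in it (average time, spread between runners, trial noise, effect size) is an assumed value chosen for illustration.

```python
import random
from statistics import mean, stdev

rng = random.Random(3)   # fixed seed so the simulation is repeatable
TRUE_EFFECT = -0.3       # assumed: sugar makes a 15 s run about 2% faster

def run_time(baseline, sugar, rng):
    """One 100m time: personal baseline + sugar effect + trial-to-trial noise."""
    return baseline + (TRUE_EFFECT if sugar else 0.0) + rng.gauss(0, 0.2)

def between_estimate(n_per_group, rng):
    """Different runners in each condition; compare the group means."""
    sugar = [run_time(rng.gauss(15, 2), True, rng) for _ in range(n_per_group)]
    water = [run_time(rng.gauss(15, 2), False, rng) for _ in range(n_per_group)]
    return mean(sugar) - mean(water)

def within_estimate(n, rng):
    """Same runners in both conditions; average each runner's own difference."""
    diffs = []
    for _ in range(n):
        baseline = rng.gauss(15, 2)  # this runner's usual time
        diffs.append(run_time(baseline, True, rng) - run_time(baseline, False, rng))
    return mean(diffs)

# 500 simulated experiments per design, 24 runners in each experiment.
between = [between_estimate(12, rng) for _ in range(500)]
within = [within_estimate(24, rng) for _ in range(500)]

print(f"between-subjects: estimates vary with SD {stdev(between):.2f} s")
print(f"within-subjects:  estimates vary with SD {stdev(within):.2f} s")
```

Under these assumptions the between-subjects estimates scatter far more widely than the true effect itself, whereas the within-subjects estimates cluster tightly around it, because each runner's baseline speed cancels out of their own difference score.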
3.5 Study design
We can succinctly express how a study works by a short statement which is called its design. The components of the design statement would typically include:
Whether it is observational or experimental;
If observational, whether it is cross-sectional or longitudinal, and what are the main predictors and the main outcome variables.
If experimental, how many independent (manipulated) variables there are, whether each one is manipulated within or between subjects, and how many levels each one can take. For example, a study with two IVs each of which has two levels is known as a 2 x 2 design.
If experimental, whether all possible combinations of the independent variables are presented. For example, with two IVs each of which has two levels, there are potentially a total of four experimental conditions. The table below shows an example, with an imaginary experiment where people have to do a task with their left or right hand (IV1) whilst in the light or in the dark (IV2). If all four possible conditions are presented, the design is called full factorial. If there are only two IVs with two levels, designs are almost always full factorial, but once you get to larger numbers of IVs, or IVs with many more levels, there might be reasons for only presenting a subset.
| | In the light | In the dark |
| Left hand | Condition 1 | Condition 2 |
| Right hand | Condition 3 | Condition 4 |
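The conditions of a full factorial design can be enumerated mechanically by taking every combination of the levels of the IVs. The sketch below does this for the hand and lighting example; the dictionary layout and the variable names are assumptions for illustration.

```python
from itertools import product

# Each independent variable with its levels.
ivs = {
    "hand": ["left", "right"],
    "lighting": ["light", "dark"],
}

# A full factorial design presents every combination of levels.
conditions = [dict(zip(ivs, levels)) for levels in product(*ivs.values())]

for number, condition in enumerate(conditions, start=1):
    print(f"Condition {number}: {condition}")
```

For a 2 x 2 design this yields four conditions, numbered in the same order as in the table. Adding a third IV with three levels to the dictionary would give 2 x 2 x 3 = 12 conditions, which shows how quickly a full factorial design grows.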
The design of a study is a separate subsection of the Methods section in many cognitive science papers. Even if it is not, the summary of the design should be prominently stated in your pre-registrations and write ups. The design sometimes includes a statement of the hypotheses and predictions, though more often those are stated separately. We will come to hypotheses and predictions next week.
3.6 Summary
In this lecture, we have introduced some key terminology that you will need in doing cognitive science research:
phenomena, constructs and variables;
qualitative variables, quantitative variables (discrete or continuous), and ordinal variables;
measured versus manipulated variables;
observational, experimental and quasi-experimental studies;
between-subjects and within-subjects manipulation;
study design.
These are the fundamental building blocks of producing and using data in cognitive science.