5 Hypotheses and predictions, samples and populations

In the next session, we are going to begin some data analysis, using the data from the paper we looked at last week (Paál, Carpenter, and Nettle 2015). But, first, in this session, we need to introduce some necessary and important background concepts.

5.1 Theories, hypotheses and predictions

Most people would say that science is something to do with offering or confirming theories, or testing hypotheses, and these constructs are indeed mentioned a lot in cognitive science. So what are the differences between them?

A theory is a comprehensive framework for predicting and explaining phenomena in a particular domain. It is important to understand that a theory is a much more general kind of thing than any particular hypothesis or prediction (we will define those below). The relationship of theories to hypotheses is one-to-many: a theory can generate a number of different hypotheses that apply to specific situations. Theories can rarely be falsified by a single experiment. Rather, they stay around if they prove themselves consistently useful (or more useful than the available alternatives) across a range of empirical situations. Theories are not fixed targets: they are constantly being revised or modified to account for new phenomena or results. Theories tend to have a hard core of assumptions, and then various modifiable additional bits that make them more usable empirically. Cognitive science theories vary in their precision and scope; if you want to see an example of one, you could read up on Prospect Theory.

It is important to appreciate that much research in cognitive science, including the paper by Paál, Carpenter, and Nettle (2015), does not have any theory of this kind. It has hypotheses without having a theory, other than the very general and vague theoretical ideas that sit in the background. This is not a criticism. If you studied some species of newly-discovered bird in Brazil, you could form the hypothesis that it had come in from Africa, and seek evidence for that hypothesis, without there being any general theory under test.

Hypotheses are statements about how phenomena or constructs relate to one another. Ideally, they concern causal relations, though sometimes they are just about associations. Hypotheses are in effect candidate claims. A study might support a hypothesis, thus increasing the evidence for the claim that the hypothesis is true, or fail to support it, weakening the evidence that the hypothesis is true. A hypothesis is a much narrower thing than a theory, and as mentioned above need not arise from a theory. I could consider the hypothesis that restaurants in France are getting worse over time, without having any more general theory about why this is true.

Predictions are statements about the relationships between variables that we should expect if a hypothesis is true. What is the difference between hypotheses and predictions, then? A hypothesis is couched in terms of the phenomena or constructs that we are studying, whereas a prediction is couched in terms of the variables we have used to operationalise those phenomena or constructs. (Note that this way of using the two terms is not universally observed; it is just the way I find clearest.)

For example, say my hypothesis is that restaurants in France are getting worse. I decide to operationalise this by choosing a random sample of 100 judges each year and sending each of them to a randomly selected restaurant. At the end of the meal they will rate their satisfaction with the meal on a scale of 1 to 10. I will do this now and in 20 years. My hypothesis is that restaurants in France are getting worse over time. My prediction is that the average satisfaction rating will be lower at the second time point than at the first. (This is not a very good study design by the way, I am just making the point). Thus, the hypothesis is about the thing in the world I have a candidate claim about, the prediction is about the specific variables that I have chosen to study it.

Predictions, generally speaking, make a statement about how some aspect of the distribution of one variable (the outcome or dependent variable) will vary according to the value of some other variable(s) (the predictors or independent variables). Most often, the prediction is about the mean of the dependent or outcome variable, as in ‘the average reaction time will be shorter in the caffeine than the no caffeine condition’. More rarely, predictions can be about other aspects of the distribution of the dependent or outcome variable, such as its variance. And for outcome variables that are not continuous, predictions won’t be about the average value. For example, if the dependent variable is a binary variable with levels ‘cooperated’ or ‘did not cooperate’, then your prediction would be about the odds of cooperating when the independent variable is at one of its levels versus when it is at the other.
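To make the last point concrete, here is a minimal sketch in R of what a prediction about odds refers to. The counts are entirely made up, purely for illustration:

```r
# Hypothetical counts: suppose 30 of 50 participants cooperated in condition A,
# and 20 of 50 cooperated in condition B.
odds_a <- 30 / (50 - 30)  # odds of cooperating in condition A
odds_b <- 20 / (50 - 20)  # odds of cooperating in condition B
odds_a / odds_b           # the odds ratio: the quantity such a prediction is about
```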

Now, go to our paper on behavioural inhibition and read the introduction and, if necessary, the methods section (https://peerj.com/articles/964/). I have already told you there is no general theory here, but there are hypotheses, and predictions. Write down what these are. Generally speaking, each hypothesis has a corresponding prediction. In cognitive science, we often write down the hypotheses of a study in a numbered list H1, H2, …, and the predictions in a numbered list P1, P2, …. You may not need to list the hypotheses and the predictions separately. Often, people will outline the hypotheses in continuous prose, and then provide a list of predictions, or, as in this paper, just a paragraph summarising them. (Note, there are some variables which the authors stress the need to include as covariates, where they have an expectation of an association with SSRT, but this association does not seem to be part of their research question. Do you list these expectations as predictions of the study? Or not?)

5.2 Exploratory and confirmatory research

Not all science tests narrow predictions and hypotheses. Sometimes, you just want to understand how things relate to or affect one another, but you have an open mind about what might be true. Research where you can’t state a small number of hypotheses and predictions ahead of time is called exploratory research. Where you do have specific hypotheses or predictions ahead of time, the research is called confirmatory, because you are using the study to confirm whether the hypothesis is supported, or not.

Confirmatory research is not better than exploratory research. Both are important aspects of the knowledge-making process. Patterns that are discovered in exploratory research can then be stated as confirmatory hypotheses and tested in subsequent confirmatory research. This way we can establish whether they were just one-offs or represent some more general regularity. The important thing is that researchers must always be clear whether their research (or some part of it) is exploratory or confirmatory.

These days, you can only consider your research to be confirmatory if you have pre-registered your hypotheses and predictions. This means making an irrevocable statement of what they are, and lodging it in a public repository, before you analyse the data. If you have not done this, your research is exploratory (and, by the way, I would recommend pre-registering even research that you intend to be exploratory). This innovation was made because, in the past, many people conducted exploratory research, but then claimed that the thing they found was the thing they had hypothesized all along. This is known as hypothesizing after the results are known, or HARKing. We will encounter it next week. But the whole apparatus of confirmatory research - concepts like ‘statistical significance’ and the ‘p-value’ that we will meet soon - doesn’t work rigorously in exploratory research, where there are many possible patterns you might look for. If you apply these concepts to exploratory research, and then present what you find as if it were actually a confirmatory test, you can pretty much always find something, and that something rarely turns out to be more generally true in subsequent studies. This is one of the main reasons why the replication rate and positive predictive value in cognitive science have turned out to be so low (see week 1). So: no confirmation without pre-registration!

5.3 Samples, populations and inference

We generally work with modest samples of participants. One hundred people might take part in your experiment. Or, if you are doing an opinion poll to predict the result of an election, you might survey 2000 people. This sample is a kind of small world.

You don’t really care about the small world, though. You care about some large world of which the small world is a studiable part. For example, in the opinion poll case, you don’t really care about how those 2000 particular people are going to vote; you care about how the whole country is going to vote, and you hope to use those 2000 people to give you evidence about that. The large world is called the population. Depending on your research question, the population could be French voters, French restaurants, or humans in general. The point is that you can’t study the whole of the population, as this would be impractical. So you study the small world, and try to extrapolate to what might be true about the large one from the data you get. This process of extrapolation from the small world of your sample to the large world of the population is called statistical inference. Doing statistical inference right is the main job of statistics in cognitive science. The statistics we use to do this are called, unsurprisingly, inferential statistics.

5.4 Parameters and estimation

The properties of the small world of your sample are knowable exactly. For example, in the paper on behavioural inhibition, we can calculate the difference in SSRT between the participants in the negative and neutral conditions to as many decimal places as we like. We can get the exact descriptive statistics of our sample. But what we want to know about is actually the difference in SSRT between people in negative and neutral moods in human beings in general. Without studying all human beings, we cannot know this with complete certainty. We can only study more and more people and make better and better estimates.

In statistics, quantities like the difference in SSRT between people in negative and neutral moods in the human population are known as parameters. Parameters are unknown, by definition, because you cannot study everyone and test them an infinite number of times. What you can do is estimate the parameters using small-world samples. These estimates will carry with them a degree of precision. If I study 20 people, the precision is poor; my parameter estimate could be a long way off the correct value. If I study 20,000 people, my precision will be much better. This is for the same reason that you get closer to 50% heads when you toss a coin a thousand times than when you toss it six times; the greater amount of data averages out the fluctuations due to chance.
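The coin-tossing intuition is easy to check for yourself. Here is a minimal simulation in R (the numbers of tosses are arbitrary, chosen just to make the contrast visible):

```r
# Estimates from bigger samples fluctuate less around the true value (0.5).
set.seed(1)                               # for reproducibility
mean(rbinom(6, size = 1, prob = 0.5))     # proportion of heads in 6 tosses: often far from 0.5
mean(rbinom(1000, size = 1, prob = 0.5))  # proportion of heads in 1000 tosses: much closer to 0.5
```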

Hypotheses and predictions concern parameters of the population, not just averages of the particular sample, because what we care about understanding is really the large world of the population. So, the question you are asking in science is usually: given what I saw in the small world of the sample, what can I estimate to be true of the big world of the population, and, importantly, how precise is that estimate? In other words, what is the margin of error on whatever I conclude? What you want to do is to make estimates that are both accurate and precise. If you do this, your results should display out-of-sample generalization. This means you will see much the same thing again if you study a new sample. On the other hand, you don’t want to study more people than is necessary, because data cost time and money. For example, having a million people in your opinion poll gives only slightly more precise estimates than having 10,000 people, yet costs one hundred times more. (Precision in parameter estimation generally rises with the square root of the number of observations.) So you are always looking to balance the precision of your parameter estimates against the difficulty or cost of getting the data.
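If you want a feel for the square-root rule, here is a small R sketch using the standard error of an estimated proportion, as a stand-in for the opinion-poll example (the 50% vote share is just an assumption):

```r
# Standard error of an estimated proportion p from a sample of size n.
se_prop <- function(p, n) sqrt(p * (1 - p) / n)
se_prop(0.5, 2000)     # ~0.011: margin of error of roughly +/- 2 percentage points
se_prop(0.5, 10000)    # ~0.005
se_prop(0.5, 1000000)  # ~0.0005: 100 times the data buys only 10 times the precision
```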

5.5 Statistical models and statistical tests

In inferential statistics, we apply a statistical model to our data. A statistical model is a set of assumptions about the process in the big world of the population that generated the data in the small world of our sample. The model, informed by the data, allows us to estimate the parameters of the population and get an idea of the precision of those estimates. Estimating parameters and obtaining the precision of those estimates is (arguably) the most important thing you do in inferential statistics.

However, people are often focused on a second thing we can do with our model, which is apply statistical tests. A statistical test is a procedure for evaluating, given the evidence of the data, whether we should believe a particular condition is met in the population. Most commonly, people might want to ask: should I believe that the difference in SSRT between people in negative and neutral moods is something other than zero? The idea that the difference is zero is called the null hypothesis; it says that there is no causal relation between mood and SSRT. The null hypothesis is a kind of default assumption. Unless there is evidence to the contrary, we should assume no link between any old pair of variables. Statistical tests are procedures for asking whether we should believe the null hypothesis, or whether the data support the view that the null hypothesis is not true, and that there is some causal link between the two variables. This is called rejecting the null hypothesis, and a result that rejects the null hypothesis is described as statistically significant.

Parameter inference and statistical tests are linked. If your best estimate of the difference in SSRT between people in negative and neutral moods is about zero (or zero is within the margin of error or confidence interval of your parameter estimate), then the test will fail to reject the null hypothesis. But statistical tests alone are not very interesting. At best, even used correctly, they can only tell you that some parameter is probably not zero. Of greater interest is the question: how big is the parameter, and in what direction? Is it big enough to actually matter, and is it as big or small as our hypothesis predicts?
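To show how the estimate, its confidence interval, and the test hang together, here is a minimal sketch using simulated data (the variable names, group sizes, and effect size are all invented; the model-fitting function lm() is introduced properly in the next section):

```r
# Simulated SSRT-style data: two mood conditions with a built-in difference of 20 ms.
set.seed(2)
mood <- rep(c("neutral", "negative"), each = 50)
ssrt <- rnorm(100, mean = ifelse(mood == "negative", 220, 200), sd = 40)
m <- lm(ssrt ~ mood)
summary(m)   # the mood coefficient is the estimated difference; its p-value is the test
confint(m)   # if this interval excludes zero, the test rejects the null hypothesis
```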

5.6 The General Linear Model

If you learned statistics the traditional way, you probably learned that there was a great forest of different statistical tests available, such as the t-test, linear regression, ANOVA, ANCOVA, and so on, and that which one you use depends on the kinds of predictor and outcome variables you have. This course takes a different approach. All of these tests represent the application of the same basic statistical model, known as the General Linear Model, to your data. This means that in fact you only need to learn one basic thing, fitting a general linear model to your data using the R function lm(), and then how to interpret and report what you find. So, I will not be talking about, for example, t-tests and linear regression as separate things.
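As a quick check of this claim, here is a simulated example (the variable names and numbers are invented) showing that a classical two-sample t-test and a general linear model fitted with lm() give the same answer:

```r
# The same two-group comparison, run as a 't-test' and as a general linear model.
set.seed(3)
group <- rep(c("control", "caffeine"), each = 30)
rt <- rnorm(60, mean = ifelse(group == "caffeine", 480, 500), sd = 50)
t.test(rt ~ group, var.equal = TRUE)  # classical two-sample t-test
summary(lm(rt ~ group))               # same estimated difference (up to sign), same t, same p-value
```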

The general linear model can’t do quite everything. It is suitable for outcome variables that are continuous. When our outcomes are binary (something happened or did not), or discrete (the number of times something happened), we need extensions of the general linear model that belong to the family known as Generalized Linear Models. This is a pretty terrible name for two reasons: it also abbreviates to GLM, just like the General Linear Model, so when people abbreviate, it is unclear which one they mean; and the General Linear Model actually belongs to the class of Generalized Linear Models. I know, I know. Sometimes people use the abbreviation GLIM for the generalized version and GLM for the general one. But sometimes people use GLM for the generalized one, including R, whose function for generalized linear models is called glm(). So, I will always spell out in words General Linear Model or Generalized Linear Model. For now, we are only considering the basic General Linear Model.
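For completeness, here is what a Generalized Linear Model looks like in R for a binary outcome, using simulated data (the variable names and probabilities are invented; we will not need this until later):

```r
# A logistic Generalized Linear Model for a binary (cooperated / did not) outcome.
set.seed(4)
condition <- rep(c("neutral", "negative"), each = 50)
cooperated <- rbinom(100, size = 1, prob = ifelse(condition == "negative", 0.4, 0.6))
m_bin <- glm(cooperated ~ condition, family = binomial)
summary(m_bin)    # coefficients are on the log-odds scale
exp(coef(m_bin))  # exponentiating gives odds and odds ratios
```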

5.7 Summary

In this brief session, we have met some key concepts:

  • Theory, hypothesis and prediction;

  • Exploratory and confirmatory research;

  • Sample and population;

  • Parameter estimation;

  • Statistical tests.

I also flagged up the General Linear Model, which is going to be key to the data analysis we go on to do next.

References

Paál, Tünde, Thomas Carpenter, and Daniel Nettle. 2015. “Childhood Socioeconomic Deprivation, but Not Current Mood, Is Associated with Behavioural Disinhibition in Adults.” PeerJ 3 (May): e964. https://doi.org/10.7717/peerj.964.