3.1 Descriptive vs. causal questions

3.1.1 Descriptive questions

How are observations distributed across values of Y? (univariate)⁹
- e.g. How are observations distributed across trust categories in Table 3.1?

Table 3.1: Univariate distribution of trust (2006)
Trust value	0	1	2	3	4	5	6	7	8	9	10
Observations	303	42	172	270	369	1281	853	1344	1295	353	356

How are observations distributed across values of Y and X?
- How are observations distributed across trust and gender values?
We can add as many variables/dimensions as we like
- How are observations distributed across values of trust (Y), gender (X₁) and time (X₂)?
Normally we summarize those distributions
- Are trust values higher (on average) among females than males?

Notes

As the name suggests, descriptive research questions are about describing the data. For instance, we could measure trust within the German population using the question ‘Would you say that most people can be trusted or that you can’t be too careful in dealing with people, if 0 means “Can’t be too careful” and 10 means “Most people can be trusted”?’ A descriptive question would then be: Are there more individuals with a high level of trust (defined as those with a value above 8) or more with a low level of trust (defined as those with a value below 2)? In other words, descriptive questions are concerned with the distribution of observations (e.g., individuals) across the values of one or several variables, e.g., the variable trust (Y).
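The high-versus-low comparison described above can be answered directly from the counts in Table 3.1. The following is a minimal sketch in Python; the variable names are illustrative and not part of the original material, and the thresholds simply follow the definitions given in the text.

```python
# Counts from Table 3.1: list index = trust value (0-10),
# entry = number of observations with that value.
trust_counts = [303, 42, 172, 270, 369, 1281, 853, 1344, 1295, 353, 356]

# Thresholds as defined in the text: high = value above 8, low = value below 2.
high_trust = sum(n for value, n in enumerate(trust_counts) if value > 8)
low_trust = sum(n for value, n in enumerate(trust_counts) if value < 2)

print(f"High trust (values 9-10): {high_trust}")
print(f"Low trust (values 0-1):   {low_trust}")
```

With the counts in Table 3.1 this gives 709 observations in the high-trust group and 345 in the low-trust group, so under these definitions high trust is more common than low trust.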
Importantly, descriptive questions may involve as many variables as you like. We could add a second variable, gender (X₁, male vs. female), and ask whether females have, on average, a higher level of trust than males. This already points to how we deal with the underlying distributions: normally, we summarize them using statistics such as the mean.
We can also develop hypotheses for our descriptive questions. For example, we could hypothesize that females have a higher level of trust than males and subsequently test this hypothesis using the data we collect. It may make sense to call hypotheses that simply concern the distribution of data across one or more dimensions descriptive hypotheses.
Time (which will become important later on) is just another variable, and a corresponding descriptive question would be: Was trust in politicians higher in January 2019 than in January 2020?

3.1.2 Causal questions

Is there a causal link between the distribution across values of Y and values of D?
In practice we tend to summarize those distributions:
- Continuous variables: Compare means
- Categorical variables (several): Compare probabilities for categories
Group level: Does victimization cause individuals to have a lower level of trust on average (than if they were not victimized)?
Individual level: Does victimization (non-victimization) cause individual i to have a lower (higher) trust level?

Table 3.2: Joint distribution of trust and victimization (2006, N = 6633)
	0	1	2	3	4	5	6	7	8	9	10
no victim	259	36	135	214	320	1142	782	1228	1193	326	331
victim	44	6	37	56	48	139	70	114	101	27	25

## Mean Non-victims:  6.2

## Mean Victims:  5.48
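The two group means reported above can be reproduced from the counts in Table 3.2. The sketch below is one way to do this in plain Python; the function and variable names are assumptions made for illustration, not the code that produced the original output.

```python
# Counts from Table 3.2: position = trust value (0-10).
trust_values = range(11)
no_victim = [259, 36, 135, 214, 320, 1142, 782, 1228, 1193, 326, 331]
victim = [44, 6, 37, 56, 48, 139, 70, 114, 101, 27, 25]


def weighted_mean(values, counts):
    """Mean of `values`, each weighted by its number of observations."""
    return sum(v * n for v, n in zip(values, counts)) / sum(counts)


mean_no_victim = weighted_mean(trust_values, no_victim)
mean_victim = weighted_mean(trust_values, victim)

print(f"Mean non-victims: {mean_no_victim:.2f}")
print(f"Mean victims:     {mean_victim:.2f}")
print(f"Difference:       {mean_victim - mean_no_victim:.2f}")
```

The difference between the two means is itself only a descriptive summary of the table; whether it reflects an effect of victimization is exactly what the causal question asks.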

Notes

Causal research questions are of a different kind. From a distributional perspective, we could ask whether the distribution of a first variable D is causally related to the distribution of a second variable Y. Again, we tend to summarize the corresponding distributions, e.g., by taking the mean of trust.
In Table 3.2 we tabulate victimization (D), measured with the question ‘Have you been insulted or threatened verbally since (month, year)?’, against trust (Y). Note that the victimization variable D is dichotomous (0, 1), whereas the outcome variable Y has 11 values (0-10).
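If trust is treated as a categorical rather than a continuous variable, the comparison is between the probabilities of each category within the two groups. A minimal sketch, assuming the counts of Table 3.2 and using illustrative names, converts each row into within-group proportions:

```python
# Counts from Table 3.2: position = trust value (0-10).
no_victim = [259, 36, 135, 214, 320, 1142, 782, 1228, 1193, 326, 331]
victim = [44, 6, 37, 56, 48, 139, 70, 114, 101, 27, 25]


def proportions(counts):
    """Share of observations in each category within one group."""
    total = sum(counts)
    return [n / total for n in counts]


p_no_victim = proportions(no_victim)
p_victim = proportions(victim)

# Print the two conditional distributions side by side.
print("trust  no victim  victim")
for value, (p0, p1) in enumerate(zip(p_no_victim, p_victim)):
    print(f"{value:>5}  {p0:>9.3f}  {p1:>6.3f}")
```

Normalizing within each row is necessary because the groups differ greatly in size (5966 non-victims vs. 667 victims), so raw counts are not directly comparable.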
The corresponding causal question would be: Does victimization cause individuals to have lower levels of trust (on average, that is, comparing the means)? This question is posed at the group level but is closely related to the corresponding question at the individual level: Does victimization (non-victimization) cause individual i to have a lower (higher) trust level?
One important aspect that we will encounter later on: in asking our causal question, we may focus on certain subsets of our sample once we have collected some data. For instance, we could ask whether the people who have actually been victimized would have had a higher level of trust if they had not been victimized. This question focuses on the subset of our sample that has been victimized.

⁹ See Gerring (2012) for a discussion of “What?” and “Why?” questions.