Chapter 5 Analysis of necessity

Social phenomena have complex causal configurations. Researchers can advance various hypotheses about how these phenomena are produced, but their complete causal mix will probably never be fully understood.

While all causes have an impact in the causal structure, some are more important than others. Some are so important that the outcome does not happen in their absence. They might not be sufficient to trigger the outcome on their own, but they are important enough to be a necessary part of the causal mix: whatever causal combinations the mix contains, it will always contain those necessary conditions. From Skocpol’s (1979) book on social revolutions, we could derive the following statement: a social revolution (Y) is produced only in the context of a state breakdown (X). If there were no state breakdown, the social revolution could not be produced; this means the state breakdown is a necessary (although not always sufficient) cause of a social revolution:

\[\mbox{X} \leftarrow \mbox{Y}\]

5.1 Conceptual description

This line of reasoning is valid for all types of sets: binary crisp, multi-value crisp and fuzzy. They can all be graphically represented by a subset / superset relation, as shown in figure 5.1.

X $\leftarrow$ Y: causal condition X necessary for the outcome Y

Figure 5.1: X \(\leftarrow\) Y: causal condition X necessary for the outcome Y

A necessary set X is a superset of an outcome set Y, which means X is present in all instances where Y happens. Because Y is completely included in X, there is no single instance where Y is present and X is absent. There are situations where X is present and Y is not, but all situations where Y is present happen within the situations where X is present, therefore X is a necessary condition for Y.

The sign “\(\leftarrow\)” does not imply any causal direction. Although it looks like an arrow, it says nothing about Y causally leading to X; it translates to a logical implication: whenever Y happens, X is present as well.

Braumoeller and Goertz (2000, 846) present two complementary definitions about necessity:

Definition 5.1: X is a necessary condition for Y if X is always present when Y occurs.

Definition 5.2: X is a necessary condition for Y if Y does not occur in the absence of X.

In terms of set theory, and in line with the Euler/Venn diagram above, Goertz (2006a, 90) supplements with a third, equivalent definition:

Definition 5.3: X is a necessary condition for Y if Y is a subset of X.
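These three definitions can be checked mechanically on binary data. The following base R sketch uses hypothetical crisp membership vectors (not taken from any dataset discussed here) to show that all three single out the same subset relation:

```r
# hypothetical binary crisp membership scores for five cases
X <- c(1, 1, 0, 1, 0)
Y <- c(1, 0, 0, 1, 0)

# Definition 5.1: X is always present when Y occurs
def1 <- all(X[Y == 1] == 1)

# Definition 5.2: Y does not occur in the absence of X
def2 <- !any(Y[X == 0] == 1)

# Definition 5.3: Y is a subset of X (elementwise Y <= X)
def3 <- all(Y <= X)

c(def1, def2, def3)  # TRUE TRUE TRUE: the definitions agree
```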

There are other types of graphical or tabular representations of necessity that tell the same story. The simplest one, involving binary crisp sets, is a 2 \(\times\) 2 crosstable that demonstrates the difference between a correlation and a set necessity relation.

Correlation (left) and necessity (right)

Figure 5.2: Correlation (left) and necessity (right)

In figure 5.2, the left side speaks the language of statistics, where the most important numbers are located on the main diagonal. It shows a perfect correlation, because there are no cases on the other diagonal. On the right side, the correlation is broken because of the 23 cases on the other diagonal, but a different kind of (set) language is spoken: given there are no cases where Y is present and X is absent, the set corresponding to Y is completely included within the set corresponding to X, which leads to the conclusion that X is a necessary condition for Y.

The set X is bigger than the set Y, containing 60 cases compared with only 37 in Y. Out of those 60 cases, 23 are the ones where X is present but Y is absent, making the crosstable tell the same story as figure 5.1.

Unlike correlations, set relations are asymmetric. If a condition X is necessary for Y, the same relation does not automatically hold for their absence. It is not mandatory for the absence of X to be a necessary condition for the absence of Y (\(\sim\)X \(\leftarrow\) \(\sim\)Y); most often there is another causal condition Z which is necessary for the absence of Y:

\[\mbox{Z} \leftarrow {\sim}\mbox{Y}\]

For this reason, the analysis of necessity should be performed separately for the presence of the outcome Y and for its absence.

To put some words around the numbers in the 2 \(\times\) 2 table, we can employ Goertz’s (2003, 49) very good example of the relation between economic development as condition X and democracy as outcome Y, with the following insightful distinction between a hypothesis based on correlation and another based on set necessity:

  • the higher the level of economic development, the more likely a country is a democracy
  • a minimum level of economic development is necessary for democracy

These are two hypotheses combining the same concepts, but in very different settings. The first one says democracies are likely to emerge where the level of development is high enough. In practice, however, there are countries with a high level of development that did not develop democracies. This kind of contradiction is difficult to explain using correlations, but it makes perfect sense when thinking about set necessity relations: it is true there are nondemocratic countries with a high enough level of economic development, but on the other hand there is absolutely no instance of democracy where the level of economic development is very weak.

This empirically observed, asymmetric nature of set relations can further be extended to demonstrate that the necessity relation does not hold for the absence of democracy: there are indeed nondemocratic countries that do not have a minimal level of economic development (which would seem to support the hypothesis \(\sim\)X \(\leftarrow\) \(\sim\)Y), but there are also nondemocratic countries that are doing very well from an economic point of view.

The lack of economic development is therefore not a necessary condition for the lack of democracy, which is explained by other factors (an alien conclusion from a statistical, correlational point of view). Statistical methods try to explain the outcome (dependent variable) using a single model for both high and low values of the dependent variable, while QCA finds multiple causal combinations that lead to the same outcome (equifinality).

Necessity relations can be extended from binary to multi-value crisp sets, using the same overall method. In fact, since a binary crisp set is a particular case of a multi-value crisp set with two values, the necessity relation X \(\leftarrow\) Y can also be written (using standard multi-value notation) as:

\[\mbox{X[1]} \leftarrow \mbox{Y}\]

meaning that X is a necessary condition for Y when it takes the value of 1 (i.e., for binary crisp sets, where it is present), if and only if there is no instance of Y being present where X = 0 (X is absent).

Multi-value sets can have more than two values, but the same explanation holds for any of them. Assuming a set X has three values (0, 1 and 2), then X[2] is a necessary condition for Y (see figure 5.3) if and only if:

  • all cases of Y being present are included in the set of X = 2, and
  • there is no instance where Y is present outside the area where X = 2 (outcome Y is present only where X = 2, and nowhere else)

X[2] $\leftarrow$ Y: causal condition X is necessary for Y when equal to 2

Figure 5.3: X[2] \(\leftarrow\) Y: causal condition X is necessary for Y when equal to 2
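For illustration, the X[2] \(\leftarrow\) Y relation can be verified on a small hypothetical multi-value condition (the data below are invented for this sketch, not taken from any dataset in the book):

```r
# hypothetical multi-value condition (values 0, 1, 2) and a binary outcome
X <- c(2, 2, 1, 0, 2, 1, 0, 2)
Y <- c(1, 1, 0, 0, 1, 0, 0, 1)

# X[2] is necessary for Y iff every case where Y is present has X equal to 2
all(X[Y == 1] == 2)  # TRUE: Y happens only where X = 2
```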

Such a perfect set inclusion (of Y into X; all democracies have a minimum level of economic development) is also very rare to observe. In reality, there are some countries which developed democracies despite having a very weak level of economic development.

This is a situation where fuzzy sets prove their usefulness: it is less important to have absolutely all cases of Y included in the set X, and more important to have a high percentage of the cases of Y inside X (higher than a certain threshold). In classical crisp sets, a single case of Y happening outside of X can undermine the entire necessity claim. But in a situation where this single case is weighed against another 100 cases of Y happening within X, wouldn’t this be overwhelming (albeit not complete) evidence that X is necessary?

Almost but not complete inclusion of Y into X

Figure 5.4: Almost but not complete inclusion of Y into X

Figure 5.4 shows such a situation, where the vast majority of Y cases are included in X, and a few are outside. Y is still included in X, although not completely, but the inclusion is high enough to conclude that X is a necessary condition for Y.

The small part of Y outside of X corresponds to the upper left cell of the 2 \(\times\) 2 crosstable in figure 5.2. If that cell were equal to zero, Y would be completely included in X; the more cases appear in that cell, the more Y moves outside of X.

When crisp sets are involved, it is easy to think in terms of the number of cases in each cell of the crosstable. But figure 5.4 can also describe fuzzy sets, since the inclusion itself is not only “in” or “out”, but more or less in (or more or less out).

The inclusion score (its calculation will be introduced in the next section), is any number between 0 (completely outside) and 1 (completely inside), and this is the very essence of a fuzzy set, where the fuzzy scores themselves are numbers between 0 and 1.

When dealing with an infinite number of potential values between 0 and 1, a crosstable between X and Y becomes impossible. For fuzzy sets, necessity relations are not a simple matter of 0 cases in the upper left cell, but a matter of having the fuzzy scores of X higher than the fuzzy scores of Y.

Crosstables are useful for categorical variables (any crisp value can be considered a category), and cases can be counted for each cell of the table. Fuzzy sets are continuous numeric variables, and that kind of data is best represented using a scatterplot, which in QCA jargon is called an XY plot.

When fuzzy scores of X are larger than the fuzzy scores of Y, the points are located in the lower right part of the XY plot. In a way, this is similar to a crosstable, where necessity means there are zero cases in the upper left cell, while in the XY plot necessity means zero points in the upper left part, above the main diagonal.
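In code, the “no points above the diagonal” criterion amounts to an elementwise comparison. A minimal sketch with hypothetical fuzzy scores:

```r
# hypothetical fuzzy membership scores
X <- c(0.9, 0.7, 0.8, 0.6, 1.0)
Y <- c(0.6, 0.5, 0.8, 0.2, 0.7)

# necessity as a subset relation: no case above the main diagonal
all(Y <= X)  # TRUE: every Y score is at most the corresponding X score
```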

Fuzzy necessity

Figure 5.5: Fuzzy necessity

Figure 5.5 is a classical example of a fuzzy necessity relation. It tells the same story of Y being completely included within X, given that all Y scores are lower than the X scores (i.e. all points are located below the main diagonal, in the greyed area), and it corresponds to the Euler/Venn diagram in figure 5.1.

If some of the Y values were greater than the corresponding values of X, they would be found outside the greyed area, above the main diagonal, and would correspond to the Euler/Venn diagram in figure 5.4. To maintain the necessity relation, it is important to have no more than a few cases above the diagonal; in other words, the proportion of cases below the diagonal has to be very large, or at least larger than a certain threshold.

5.2 Inclusion / consistency

The term “inclusion” means exactly what figure 5.4 displays: the proportion of the set Y that is included in the set X. It has quite a natural interpretation in terms of fuzzy sets (not simply included or not, but rather more or less included), although for fuzzy sets there are better graphical representations in terms of XY plots.

For the binary crisp case, a 2 \(\times\) 2 table can be represented using a general case:

General 2 $\times$ 2 table for necessity

Figure 5.6: General 2 \(\times\) 2 table for necessity

The focus, for the analysis of necessity, is on the cells a and c and the inclusion score is simply calculated with the formula:

\[inclN_{X\phantom{.}\leftarrow\phantom{.}Y\phantom{.}} = \frac{\mbox{X} \phantom{.} \cap \phantom{.} \mbox{Y}}{\mbox{Y}} = \frac{\mbox{c}}{\mbox{a} + \mbox{c}}\]

This should be read as the intersection between X and Y (both happening, equal to 1 in cell c), out of the total set Y.

There are multiple ways to calculate this inclusion score in R. Given two binary crisp objects X and Y, the simplest form to calculate is using this command:

sum(X & Y)/sum(Y)

To demonstrate, we will load the crisp version of the Lipset data (LC), which has the conditions DEV, URB, LIT, IND and STB, plus the outcome SURV.

Before going into more details, complete information about this dataset (and the other datasets shipped with the package QCA), as well as about the individual conditions and the outcome, can be obtained by typing ?LC or help(LC) at the R command prompt.

Alternatively, in the graphical user interface the help page can be reached by clicking the “Help” button in the Load data from attached packages dialog, as shown in section 2.4.8.

Suppose we want to test the necessity of the causal condition DEV for the outcome SURV, with the following crosstable:

using(LC, table(SURV, DEV))
    DEV
SURV 0 1
   0 8 2
   1 0 8

There are 8 cases of SURV happening, and for all of them DEV happens as well (the R table prints the values of SURV from 0 at the top to 1 at the bottom, so the relevant row is the bottom one). The inclusion of the outcome SURV in the causal condition DEV is then:

using(LC, sum(DEV & SURV) / sum(SURV))
[1] 1

The logical operator “&” can be employed to calculate the intersection, because binary values are interpreted by R as logical vectors. This assumes, of course, that both the condition and especially the outcome are binary crisp sets.

But the package QCA has a dedicated function called pof(), with a default argument relation = "necessity", which calculates the same thing but prints more information that will become relevant later:

pof(DEV, SURV, data = LC)

        inclN   RoN   covN  
1  DEV  1.000  0.800  0.800 

This is a situation where a single condition is necessary for the outcome, with a complete inclusion of the set SURV (survival of democracy) into the causal condition DEV (level of development). The function pof() accepts another straightforward way to specify this necessity relation, using the left arrow “<-” notation:

pof(DEV <- SURV, data = LC)

        inclN   RoN   covN  
1  DEV  1.000  0.800  0.800 

For multi-value conditions, the procedure is quite similar. Since the conditions are also crisp, they can form cross tables with as many columns as the number of values in the causal condition, while the region of interest remains the same: the row where outcome Y is present with the value of 1:

General crosstable for multi-value necessity

Figure 5.7: General crosstable for multi-value necessity

The inclusion score for each of the values of X (seen as quasi-separate sets, as presented in figure 5.3) is calculated much like the one for binary crisp sets:

\[inclN_{X[v]\phantom{.}\leftarrow\phantom{.}Y\phantom{.}} = \frac{\mbox{X[}v\mbox{]} \phantom{.} \cap \phantom{.} \mbox{Y} }{\mbox{Y}}\]

It is the intersection between the outcome set Y and the (sub)set formed by a specific value \(v\) of the condition X, out of the total number of cases where Y is present. For example, to calculate the necessity of value 2 of X for the outcome Y, we divide cell e by the sum of all cells where Y is present:

\[inclN_{X[2]\phantom{.}\leftarrow\phantom{.}Y\phantom{.}} = \frac{\mbox{e}}{\mbox{a} + \mbox{c} + \mbox{e}}\]

As a practical example, the Lipset data has a multi-value version:

using(LM, table(SURV, DEV))
    DEV
SURV 0 1 2
   0 8 2 0
   1 0 3 5

Calculating the necessity inclusion for the set corresponding to condition DEV equal to 2, can be done either with:

using(LM, sum(DEV == 2 & SURV) / sum(SURV))
[1] 0.625

or with:

pof(DEV[2] <- SURV, data = LM)

           inclN   RoN   covN  
1  DEV[2]  0.625  1.000  1.000 

There are 5 cases where DEV equals 2 and SURV is present, but they do not account for all cases where SURV is present. There are 3 other cases where DEV equals 1, and for this reason none of the individual values of DEV is necessary for SURV.

For fuzzy sets, the calculation method differs (there are no cross-tables to get the number of cases from), but the spirit of the method remains the same. In fact, as we will see, the formula for fuzzy sets can be successfully employed for crisp sets, with the same result.

The Euler/Venn diagrams from figures 5.1 and 5.4 tell the same story for all variants, crisp and fuzzy. The necessity inclusion is the proportion of Y found in the intersection between X and Y.

Section 3.3.2 introduced the fuzzy intersection and the dedicated function fuzzyand(). For exemplification, the fuzzy version of the same Lipset data will be used:

using(LF, sum(fuzzyand(DEV, SURV)) / sum(SURV))
[1] 0.8309859

This is a command very similar to those from the crisp sets, and the result is confirmed by the general use pof() function:

pof(DEV <- SURV, data = LF)

        inclN   RoN   covN  
1  DEV  0.831  0.811  0.775 

The function fuzzyand() is universal and it can successfully replace the “&” operator from the binary crisp version:

using(LC, sum(fuzzyand(DEV, SURV)) / sum(SURV))
[1] 1

as well as from the multi-value crisp version, when DEV is equal to 2:

using(LM, sum(fuzzyand(DEV == 2, SURV)) / sum(SURV))
[1] 0.625

This demonstrates that a general fuzzy version for the necessity inclusion equation can be used for both crisp and fuzzy sets:

\[inclN_{X\phantom{.}\leftarrow\phantom{.}Y\phantom{.}} = \frac{\sum{min(\mbox{X, Y})}}{\sum{\mbox{Y}}}\]
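This general formula translates directly into base R, with pmin() playing the role of the fuzzy intersection (the vectors below are hypothetical, chosen only to illustrate the calculation):

```r
# necessity inclusion: sum of fuzzy minima over the sum of Y
inclN <- function(X, Y) sum(pmin(X, Y)) / sum(Y)

# fuzzy example: Y is a perfect subset of X
Xf <- c(0.9, 0.7, 0.8, 0.6)
Yf <- c(0.6, 0.5, 0.8, 0.2)
inclN(Xf, Yf)  # 1

# crisp example: pmin() acts as the logical AND
Xc <- c(1, 1, 0, 1)
Yc <- c(1, 0, 1, 1)
inclN(Xc, Yc)  # identical to sum(Xc & Yc) / sum(Yc)
```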

As intuitive as it seems, this formula works counter-intuitively relative to a Euler/Venn diagram. If the set Y has a 0.7 inclusion in a condition X, we would expect the set Y to be 0.3 outside of (excluded from) the set X.

For crisp sets this expectation holds, but fuzzy sets can have surprisingly high “inclusions” in both X and \(\sim\)X.

Table 5.1: Fuzzy intersections

   X     Y    X*Y   ~X*Y
 0.30  0.20  0.20  0.20
 0.50  0.40  0.40  0.40
 0.55  0.45  0.45  0.45
 0.60  0.50  0.50  0.40
 0.70  0.60  0.60  0.30

Table 5.1 presents hypothetical values for a condition X and an outcome Y, with the intersections X*Y and \(\sim\)X*Y. Using the fuzzy version of the formula, the necessity inclusion for X is 2.15/2.15 = 1, and the necessity inclusion for \(\sim\)X is 1.75/2.15 = 0.814.
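These figures can be reproduced in base R, using pmin() for the fuzzy intersection and 1 - X for the negation:

```r
# the hypothetical values from table 5.1
X <- c(0.30, 0.50, 0.55, 0.60, 0.70)
Y <- c(0.20, 0.40, 0.45, 0.50, 0.60)

sum(pmin(X, Y)) / sum(Y)      # inclusion of Y in X:  2.15 / 2.15 = 1
sum(pmin(1 - X, Y)) / sum(Y)  # inclusion of Y in ~X: 1.75 / 2.15, about 0.814
```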

Normally, we do not expect a set to be “included” this much in both X and \(\sim\)X, which questions the term “inclusion”; it is more accurately referred to as “consistency”. Using this alternative term, it makes sense to think of both X and its negation \(\sim\)X as being consistent with Y in terms of necessity.

This situation is called a simultaneous subset relation and is a feature of fuzzy sets, which allow an element to be partially consistent with the presence of a set and at the same time consistent with its negation. This is also true in terms of set relations, meaning that a causal condition set can be simultaneously necessary for the presence of an outcome set and for its absence.

In fuzzy sets, it helps to think about a set and its negation as two different sets: they are complementary, but unlike crisp sets, where an element is either in or out of a set (the law of the excluded middle), in fuzzy sets an element is allowed to be part of both a set and its negation, and in some situations an element can have high inclusion in both.

In terms of consistency, a set X is necessary for a set Y when the fuzzy values of Y are consistently lower than the fuzzy values of X across all cases (when Y consistently displays a subset relation with X). In such a situation the set Y is included in the set X because the intersection of X and Y accounts for most of Y, therefore we can say the necessity consistency is high.

5.3 Coverage / relevance

Coverage is a measure of how trivial or relevant a necessary condition X is for an outcome Y. The classical example of the relation between fire and air says that air (oxygen) is necessary to start a fire. But this is an irrelevant necessary condition, because a fire does not start from the mere presence of oxygen. Air is necessary to maintain a fire, but it does not start one, as there are many other situations where air is present without a fire.

In terms of Euler/Venn diagrams, trivialness can be detected by measuring the proportion of the area within condition X that is covered by the set Y, as seen in figure 5.8.

X as an irrelevant necessary condition for Y

Figure 5.8: X as an irrelevant necessary condition for Y

The outcome Y is a very small set compared to the necessary condition X: there are very many cases (practically most) where X is present but Y does not happen. This is a typical description of an irrelevant necessary condition: although Y is completely included in the set X, its coverage of X is very small.

For crisp sets, this is the equivalent of a 2 \(\times\) 2 table where the number of cases in cell c (where both X and Y are present) is very small compared to the number of cases in cell d where X is present and Y is not.

Testing trivialness and relevance in a 2 $\times$ 2 table

Figure 5.9: Testing trivialness and relevance in a 2 \(\times\) 2 table

The formula to calculate necessity coverage, for binary crisp sets, is:

\[{covN}_{X\phantom{.}\leftarrow\phantom{.}Y\phantom{.}} = \frac{\mbox{X} \phantom{.} \cap \phantom{.} \mbox{Y}}{\mbox{X}} = \frac{\mbox{c}}{\mbox{c} + \mbox{d}}\]

This is interpreted as the proportion of X covered by its intersection with Y. For necessity coverage, the attention is focused on the cells located in the right column, instead of the first row as for the necessity inclusion.

In the hypothetical crosstable of figure 5.9, X is a perfect necessary condition for Y (Y is completely included within X; there are no cases in cell a), but it is an irrelevant one, because the proportion covered by their intersection, 5/125 = 0.04, is extremely small. Only 4% of X is covered by Y, given there are many cases in cell d where X is present but Y is not.
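The arithmetic behind this example is a one-liner; the cell counts below (c = 5, d = 120) are assumed so as to match the 5/125 proportion quoted in the text:

```r
# assumed cell counts: c = X and Y both present, d = X present and Y absent
c_cell <- 5
d_cell <- 120

covN <- c_cell / (c_cell + d_cell)
covN  # 0.04: X is perfectly necessary, yet Y covers only 4% of it
```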

For most people, air is a trivial necessary condition for fire. If figure 5.9 referred to the relation between air (X) and fire (Y), the small coverage would support our empirical knowledge that fire does not start from thin air. There are also many cases in cell b (say, 25 attempts to start a fire in the absence of oxygen), and that number can grow towards plus infinity if we think about the extra-terrestrial, outer solar system conditions in the rest of the Universe, where we observe neither oxygen nor fire.

Analyzed from this perspective, air does not appear to be a trivial condition for fire, as fire and air are highly associated with one another. If we could observe this relationship from outside the Universe, fire would always be observed in the presence of oxygen, while at the same time the vast majority of (the rest of) the space contains no oxygen and no fire.

Contrary to the common sense perception, air is actually a non-trivial necessary condition for fire. Empirically, we do know that air is irrelevant because it cannot cause fire on its own (there are very many cases of air without fire), but the relation between the two is far from trivial.

Goertz (2006a) made a welcome contribution to the analysis of necessity by differentiating between the concepts of trivialness and relevance. His work is fundamental for modern QCA, following Ragin and at the same time pivotal for Ragin’s later developments.

He relates trivialness to cell b of the table and relevance to cell d, both of them found in the bottom row where Y is absent. The earlier work of Braumoeller and Goertz (2000) makes this point very clear, designing a series of tests to evaluate the trivialness of (proven) necessary conditions using a \(\chi^2\) test of homogeneity applied to a contingency (2 \(\times\) 2) table.

While Braumoeller and Goertz employed the \(\chi^2\) test of homogeneity, I believe they have in fact employed the \(\chi^2\) test of independence (the difference is subtle, and the calculation method is exactly the same) for which the null hypothesis is that two categorical variables are independent of one another.

Their analysis boils down to stating that a necessary condition X is trivial if X and Y are independent (if the null hypothesis cannot be rejected).

Given that cell a is equal to zero (because X is necessary), this conclusion is true only if cell b is also equal to zero (that is, we have no empirical evidence of \(\sim\)X and \(\sim\)Y occurring together). Put differently, a necessary condition X is trivial if and only if there is no empirical evidence of \(\sim\)X, as in the Euler/Venn diagram from figure 5.10.

Trivial necessary condition X for the outcome Y

Figure 5.10: Trivial necessary condition X for the outcome Y

It looks as though condition X is missing, but in fact the necessary condition is so large that it covers the entire universe defined by the outer square, while the outcome Y is still a perfect subset of the condition X.

Returning to the Lipset data, we may explore the relation between LIT (level of literacy) and SURV (survival of democracy) for the crisp version:

tbl <- using(LC, table(SURV, LIT))
tbl
    LIT
SURV 0 1
   0 5 5
   1 0 8

The condition LIT is necessary, with a necessity inclusion of 1 for the outcome SURV (there are 0 cases where democracy survives in the absence of literacy). It is now possible to test for trivialness, by testing their independence.

Two events are said to be independent if the occurrence of one does not affect the odds of the other occurring, which is the spirit of Fisher’s exact test of independence between two categorical variables:


fisher.test(tbl)

    Fisher's Exact Test for Count Data

data:  tbl
p-value = 0.03595
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.9305029       Inf
sample estimates:
odds ratio 
       Inf 

With a significant p-value of 0.03595, we can reject the null hypothesis of independence between rows and columns. There are other functions, for example fisher.exact() in the package exact2x2, that do a better job of calculating the confidence interval, but the overall decision is the same: reject the null hypothesis and conclude, with 95% confidence, that the survival of democracy is not independent of literacy (making literacy a non-trivial necessary condition).
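The test is easy to reproduce without the Lipset dataset at hand, by building the contingency table directly from the counts in the crosstable (fisher.test() belongs to the base stats package):

```r
# rows are SURV (0, 1), columns are LIT (0, 1)
tbl <- matrix(c(5, 0, 5, 8), nrow = 2,
              dimnames = list(SURV = 0:1, LIT = 0:1))

ft <- fisher.test(tbl)
round(ft$p.value, 5)  # 0.03595: independence is rejected at the 5% level
```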

It is perhaps important to stress, one more time, that a test of trivialness can only be performed if the condition is necessary, that is, if there is a very small proportion of cases (towards 0) in cell a, where Y is present and X is absent.

Relevance, on the other hand, is directly tested by the coverage of X by Y: a higher proportion of cases in cell d (X occurring but not Y), relative to cell c (X and Y occurring together), lowers the relevance of the necessary condition. A necessary condition becomes more and more relevant as the proportion in cell c increases, except for the situation when X and Y are so large that they fill the entire universe (both being constant and omnipresent), in which case both become irrelevant.

In the Lipset case, although LIT is definitely a necessary condition it is not the most relevant one, given the raw coverage of 0.615:

pof(LIT <- SURV, data = LC)

        inclN   RoN   covN  
1  LIT  1.000  0.500  0.615 

The function pof() returns, for necessity relations, another parameter of fit called RoN (Relevance of Necessity) that will be discussed below, when calculating the fuzzy version of the coverage. For the moment, it is sufficient to notice that literacy has a relevance score of 0.5 (even lower than its coverage).

The lower the RoN score, the more trivial a condition is; the higher the RoN score, the higher the relevance and the bigger the relative importance of that condition as a necessary condition. This conclusion is valid for both binary and multi-value crisp sets.

Extending the test of trivialness to multi-value data is just as simple, because multi-value data can be reduced to a binary situation by considering each individual value versus everything else (versus its complement). Let us examine again the relation between DEV and SURV in the multi-value version of the Lipset data:

using(LM, table(SURV, DEV))
    DEV
SURV 0 1 2
   0 8 2 0
   1 0 3 5

There are exactly zero cases in cell a (remember that the R table prints the rows in reversed order), where SURV is present and DEV is completely absent. But that does not make either of the other two values of DEV necessary for the survival of democracy, their inclusions being rather modest:

pof(DEV[1] + DEV[2] <- SURV, data = LM)

               inclN   RoN   covN  
1      DEV[1]  0.375  0.867  0.600 
2      DEV[2]  0.625  1.000  1.000 
3  expression  1.000  0.800  0.800 

That happens because DEV[1] and DEV[2] behave as self-contained sets, despite being two categories of the same causal condition. The necessity inclusion of DEV[1] for SURV is calculated by taking the cases from DEV[1] in opposition to all other categories, DEV[0] and DEV[2].

We can arrive at exactly the same result by collapsing all other categories to create a familiar, binary crisp 2 \(\times\) 2 table:

DEV1 <- recode(LM$DEV, "1 = 1; else = 0")
table(LM$SURV, DEV1)
   DEV1
    0 1
  0 8 2
  1 5 3

We can use this separate object DEV1 to calculate its necessity inclusion for the outcome SURV from the data LM (notice how the structure of the command changes), with the same end result:

pof(DEV1, LM$SURV)

         inclN   RoN   covN  
1  DEV1  0.375  0.867  0.600 

Since no single value of the condition DEV is individually necessary, it does not make sense to test for trivialness. The RoN score of 0.867 is high, but this does not matter, since the value 1 of the condition DEV is not individually necessary for the survival of democracy.

The fuzzy situation is more challenging, just as for the necessity inclusion: since both the conditions and the outcome have fuzzy scores, there is no contingency table from which to count cases.

Much like the inclusion, fuzzy coverage is the proportion of X that is covered by Y, or better said by the intersection between X and Y (given that Y is already a subset of X). It is computed with the same fuzzy intersection, but this time dividing by the sum of X:

\[{covN}_{X\phantom{.}\leftarrow\phantom{.}Y\phantom{.}} = \frac{\sum{min(\mbox{X, Y})}}{\sum{\mbox{X}}}\]

Using the fuzzy version of the Lipset data, the necessity coverage for literacy LIT and the outcome SURV is:

using(LF, sum(fuzzyand(LIT, SURV)) / sum(LIT))
[1] 0.6428027

This calculation holds for all versions, crisp and fuzzy. Remembering that the coverage for the crisp version of LIT was 0.615, we can verify it with:

using(LC, sum(fuzzyand(LIT, SURV)) / sum(LIT))
[1] 0.6153846

Naturally, one need not go through all these individual calculations, as the function pof() already provides both inclusion and coverage in a single output:

pof(LIT <- SURV, data = LF)

        inclN   RoN   covN  
1  LIT  0.991  0.509  0.643 

Once we know that LIT is a necessary condition, trivialness would be detected, in the crisp version, by finding (close to) zero empirically observed cases in the \(\sim\)X column. Put the other way round: a necessary condition X is trivial if all observed cases are located in the presence column, where X = 1.

In fuzzy necessity, all (or most) cases are located below the main diagonal, and trivialness is detected just like in the crisp version: when all cases are located vertically on the right side of the XY plot, where X = 1. If all cases were constantly equal to 1 in the condition X, then no matter what fuzzy values Y took, X would still be a superset of Y (hence a necessary condition), but a trivial one, because X is constant and omnipresent.

Fuzzy trivialness of the necessary condition X

Figure 5.11: Fuzzy trivialness of the necessary condition X

The closer the values move to the main diagonal, and away from the right side of the plot, the more relevant the condition X becomes as a necessary condition for Y. In the limit, when the points become perfectly aligned with the main diagonal, X becomes fully relevant not only as a necessary, but also as a sufficient condition for Y. In the crisp 2 \(\times\) 2 table, this is the equivalent of all cases from cell d being moved into cell b, signalling a perfect correlation.

Goertz (2006a, 95) was the first to propose a measure of trivialness, by measuring the distance between the fuzzy value and 1:

\[T_{nec} = \frac{1}{\mbox{N}}\sum\frac{1 - \mbox{X}}{1 - \mbox{Y}}\]

It is a measure of trivialness, but at the same time a measure of relevance (what is not trivial, is relevant). The more this measure moves away from 0, the more relevant the necessary condition is.

Later, C. Schneider and Wagemann (2012) observed that in some cases this formula can produce values above 1 (which make little sense in a fuzzy interpretation), so they proposed a modified version, reported as the RoN parameter of fit in the output of the pof() function:

\[RoN = \frac{\sum(1 - \mbox{X})}{\sum(1 - min(\mbox{X}, \mbox{Y}))}\]

The distance from X to 1 is divided by the distance from the intersection of X and Y to 1; since the intersection is the smaller value (taking the minimum of X and Y), the denominator is always greater than or equal to the numerator, hence this parameter never exceeds 1.
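As an illustration of the difference between the two measures (a sketch in Python rather than R, with invented fuzzy scores), the toy data below contains one inconsistent case where X < Y; Goertz's per-case ratio pushes the average well above 1, while RoN stays within the unit interval:

```python
# Invented fuzzy scores; case 2 is inconsistent (X < Y).
X = [0.9, 0.5]
Y = [0.8, 0.95]

# Goertz (2006a): mean of (1 - X) / (1 - Y) -- not bounded by 1.
t_nec = sum((1 - x) / (1 - y) for x, y in zip(X, Y)) / len(X)

# Schneider & Wagemann (2012): sum(1 - X) / sum(1 - min(X, Y));
# since 1 - min(X, Y) >= 1 - X in every case, the result never exceeds 1.
ron = sum(1 - x for x in X) / sum(1 - min(x, y) for x, y in zip(X, Y))

print(round(t_nec, 2), round(ron, 3))  # 5.25 0.857
```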

To demonstrate this with the necessity relation between LIT and SURV:

using(LF, sum(1 - LIT) / sum(1 - fuzzyand(LIT, SURV)))
[1] 0.5094142
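The same arithmetic confirms the limiting case of trivialness described earlier: a condition constantly equal to 1 is a perfect superset of any outcome, yet its relevance is exactly zero. A quick numeric sketch (Python rather than R, with invented fuzzy scores):

```python
# Invented fuzzy outcome scores; X is omnipresent (constantly 1).
Y = [0.2, 0.7, 0.4, 0.9, 0.1]
X = [1.0] * len(Y)

# Necessity inclusion: a constant 1 always contains Y completely.
inclN = sum(min(x, y) for x, y in zip(X, Y)) / sum(Y)

# Relevance of necessity: the numerator sum(1 - X) collapses to zero.
ron = sum(1 - x for x in X) / sum(1 - min(x, y) for x, y in zip(X, Y))

print(inclN, ron)  # 1.0 0.0 -- perfectly necessary, perfectly trivial
```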

The function XYplot() in package QCA creates an XY plot between a condition and an outcome, while also presenting all parameters of fit. Just like the pof() function, it has an argument called relation, which is set to sufficiency by default.

XYplot(LIT, SURV, data = LF, jitter = TRUE, relation = "necessity")
Literacy as a necessary condition for the survival of democracy

Figure 5.12: Literacy as a necessary condition for the survival of democracy

The code above creates an XY plot with slightly jittered points using the argument jitter = TRUE (because some of them are so close that they overlap). The condition LIT is clearly a necessary condition for SURV, with an inclusion score of 0.991, but its relevance of a little over 0.5 is rather modest. This happens because many of the points below the main diagonal have very high values on LIT (close to or equal to 1), and are therefore consistently located close to the right margin of the plot.

The average distance of the points from 1 shrinks with each point close to the right margin, which explains the semi-trivialness of literacy as a necessary condition for the survival of democracy.

C. Schneider and Wagemann (2012) point out that a low relevance score should prompt a study of the deviant cases for necessity. There are two dotted middle lines on the XY plot (a horizontal and a vertical one) that split the fuzzy area into something resembling a 2 \(\times\) 2 crosstable.

A perfect relevance of necessity is met when cases are found in cells b (lower left) and c (upper right). For necessity, cases found in cell a (upper left) are called deviant cases consistency in kind, having too large a value on the outcome Y and too low a value on the condition X.

In the example from figure 5.12, there are no deviant cases consistency in kind in the upper left part of the plot. C. Q. Schneider and Rohlfing (2013) introduce yet another type of deviant cases consistency, namely in degree: for necessity statements, those where both values (for the condition and for the outcome) are greater than 0.5, but the outcome value is greater than the condition's (thus invalidating the subset relation of the outcome in the condition). These cases would be located in cell c (upper right), but above the diagonal.

5.4 Necessity for conjunctions and disjunctions

For complex outcome phenomena, single conditions are rarely necessary. Often, a condition becomes necessary only in combination with another condition: either in conjunction, which is essentially an intersection of two or more sets, or in disjunction, which is a union of two or more sets.

Conjunctions are rather easy to interpret: if a conjunction of two sets A and B is necessary for the outcome, it means that both A and B are necessary on their own, since the intersection is part of both A and B, as in figure 5.13.

Since the outcome Y is located (more or less completely) inside the intersection AB, it is also located within A and within B separately, which means that specifying the atomic expressions A and B is redundant, since their conjunction logically implies both of them individually.

Conjunction AB necessary for Y

Figure 5.13: Conjunction AB necessary for Y

A more interesting situation arises when conditions A and B are both individually necessary, but their conjunction is not (i.e. the outcome Y is not sufficiently included in their intersection). Figure 5.14 is somewhat similar to figure 5.4, but extended to conjunctions: the outcome Y is almost (but not completely) included in A, and to the same extent (again not completely) included in B.

A and B individually necessary, but not their conjunction AB

Figure 5.14: A and B individually necessary, but not their conjunction AB

For both A and B, the inclusion of the outcome Y is high enough to decide that both are individually necessary. But their intersection is small enough, and Y large enough to spill outside it, such that the proportion of Y within AB is not high enough to conclude the conjunction is necessary.
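This asymmetry follows directly from the arithmetic: since min(A, B) can never exceed A or B, the conjunction's inclusion score is at most as high as either atomic score. In the sketch below (Python rather than R, with invented fuzzy scores and an assumed 0.9 inclusion cut-off), each condition fails the subset relation in a different case, so both pass individually while their conjunction does not:

```python
# Invented fuzzy scores; A falls short of Y in case 1, B in case 2.
A = [0.4, 1.0, 1.0, 1.0]
B = [1.0, 0.4, 1.0, 1.0]
Y = [0.6, 0.6, 0.6, 0.6]

def incl_nec(X, Y):
    # inclN(X <- Y) = sum(min(X, Y)) / sum(Y)
    return sum(min(x, y) for x, y in zip(X, Y)) / sum(Y)

AB = [min(a, b) for a, b in zip(A, B)]  # conjunction collects both shortfalls

print(round(incl_nec(A, Y), 3),   # 0.917 -- necessary at incl.cut = 0.9
      round(incl_nec(B, Y), 3),   # 0.917 -- necessary at incl.cut = 0.9
      round(incl_nec(AB, Y), 3))  # 0.833 -- the conjunction is not
```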

For both of these reasons (incomplete inclusion or, if the inclusion is complete, a redundant conjunction), it is very rare, if not impossible, to observe a non-redundant necessary conjunction. In both situations, however, Y is obviously completely included in the disjunction (union) A + B, although by the same arguments this particular disjunction is also redundant: both A and B are individually necessary, and the simpler expressions are always preferred.

In many situations (actually, most frequently), no set is individually necessary, and implicitly neither are their conjunctions. But there will always be a disjunction of conditions that is a superset of the outcome Y, because unions form larger and larger areas with every causal condition added. At some point, the union (the disjunction) becomes so large that a quasi-complete inclusion of the outcome Y in that disjunction is bound to happen.

A + B $\leftarrow$ Y: the union of A and B is necessary for Y

Figure 5.15: A + B \(\leftarrow\) Y: the union of A and B is necessary for Y

Figure 5.15 presents such a situation, where the outcome Y is sufficiently outside A, and sufficiently outside B, to render both these conditions as individually not necessary.

But the outcome is sufficiently included in their disjunction A + B: the union covers a much larger area than either individual set, so the two conditions help each other cover Y to a much larger proportion.
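The complementary arithmetic holds for unions: max(A, B) is at least as large as either set, so the disjunction's inclusion can only go up. In the sketch below (Python rather than R, with invented fuzzy scores), neither condition is necessary on its own, yet their union contains the outcome completely:

```python
# Invented fuzzy scores; A and B each fall short of Y in different cases.
A = [0.2, 1.0, 0.2, 1.0]
B = [1.0, 0.2, 1.0, 0.2]
Y = [0.6, 0.6, 0.6, 0.6]

def incl_nec(X, Y):
    # inclN(X <- Y) = sum(min(X, Y)) / sum(Y)
    return sum(min(x, y) for x, y in zip(X, Y)) / sum(Y)

AorB = [max(a, b) for a, b in zip(A, B)]  # logical OR, as in fuzzyor()

print(round(incl_nec(A, Y), 3),  # 0.667 -- not necessary
      round(incl_nec(B, Y), 3),  # 0.667 -- not necessary
      incl_nec(AorB, Y))         # 1.0   -- the union is
```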

Disjunctions are easy to construct, through the logical OR operation presented in section 3.3.3 (taking the maximum of each pair of values), and in R using the function fuzzyor(). This ease must be taken with a grain of salt, however, because interpreting disjunctions is not as trivial a matter as it might first seem.

The theoretical meaning of conjunctions is straightforward: they are intersections of known concepts, and will be discussed in depth in the next chapter, in the analysis of sufficiency. A possible hypothesis involving conjunctions is: democratic countries (D) which have strong economic relations (E) do not go to war against each other (\(\sim\)W). This is not a necessity statement (conjunctions are redundant under necessity), but it shows clearly which conjunction (subset) it refers to: countries which are both democratic AND have strong economic relations with other democratic countries: D\(\cdot\)E.

Disjunctions, on the other hand, need a more careful interpretation, for they usually result in unions referring to a higher order concept that is different from the simple union of the constituent concepts. A disjunction D+E such as "democracies OR having strong economic relations" is something different from the mere juxtaposition of D and E.

Disjunctions are higher order concepts, sometimes called super-concepts, formed by the union of two or more concepts, and their analytic meaning depends on theory and on the researcher's expertise in the field. There is no mechanical or "one size fits all" interpretation valid for all cases. Some disjunctions might be theoretically meaningless, and in various contexts they might not satisfy the criteria for the relevance of necessity.

Care must be taken to avoid automatic super-concepts. The next section shows how to use the function superSubset() to find all possible necessary expressions in a given dataset, but those are expressions derived from a data-specific environment, and not all of them have theoretical meaning.

Computers are very good at finding everything that meets certain criteria, but it is the job of the researcher to sift through the computer's results and select those which do have some meaning. Failing to do so is similar to fishing for data: one can derive many conclusions from a certain dataset, while other data might yield more or less different conclusions.

It is not the data that should lead the research process; the correct approach is to test whether certain theoretical expectations are met in a particular dataset.

5.5 Exploring possible necessity relations

The normal analysis of necessity involves specific tests for each causal condition of interest, as well as for each plausible (and theoretically meaningful) superset disjunction. This can be time-consuming, and sometimes we might simply want to explore the data for possible necessity relations.

The package QCA offers a useful function called superSubset(), which does all this work automatically. It explores every possible necessity relation: individual conditions, conjunctions (even though conjunctions are redundant), and all possible disjunctions of conditions that are necessary for a given outcome.

When the number of causal conditions is large, it can be very helpful to get an overview of all possible necessity relations, eliminating useless tests on expressions that are not necessary for the outcome.

Using the same fuzzy version of the Lipset data, the command can be used as simply as:

superSubset(LF, outcome = SURV, incl.cut = 0.9)

                      inclN   RoN   covN  
 1  LIT               0.991  0.509  0.643 
 2  STB               0.920  0.680  0.707 
 3  LIT*STB           0.915  0.800  0.793 
 4  DEV + ~URB        0.964  0.183  0.506 
 5  DEV + ~IND        0.964  0.221  0.518 
 6  DEV + ~STB        0.912  0.447  0.579 
 7  ~URB + IND        0.989  0.157  0.511 
 8  DEV + URB + ~LIT  0.924  0.414  0.570 
 9  DEV + URB + IND   0.903  0.704  0.716 
10  DEV + ~LIT + IND  0.919  0.417  0.569 

The analysis is performed for the default value of the argument relation = "necessity" (which does not need to be formally specified, since it is the default). A similar analysis can be performed for the sufficiency relation by changing it to "sufficiency", or even for relations that are both necessary and sufficient, via "necsuf" (or "sufnec" if the analysis of sufficiency comes first).

It is extremely important to make clear that this list of necessary expressions is generated mechanically, the underlying algorithm merely crunching numbers in an indiscriminate search for all possible necessity relations. It is the researcher who should make sense of all these disjunctions, and select only those which have theoretical meaning.

For the moment, it is sufficient to notice that two atomic conditions were found to be individually necessary for the outcome, as well as their conjunction, and that the disjunctions presented in the output are non-redundant (there are more necessary disjunctions, but they are redundant, being supersets of the ones found above).

Another important aspect to observe is that coverage is relatively low for all expressions, and for some expressions the relevance is close to zero. In such situations, one possibility is to specify a cut-off for the relevance of necessity, for example listing only those expressions with a relevance of at least 0.6:

superSubset(LF, outcome = SURV, incl.cut = 0.9, ron.cut = 0.6)

                    inclN   RoN   covN  
1  STB              0.920  0.680  0.707 
2  LIT*STB          0.915  0.800  0.793 
3  DEV + URB + IND  0.903  0.704  0.716 

This is an interesting example that is worth discussing, for several things are happening and all are important. To begin with, it can be noticed that the conjunction LIT\(\cdot\)STB is necessary, but the atomic condition LIT is not part of this list of relevant necessary expressions.

Although a bit counter-intuitive, it does make logical sense. If a conjunction is necessary, this logically implies the atomic conditions are also necessary. And that is indeed the case here: the inclusion score for the atomic condition LIT is 0.991, high enough to render LIT, without any doubt, a necessary condition.

But necessary conditions are not always relevant. Air is a necessary condition for a battle, but it is also completely irrelevant for the set of conditions that lead to a battle. This is a similar situation, where LIT (literacy) is a necessary but irrelevant condition for the outcome SURV (survival of democracy) in the inter-war period.

This means the set LIT is large enough, and the outcome SURV small enough, for SURV to fit inside LIT's intersection with STB (government stability); the intersection itself is small enough to cover SURV so tightly that the intersection is relevant while the atomic condition LIT is not.

Depending on how large the outcome set is, and where it is located within a necessary causal condition, sometimes the atomic condition is necessary but a conjunction with another condition is not. At other times, as in this example, the conjunction is necessary and relevant while the atomic condition is necessary but not relevant.

This is why, in the results from the function superSubset(), both atomic conditions and their conjunctions may be listed as necessary. Granted, when the conjunction itself is both necessary and relevant, all its superset conditions are redundant: in such a situation both LIT and STB are implied by the conjunction LIT\(\cdot\)STB, and they could just as well be removed from the list of necessary expressions.

The opposite happens for sufficiency statements (to be discussed in the next chapter): if an atomic condition is sufficient for an outcome, any of its subset conjunctions is redundant, being logically sufficient as part of the larger atomic condition. In necessity statements, the reverse holds: necessary conjunctions make the atomic conditions redundant.

A third and perhaps most important thing that happened is that most of the disjunctions disappeared from the resulting list of relevant necessary expressions. Although necessary, some of them are highly irrelevant, for example the disjunctive expression \(\sim\)URB + IND, with a very high inclusion score of 0.989 but a very low relevance score of 0.157.

This is a perfect example of an illogical disjunction, for it is very difficult to make any logical or theoretical sense of the union between low urbanisation (\(\sim\)URB) and high industrialisation (IND), or of its relationship as a necessary expression for the survival of democracy. Similarly illogical are the unions between a high level of development (DEV) and a low level of urbanisation (\(\sim\)URB), or a low level of industrialisation (\(\sim\)IND).

These are all textbook examples of disjunctive necessary expressions that are meaningless constructs for the outcome of interest. Superset constructs should not automatically be considered relevant just because they have a high inclusion score under a necessity statement.

Such a misjudgement was made by Thiem (2016), who used precisely this Lipset dataset in an attempt to demonstrate that necessary expressions should not be used when identifying incoherent counterfactuals for ESA, the Enhanced Standard Analysis (see sections 8.6 and 8.7). He claimed they lead to an alleged CONSOL effect, whereby removing such counterfactuals would generate the conservative solution instead of the enhanced intermediate one.

In order to generate the conservative solution, most if not all remainders must be blocked from the minimization process; he therefore employed the entire list of necessary expressions resulting from the function superSubset(), irrespective of their relevance or their theoretical, even logical, meaning.

This demonstrates a lack of understanding of how disjunctive necessary expressions should be interpreted and employed in further analyses. Ignoring the existing standards of good practice and abusing the list of mechanically generated necessary expressions overlooks the fact that a methodology which does not serve a theoretical purpose is just a meaningless display of skills.


Braumoeller, Bear, and Gary Goertz. 2000. “The Methodology of Necessary Conditions.” American Journal of Political Science 44 (4): 844–58.
Goertz, Gary. 2003. “Cause, Correlation and Necessary Conditions.” In Necessary Conditions: Theory, Methodology, and Applications, edited by Gary Goertz and Harvey Starr, 47–64. Lanham, Md., Boulder: Rowman & Littlefield.
———. 2006a. “Assessing the Trivialness, Relevance, and Relative Importance of Necessary or Sufficient Conditions in Social Science.” Studies in Comparative International Development 41 (2): 88–109.
Schneider, Carsten Q., and Ingo Rohlfing. 2013. "Combining QCA and Process Tracing in Set-Theoretic Multi-Method Research." Sociological Methods and Research 42 (4): 559–97.
Schneider, Carsten, and Claudius Wagemann. 2012. Set-Theoretic Methods for the Social Sciences. A Guide to Qualitative Comparative Analysis. Cambridge: Cambridge University Press.
Skocpol, Theda. 1979. States and Social Revolutions. A Comparative Analysis of France, Russia, and China. Cambridge: Cambridge University Press.
Thiem, Alrik. 2016. "Standards of Good Practice and the Methodology of Necessary Conditions in Qualitative Comparative Analysis." Political Analysis 24 (4): 478–84.

  1. The null hypothesis that the true odds ratio is equal to 1 is rejected, despite the fact that the confidence interval contains the value of 1↩︎