Chapter 4 Calibration

The usual QCA data is numeric, and has specific formats for each flavour: when crisp (either binary or multi-value) the data consists of integers starting from the value of 0, and when fuzzy the values span over a continuous range, anywhere between 0 and 1.

Not all data from scientific research conform to these formats. In fact, one can expect the contrary: most of the time the raw data has a different shape than the one required to perform QCA. There are numeric variables of all types and, most importantly, of all ranges, and there are qualitative variables that separate cases into categories, etc.

Calibration is a fundamental operation in Qualitative Comparative Analysis. It is a transformational process from the raw numerical data to set membership scores, based on a certain number of qualitative anchors or thresholds. The process is far from a mechanical transformation, because the choice of the calibration thresholds is a theoretically informed one and dramatically changes the result of the calibration process.

Although this book is more about the practical details on how to perform QCA specific operations with R, it is still important to cover (at least in a brief way) the theoretical concepts underpinning these operations.

Typically, social science research data are separated into four levels of measurement: nominal, ordinal, interval and ratio. Virtually all of them can be calibrated to binary crisp sets, almost all (but not all) can be calibrated to multi-value crisp sets, while in the case of fuzzy sets only the “interval” and “ratio” levels of measurement can be used. However, the concept of “calibration” is something different from the concept of “measurement” (Ragin 2008a, 2008b).

When thinking about “measurement”, the social science researcher arranges the values of a variable in a certain order (where some of the values are smaller or higher than others), or calculates their standardized scores by comparing all of them with the average of the variable: some will have positive scores (values above the average), and some will have negative scores (values below the average).

While this approach makes perfect sense from a mathematical and statistical point of view, it tells almost nothing in terms of set theory. In the classical example of temperature, values can be arranged in ascending order, and any computer will tell which of the values are higher than others. But no computer in the world would be able to tell which of the temperature values are “hot”, and which are “cold”, only by analysing those values.

Mathematics and statistics usually work with a sample, and the derived measurement values are sample specific. The interpretation of the values, however, needs more information, which is found not in the sample but in the body of theory: “cold” means the temperature approaches zero degrees Celsius (the point where water turns into ice) and “hot” means the temperature approaches 100 degrees Celsius (the point where water turns into vapour).

The same can be said about any other numerical variable. Considering people’s heights, for instance, a computer can tell which person is taller than the other(s), but it cannot tell what it means to be “tall” and what it means to be “short”. These are humanly interpreted, abstract concepts which are qualitative in nature, not quantitative, and most importantly they are culturally dependent (a tall person in China means a very different thing than a tall person in Sweden).

In the absence of these humanly assessed qualitative anchors, it is impossible for a computer (or a researcher with no other prior information) to derive qualitative conclusions about the values in the sample.

The first section in this chapter is dedicated to crisp calibrations, showing that it is actually improper to use the term “calibration” for crisp sets, since they involve a simple recoding of the raw data instead of a seemingly complicated calibration process.

The second section is devoted to the proper fuzzy calibration, including various types of “direct” calibration and also the “indirect” calibration. Multiple methods are available for the direct method of calibration, depending on the definition of the concept to be calibrated, and package QCA is well equipped with a complete set of tools to deal with each such situation. The indirect method of calibration is very often confused with a direct assignment of fuzzy scores, but it is much more than that.

The final section is dedicated to calibrating categorical data, as an attempt to clarify some aspects regarding the meaning of the word calibration and how the final outcome of the calibration process depends heavily on the type of input data.

4.1 Calibrating to crisp sets

When using only one threshold, the procedure produces the so called “binary” crisp sets dividing the cases into two groups, and if using two or more thresholds the procedure produces the “multi-value” crisp sets. In all situations, the number of groups being obtained is equal to the number of thresholds plus 1.

All previously given examples (temperature, height) involve numerical variables, which suggests the concept of calibration is most often associated with fuzzy sets. This is a correct observation, because the process of calibration to crisp sets is essentially a process of data recoding. As the final result is “crisp”, for binary crisp sets the goal is to recode all values below a certain threshold to 0, and all values above that threshold to 1 (for multi-value crisp sets, new values are added for the ranges between two consecutive thresholds).
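As a minimal illustration of crisp calibration as plain recoding, the sketch below uses base R's findInterval() on the 18 DEV values listed later in this section. The helper name crisp_recode() is hypothetical and not part of package QCA. A value equal to a threshold is assigned to the lower group, which mirrors the "lo:550 = 0" recoding rule used further on.

```r
# Raw DEV values (GNP per capita in 1930, USD), sorted as in the Lipset data
dev <- c(320, 331, 350, 367, 390, 424, 468, 517, 586, 590,
         662, 720, 795, 897, 983, 1008, 1038, 1098)

# Hypothetical helper: each value becomes the number of thresholds it strictly
# exceeds; left.open = TRUE keeps a value equal to a threshold in the lower group
crisp_recode <- function(x, thresholds) {
  findInterval(x, sort(thresholds), left.open = TRUE)
}

binary <- crisp_recode(dev, 550)          # binary crisp set: 0 or 1
multi  <- crisp_recode(dev, c(550, 850))  # multi-value crisp set: 0, 1 or 2

table(binary)
table(multi)
```

With n thresholds the result has at most n + 1 distinct values, matching the rule stated at the start of this section.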

As in all other sections and chapters, wherever there are options for both command line and graphical user interface, the demonstration starts with the command line and the user interface will follow in close synchronization.

To exemplify this type of calibration, a well-known dataset from the Lipset (1959) study is going to be loaded into the working space via this command:

data(LR)

There are four versions of the Lipset dataset included in the QCA package: LR (the raw data), LC (calibrated to binary crisp sets), LM (calibrated to multi-value crisp sets) and LF (calibrated to fuzzy sets). The description of all columns, including the outcome, can be found via the command ?LR.

This example concentrates on the column DEV, referring to the level of development as GNP per capita, measured in US dollars (values for 1930), which spans from a minimum value of 320 to a maximum value of 1098:

sort(LR$DEV)
 [1]  320  331  350  367  390  424  468  517  586  590  662  720  795
[14]  897  983 1008 1038 1098

Before determining what an appropriate threshold is, to separate these values into two groups (0 as not developed, and 1 as developed to create a binary crisp set), it is always a good idea to graphically inspect the distribution of points on a horizontal axis.

The graphical user interface has an embedded threshold setter in the dialog for the calibration menu, and there are various ways to create a similar plot via command line, using for example the function plot(), but the simplest would be to use the dedicated function Xplot() that inspects just one variable, with a similar looking result as the threshold setter area in the user interface:

Xplot(LR$DEV, at = pretty(LR$DEV), cex = 0.8)

Figure 4.1: The distribution of DEV values

This particular plot has sufficiently few points that don’t overlap much, but if there are many overlapping points the function has an argument called jitter which can be activated via jitter = TRUE.

In the absence of any theoretical information about what “development” means (more exactly, what determines “high” or “low” development), one approach is to inspect the plot and determine whether the points are grouped in naturally occurring clusters. This is not the case with this distribution, therefore users can either resort to finding a threshold using a statistical clustering technique, or search for a relevant theory.

In the QCA package there is a function called findTh() which employs a cluster analysis to establish which threshold values best separate the points into a certain number of groups. To separate into two groups, as explained in section 2.3.2, no additional parameters are needed because the number of thresholds (argument n) is set to 1 by default. The command is:

findTh(LR$DEV)
[1] 626

The value of 626 was found by a complete hierarchical clustering, using the euclidean distance (see the default values of arguments hclustm and distm). However, Rihoux and De Meur (2009) have decided to use a close but different threshold value of 550 USD for their binary crisp calibration. Initially, they have used a threshold of 600 USD but upon closer inspection during their analysis, they have found that a value of 550 USD accounts for a natural ‘gap’ in the distribution of values and better “differentiates between Finland (590 USD) and Estonia (468 USD)”.
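The idea behind findTh() can be sketched with base R's hclust() and cutree(): cluster the sorted values with complete linkage on euclidean distances, then place each threshold halfway between the last value of one cluster and the first value of the next. The midpoint rule is an assumption of this sketch (it does reproduce the 626 reported above), and find_thresholds() is a hypothetical name, not the QCA implementation.

```r
dev <- c(320, 331, 350, 367, 390, 424, 468, 517, 586, 590,
         662, 720, 795, 897, 983, 1008, 1038, 1098)

# Hypothetical sketch: hierarchical clustering of one numeric variable,
# cut into n + 1 groups, with thresholds at the gaps between groups
find_thresholds <- function(x, n = 1, method = "complete") {
  x <- sort(x)
  tree <- hclust(dist(x, method = "euclidean"), method = method)
  groups <- cutree(tree, k = n + 1)
  # positions where the cluster label changes between neighbouring values
  ends <- which(diff(groups) != 0)
  (x[ends] + x[ends + 1]) / 2              # midpoint of each gap
}

find_thresholds(dev)          # one threshold, two groups
find_thresholds(dev, n = 2)   # two thresholds, three groups
```

In one dimension, complete linkage keeps the clusters contiguous along the sorted axis, which is why a single label change marks each gap.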

If not for the threshold value of 550 USD, Finland and Estonia would fall in the same category of development, and there is clearly a difference between the two neighbouring countries. This is a fine example of how theory and practical experience are used as a guide to establish the best separating line(s) which define the crisp groupings of cases. This is a qualitative assessment originating from outside the dataset: it is not something derived from the raw data values, but from external sources of knowledge, and makes a perfect example of the difference between “calibration” and raw data “measurement”.

The final, calibrated values can be obtained with two different methods. The first is to use the calibrate() function and choose the argument type = "crisp" (the default is "fuzzy"):

calibrate(LR$DEV, type = "crisp", thresholds = 550)
 [1] 1 1 1 0 1 1 1 0 0 1 0 1 0 0 0 0 1 1

There are other arguments in the function calibrate(), but all of those refer to the fuzzy type of calibration. In the crisp version, the two arguments type and thresholds are the only ones necessary to obtain a calibrated condition.

The crisp version of calibration is essentially equivalent to recoding the original raw data to a finite (and usually very low) number of crisp scores. Therefore a second method, which will give exactly the same results, is to use the recode() function from package QCA, using the following command:

recode(LR$DEV, rules = "lo:550 = 0; else = 1")
 [1] 1 1 1 0 1 1 1 0 0 1 0 1 0 0 0 0 1 1

The basic syntax of the recode() function is very simple, using two arguments: x and rules, where the first is the initial raw vector of data to be recoded, while the second is a string determining the recoding rules to be used. In this example, it can be translated as: all values between the lowest value (lo) and 550 (inclusive) should be recoded to 0, and everything else should be recoded to 1.

Calibrating to multi-value crisp sets is just as simple, the only difference being the number of thresholds n that divide the cases into n + 1 groups:

findTh(LR$DEV, n = 2)
[1] 626 940

The clustering method finds 626 and 940 as the two thresholds, while Cronqvist and Berg-Schlosser (2009) used the values of 550 and 850 USD to derive a multi-value causal condition with three values: 0, 1 and 2:

calibrate(LR$DEV, type = "crisp", thresholds = "550, 850")
 [1] 1 2 1 0 1 2 1 0 0 1 0 2 0 0 0 0 2 2

The argument thresholds can in fact be specified as a numerical vector such as c(550, 850), but as will be shown in the next section, when calibrating to fuzzy sets this argument is best specified as a named vector, whose simplest form is a single string between double quotes. This is an improvement over the former specification of this argument, but both are accepted for backwards compatibility.

Using the recode() function gives the same results:

recode(LR$DEV, rules = "lo:550 = 0; 551:850 = 1; else = 2")
 [1] 1 2 1 0 1 2 1 0 0 1 0 2 0 0 0 0 2 2

This specification of the argument rules assumes the raw data are discrete integers, but the recode() function has another specification, inspired by the function cut(), which works with both discrete and continuous data. This other method uses a different argument named cuts (similar to the argument breaks from function cut() and to the argument thresholds from function calibrate(), defining the cut points where the original values will be recoded) and a related argument named values to specify the output values:

recode(LR$DEV, cuts = "550, 850", values = 0:2)
 [1] 1 2 1 0 1 2 1 0 0 1 0 2 0 0 0 0 2 2
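Because the cuts argument mirrors breaks in cut(), the same three groups can be sketched in base R alone. In the hypothetical helper below, cut() builds right-closed intervals (so 550 itself falls into the first group) and labels = FALSE returns the interval index, which then selects from values. This is an illustration of the idea, not the internals of recode().

```r
dev <- c(320, 331, 350, 367, 390, 424, 468, 517, 586, 590,
         662, 720, 795, 897, 983, 1008, 1038, 1098)

# Hypothetical helper: length(cuts) + 1 intervals, one output value per interval
recode_cuts <- function(x, cuts, values) {
  stopifnot(length(values) == length(cuts) + 1)
  # intervals (-Inf, c1], (c1, c2], ..., (ck, Inf]; labels = FALSE gives 1, 2, ...
  idx <- cut(x, breaks = c(-Inf, sort(cuts), Inf), labels = FALSE)
  values[idx]
}

recode_cuts(dev, cuts = c(550, 850), values = 0:2)
```

Since values can be any vector of the right length, the same helper accepts non-integer output values as well.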

As mentioned, there are various dialogs in the graphical user interface to match these commands. The calibration dialog is one of the most complex from the entire user interface, and figure 4.2 shows the one appearing after selecting the menu:

Data / Calibrate:


Figure 4.2: The crisp “Calibrate” dialog

The procedure to use this dialog is very straightforward and involves a number of intuitive steps:

  1. select the dataset from the list in the Dataset area: in this example a single dataset is loaded, but R can work with any number of datasets at the same time (note that if a single dataset is loaded in R, it will automatically be selected by the interface)
  2. select the condition from the list under the Choose condition area, with the immediate effect of the distribution of values appearing in the threshold setter area
  3. choose crisp from the radio button pairing with fuzzy (this is the equivalent of the argument type in the written command)
  4. if threshold values are to be suggested by the computer, check the find thresholds checkbox; it has no direct equivalent among the arguments of the calibrate() function, but it uses the findTh() function behind the scenes
  5. if points are too densely clustered, check the jitter points checkbox to scatter the points vertically with small random values
  6. adjust the number of thresholds via the down or up buttons
  7. whether or not asking the computer for thresholds, their values can be manually (over)written in the text boxes right above the plot area
  8. independently of manual or automatic specification of the threshold values, their corresponding vertical red bars in the plot area can be manually dragged left or right, and the text boxes from step 7 will change accordingly
  9. if the calibrated values should be saved as a different variable in the same dataset, check the calibrate into new condition and specify the new (calibrated) condition name (otherwise it will overwrite the same condition with the new values)
  10. click the Run button and the new column should appear in the dataset, visible either in the console or in the data editor.

Figure 4.2 presents a situation where the condition DEV from the dataset LR is calibrated to multi-value crisp sets using two thresholds (550 and 850) which are manually set (the “find thresholds” option is unchecked), with points jittered vertically to avoid overlapping.

Since the user interface is developed as a webpage, it makes sense to use all the advantages of this environment. The points have a “mouse-over” property and respond with the label of the point (the row name of that particular case), in this example displaying EE (Estonia), a country from the dataset LR.

The dialog allows up to six thresholds for the crisp type, dividing a causal condition into at most seven groups. This is a limitation due to the lack of space in the dialog, but otherwise the command line can specify any number of thresholds. Cronqvist and Berg-Schlosser (2009) have given a fair amount of practical advice for deciding how many thresholds should be set. Apart from the already mentioned guidelines (to look for naturally occurring clusters of points, and to make theoretically based decisions), another very good piece of advice is to avoid creating large imbalances in the group sizes, otherwise solutions will possibly be too case specific (finding solutions that explain exactly 1 or 2 cases, whereas a scientifically acceptable result should allow more general solutions, at least to some degree).
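A quick way to follow the advice on balanced groups is to tabulate the group sizes produced by competing thresholds before committing to any of them. The sketch below uses base R only, with the DEV values listed earlier, and compares the theory-based thresholds 550 and 850 with the clustering-based 626 and 940. The helper group_sizes() is hypothetical.

```r
dev <- c(320, 331, 350, 367, 390, 424, 468, 517, 586, 590,
         662, 720, 795, 897, 983, 1008, 1038, 1098)

# Hypothetical helper: number of cases in each crisp group
# (values equal to a threshold are counted in the lower group)
group_sizes <- function(x, thresholds) {
  table(findInterval(x, sort(thresholds), left.open = TRUE))
}

group_sizes(dev, c(550, 850))   # theory-based thresholds
group_sizes(dev, c(626, 940))   # thresholds suggested by clustering
```

Here the theory-based split yields groups of 8, 5 and 5 cases, while the clustering-based one yields 10, 4 and 4.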

This particular distribution of data points is rather clearly scattered, but other datasets can have hundreds of overlapping points, a situation where the area dedicated to the threshold setter will prove to be too small even if points are jittered. However, this is not a problem for an interface designed into a webpage: unlike traditional user interfaces where dialogs are fixed, this particular interface is designed to be responsive, reactive and above all interactive.

Notice the small handle in the bottom right corner of the dialog (all resizable dialogs have it), which can be used to enlarge the dialog’s width to virtually any dimension, until the points become more clearly scattered for the human eye. The result of such a dialog enlargement can be seen in figure 4.3 below:


Figure 4.3: The resized “Calibrate” dialog

Many dialogs can be resized (for example the plot window), and their content is automatically recalculated to the new dialog dimensions. In this particular example, only the threshold setter area was redrawn and the bottom controls (including the Run button) have been repositioned. All other buttons and controls have been left in their original positions.

As shown in section 2.4, each click and every action, like dragging thresholds left or right, triggers a modification of the “Command constructor” dialog. From the second step where the condition is selected, the command constructor starts to display the equivalent written command which, upon the click of the Run button, will be sent to the R console. There are so many features in the graphical user interface that a thorough description of every single one of them would require too much book space and distract the user’s attention from the important topics. To mention just one such “hidden” feature, when multiple thresholds are present in the threshold setter area, they are automatically sorted and displayed in increasing order, with a bounded drag range between the minimum and maximum of the values found in the data.

The “Command constructor” dialog is refreshed on every click, with the aim of helping the user construct the command itself, rather than clicking through the dialogs. This will pay off, since written commands are always better than a point-and-click approach. While users can easily forget what they clicked to obtain a particular result, commands saved in dedicated script files are going to be available at any time, which is helpful for replication purposes: it is more complicated to replicate clicks than to run a script file.

Since there are two functions that accomplish the same goal of calibrating to crisp sets, there are also two separate dialogs. The next to be presented refers to a menu which is not so much related to QCA per se but it accomplishes a more general data transformation process which is present in almost any other software:

Data / Recode


Figure 4.4: The “Recode” dialog

The design of this dialog is largely inspired by the recoding menu of the SPSS software, with which many social science users are very familiar. Figure 4.4 presents the same Dataset and Choose condition areas as in the calibration dialog, but otherwise it has the familiar “Old values” and “New values” sections found in the other software.

This dialog also has a rather straightforward procedure, with the same first two steps as in the calibrate dialog (to select the dataset and the condition), therefore the list below continues from the third step:

  3. select the radio button or the relevant text box(es) and insert the threshold(s) in the Old value(s) part of the dialog
  4. insert the new value or select from the other options in the New value side of the dialog
  5. press the Add button to construct the rule
  6. repeat steps 3 to 5 for each recoding rule
  7. if the recoded values should be saved as a different variable in the same dataset, check the recode into new condition and specify the new (recoded) condition name (otherwise it will overwrite the same condition with the new values)
  8. click the Run button and the new column should appear in the dataset, visible either in the console or in the data editor.

There are two additional buttons on the right side of the dialog: Remove erases any selected rule(s) from the recoding rules area, and Clear erases all rules at once. When a rule is selected, the corresponding radio buttons and text boxes are filled in with the rule values, in both the old and new parts. This allows modifications to any of the rules, and a second press on the Add button brings those modifications into the rules area.

As long as the recoding rules do not overlap (an ‘old’ value should be covered by only one of the rules), the order of the rules doesn’t matter. But if several recoding rules cover the same old values, the last specified rule takes precedence (overwriting the recodings made by the earlier rules). As always, a few toy examples in the command line with only a handful of values will show the user how the command works in every scenario.
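The precedence rule can be checked on a handful of values. The sketch below is a hypothetical helper, not the recode() implementation: it applies each rule in turn to the original values, so where two rules overlap, the later one overwrites the earlier.

```r
# Hypothetical toy helper: each rule recodes the ORIGINAL values in [lo, hi]
# to a new value; later rules overwrite earlier ones where they overlap
apply_rules <- function(x, rules) {
  out <- x
  for (r in rules) {
    out[x >= r$lo & x <= r$hi] <- r$value
  }
  out
}

x <- 1:6
overlapping <- list(
  list(lo = 1, hi = 4, value = 0),
  list(lo = 3, hi = 6, value = 1)   # overlaps the first rule on 3 and 4
)

apply_rules(x, overlapping)        # 3 and 4 end up as 1 (last rule wins)
apply_rules(x, rev(overlapping))   # reversing the order: 3 and 4 end up as 0
```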

This section ends with the conclusion that calibration to crisp sets is essentially equivalent to recoding the initial raw causal condition with a set of discrete new values. The next section demonstrates what the “real” calibration is all about, applied to fuzzy sets. This is the main reason why the default value of the type argument has been changed to "fuzzy", despite its long-standing default value of "crisp" in all previous versions of the QCA package.

4.2 Calibrating to fuzzy sets

Social science concepts are inherently difficult to measure. Unlike the physical sciences, where things can be directly observed and measured, in the social sciences things are not directly observable, hence their measurement is always challenging and often problematic. Some of the concepts from the social world are more easily observable: sex, age, race etc., but the bulk of the social science concepts are highly abstract and need substantially more effort to be measured, or at least to attempt a measurement model. These are complex, multi-dimensional concepts which require the use of yet another (set of) concepts just to obtain a definition, and those concepts need definitions of their own, etc.

Neuman (2003) presents an in-depth discussion about the role of concepts in social research, for both quantitative (positivist) and qualitative (interpretive) approaches. The measurement process is different: quantitative research defines concepts before data collection and produces numerical, empirical information about the concepts, while in qualitative research concepts can be produced during the data collection process itself.

Both approaches use conceptualization and operationalization in the measurement process, in tight connection with the concept definition, although Goertz (2006b) takes an ontological perspective, arguing there is more to concepts than a mere definition, because researchers need to first establish what is “important” about the entity in order to arrive at a proper definition.

Concepts have a tight interconnection with theory: sometimes the concept formation process leads to new theories, while established theories always use accepted definitions of their concepts. These definitions can change depending on the theoretical perspective employed in the research process. Although not directly concerning QCA, it is nevertheless an important discussion for calibration purposes.

From yet another point of view, concepts have a cultural background just as much as they have a theoretical one. This cultural dependence can manifest in at least two different ways:

  1. concepts have different meanings in different cultures: altruism in Hungary is probably something different from altruism in Korea (to compare very different countries), or the well-known continuum of left and right political positioning doesn’t have the same meaning in Japan, where political participation bears little, if any, resemblance to the Western concept of “participation”
  2. even if concepts have the same meaning, their level of concentration can dramatically differ in different cultural and/or historical contexts, for example citizenship or public participation, which have very high levels in a country like Belgium and very low levels in a post-communist country like Romania.

4.2.1 Direct assignment

The method of direct assignment is the simplest possible way to obtain a (seemingly) fuzzy calibrated condition from some raw numerical data. The term “direct assignment” has been introduced by Verkuilen (2005), while something similar was briefly mentioned by Ragin (2000).

It is likely a method tributary to Verkuilen’s formal training in experimental psychology, where expert knowledge is both studied and employed in conjunction with various scales. In the direct assignment, the fuzzy scores are allocated by experts as they see fit, according to their expertise. There can be some form of theoretical justification for the various thresholds separating the fuzzy scores, but in the end this is a highly subjective method and it is likely that no two experts will reach exactly the same values.

To avoid the point of maximum ambiguity 0.5, the experts typically choose four, and sometimes even six fuzzy scores to transform the raw data into a fuzzy set. This procedure is extremely similar to the recoding operation when calibrating to crisp sets, with the only exception that the final values are not crisp, but fuzzy between 0 and 1.

To exemplify, we can recode the same condition DEV from the raw version of the Lipset data:

recode(LR$DEV, cuts = "350, 550, 850", values = "0, 0.33, 0.66, 1")
 [1] 0.66 1.00 0.66 0.33 0.66 1.00 0.66 0.33 0.33 0.66 0.33 1.00 0.00
[14] 0.00 0.00 0.33 1.00 1.00

All values between 0 and 350 are recoded to 0, the ones between 351 and 550 to 0.33, the ones between 551 and 850 to 0.66 and the rest are recoded to 1. Supposing the thresholds (in this case, the cuts) have some theoretical meaning, this is a very simple and rudimentary way to obtain a seemingly fuzzy calibrated condition.

Arguably, the end result is by no means different from a calibration to crisp sets, obtaining a new condition with four levels:

recode(LR$DEV, cuts = "350, 550, 850", values = "0, 1, 2, 3")
 [1] 2 3 2 1 2 3 2 1 1 2 1 3 0 0 0 1 3 3
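The equivalence can be made explicit: the four “fuzzy” scores from the direct assignment are just labels attached to the four crisp levels. The base R sketch below derives the levels with findInterval() (a value equal to a cut goes to the lower level, as with recode() above) and maps them one-to-one onto 0, 0.33, 0.66 and 1. Variable names are illustrative only.

```r
dev <- c(320, 331, 350, 367, 390, 424, 468, 517, 586, 590,
         662, 720, 795, 897, 983, 1008, 1038, 1098)

# crisp levels 0, 1, 2, 3 from the cuts 350, 550 and 850
levels4 <- findInterval(dev, c(350, 550, 850), left.open = TRUE)

# "direct assignment": the same levels, relabelled as fuzzy-looking scores
fuzzy4 <- c(0, 0.33, 0.66, 1)[levels4 + 1]

# the cross-tabulation is diagonal: no information beyond the four levels
table(levels4, fuzzy4)
```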

Naturally, more levels and generally more multi-value conditions in a dataset expand the analysis with even more possible causal configurations, and from this point of view a fuzzy set (even one having four fuzzy categories) is preferable because it is at least confined between 0 and 1. But while fuzzy sets are notoriously averse to the middle point 0.5, crisp sets readily accommodate it in a middle level, for instance creating a multi-value crisp set with three levels:

recode(LR$DEV, cuts = "500, 850", values = "0, 1, 2")
 [1] 1 2 1 0 1 2 1 0 0 1 1 2 0 0 0 0 2 2

Here, the crisp value of 1 would correspond to a fuzzy membership value of 0.5 and the crisp value of 2 would correspond to a full fuzzy membership value of 1. As it will later be shown, especially in chapter 7, such a thing is forbidden with fuzzy sets.

Due to its sensitivity to individual expertise, it is difficult to conceptualize the result of a direct assignment as a proper fuzzy set. One year later, Smithson and Verkuilen (2006) don’t mention this method at all in their book, and two years after that Ragin (2008b) doesn’t discuss anything related to the direct assignment, presenting exclusively what he calls the “direct method” and the “indirect method” of calibration, which are going to be presented in the next sections.

Many users might well confuse the direct assignment with the direct method, and believe that manually assigning (seemingly) fuzzy scores to some properties is not only an acceptable method, but one recommended by Ragin. This is far from the actual situation, and the two should not be mistaken for each other.

Despite being presented in this book in a dedicated section, the main recommendation is not to use it, if at all possible. There are far better alternatives, for any possible scenario. Some argue that such a method is the only one possible when transforming Likert type response scales (therefore ordinal measured data) to fuzzy values, and indeed the direct and the indirect methods would not be appropriate in such situations because they both need numerical data from at least an interval level of measurement. But the final section 4.3 shows one possible way to deal with such situations.

When presenting the direct assignment, Verkuilen himself (2005, 471) mentions no fewer than five main problems associated with this method, and only with this method (for all the other, transformational assignments, no such problems are mentioned):

  1. The interpretation of the generated set membership values is very difficult
  2. The direct assignment is mostly unreliable, especially for very abstract concepts
  3. It contains bias (the expert’s own expertise)
  4. There is no error (uncertainty) associated with the generated set membership values
  5. Combining the expertise from multiple judges is also difficult

Having presented all this information, the main advice is to refrain from using the direct assignment. The next sections will introduce the standard calibration techniques as they are used in the current QCA practice.

4.2.2 Direct method, the “s-shape” functions

The argument about the cultural dependence of concepts holds for complex multidimensional concepts, but interestingly it also holds for very simple concepts, like age or height. In a Western culture, suppose we have an average height of 1.75 meters (roughly 5 ft 9 in), with a range of heights between a minimum of about 1.5 meters (an inch below 5 feet) and a maximum of about 2 meters (about 6 ft 7 in).

We can simulate a sample of 100 such height values (in centimeters) via these commands:

height <- rnorm(n = 100, mean = 175, sd = 10)
range(height)
[1] 151.1964 199.7711

This is a normally distributed, random sample of heights where the “tallest” person is 1.99 meters tall and the “shortest” one is 1.51 meters tall. It doesn’t display any clear separation into clusters of points, as figure 4.5 shows, perhaps with the exception of the two values on the left side (but since this is randomly generated data it has no theoretical interpretation). In this example there are 100 values, and most of them overlap in the middle part of the plot. To get a more accurate idea of what this distribution looks like, the argument jitter was activated to add a bit of “noise” on the vertical axis.

Xplot(height, jitter = TRUE, cex = 0.8)

Figure 4.5: The distribution of generated heights

Now suppose an accepted definition of a “tall” person, for the Western cultures, is anyone having at least 1.85 meters (at least 6 feet) so anyone above this threshold would be considered “tall”. Conversely, anyone below 1.65 meters (5 ft 5 in) would be considered “short”. If these anchors hold, it is now possible to calibrate the concept of height and transform its raw values into membership scores for the set of “tall people” using this command:

# increasing calibration of height
ich <- calibrate(height, thresholds = "e=165, c=175, i=185")

It is the same function calibrate(), with two important differences from the previous section:

  1. the argument type needs no special specification (its default value is already "fuzzy"), to signal we are going to obtain a continuous set of membership values between 0 and 1
  2. the argument thresholds has a different structure, specifying three letters (“e” stands for complete exclusion, “c” is the crossover point, and “i” is the threshold for complete inclusion in this set), and three associated numbers which are the actual threshold values.

As previously mentioned, this command looks deceptively simple, but other arguments are also “at work” through the default values associated with fuzzy calibration. Generating set membership values requires a mathematical function. Here the default is the (cumulative) logistic function, which is set by the argument logistic = TRUE. The logistic function has a well known shape that resembles the letter “S”, so package QCA calls the resulting shape an “s-shape”. This distinguishes it from the “bell-shape”, which is introduced later in this section.

There is more to tell about these inter-related arguments, because not all of them are active all the time. They are activated in a logical sequence, starting once the first link of the sequence has been activated. In this example, the first link is the argument type, which has been left at its default fuzzy. Only then does the argument logistic start to work (for the crisp version it had no effect whatsoever). The third link in the sequence is the argument idm, which depends on both type and logistic. It becomes active only when type is fuzzy and logistic is TRUE. If either of the first two has a different value, the argument idm has no effect (see section 4.2.3 for details of what this argument is and how it works).

The function is smart enough to detect all these logical sequences of arguments, but on the other hand the user needs to properly understand how they work in order to effectively make use of the function.

And this is not the end of the story, for there is even more to tell about the specification of the thresholds argument:

  1. Whenever the thresholds start with exclusion and end with inclusion, the function increases from left to right. It is therefore logically possible to specify a decreasing function by reversing the specification. When the thresholds start with inclusion (for the low raw values on the left) and end with exclusion (for the large raw values on the right), the function decreases from left to right.
  2. If thresholds contains exactly 3 values, it signals the use of the “s-shaped” functions, and if it uses exactly 6 values, it signals the use of the “bell-shaped” function (see section 4.2.4). For fuzzy calibration, any other number of thresholds will generate an error.
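To make these two rules concrete, here is a minimal base-R sketch of how a threshold string could be interpreted. The helper describe_thresholds() is hypothetical and is not part of package QCA. It assumes the label=value syntax shown above, and it only reports the shape and direction that the two rules imply. It does not perform any calibration.

```r
# Hypothetical helper, NOT part of package QCA: it only illustrates how the
# labels and the number of thresholds determine the calibration function.
describe_thresholds <- function(thresholds) {
  # remove all whitespace (including line breaks), then split "label=value"
  parts  <- strsplit(gsub("[[:space:]]", "", thresholds), ",")[[1]]
  keyval <- strsplit(parts, "=")
  labels <- vapply(keyval, function(kv) kv[1], character(1))
  values <- as.numeric(vapply(keyval, function(kv) kv[2], character(1)))
  names(values) <- labels

  # threshold values are always given in ascending order
  if (is.unsorted(values)) stop("threshold values must be in ascending order")

  # rule 2: 3 values give an s-shape, 6 values a bell-shape
  shape <- if (length(values) == 3) {
    "s-shaped"
  } else if (length(values) == 6) {
    "bell-shaped"
  } else {
    stop("fuzzy calibration needs exactly 3 or 6 thresholds")
  }

  # rule 1: starting with "e" means the function (first) increases,
  # starting with "i" means it (first) decreases
  direction <- if (startsWith(labels[1], "e")) "increasing" else "decreasing"

  list(values = values, shape = shape, direction = direction)
}

describe_thresholds("e=165, c=175, i=185")$direction   # "increasing"
describe_thresholds("i=165, c=175, e=185")$direction   # "decreasing"
```

Only the order of the labels changes between the two calls. The numeric values stay in ascending order, exactly as in the calibrate() commands of this section.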

Therefore we can create membership scores in the set of “short” people simply by swapping the labels of the inclusion and exclusion thresholds. The threshold values themselves always increase from left to right, as we move from the smaller raw values on the left of the distribution towards the larger raw values on the right:

# decreasing calibration of height
dch <- calibrate(height, thresholds = "i=165, c=175, e=185")

Figure 4.6 presents two plots of the raw values against their calibrated counterparts. Plot a) shows the distinctive increasing s-shape, and plot b) shows the decreasing s-shape.

The horizontal axes correspond to the raw data (ranging from 1.5 to 2.0 meters in height). The vertical axes show membership scores in the sets of “tall” and “short” people, ranging from 0 to 1. Each raw value has been transformed into a membership degree. In the crisp version(s), the raw data was recoded to a small number of new values. Here, by contrast, each value has its own degree of membership along the logistic function, starting from the small raw values on the left and progressively moving towards the large raw values on the right.

par(mfrow = c(1, 2))
plot(height, ich, main = "a. Set of tall people", xlab = "Raw data",
     ylab = "Calibrated data")
plot(height, dch, main = "b. Set of short people", xlab = "Raw data",
     ylab = "")

Figure 4.6: Calibration using the logistic function

In figure 4.6.a, tall people should have a higher inclusion in the set than short people. The function therefore increases from 0 in the lower left side to 1 in the upper right side. A person 1.5m tall will have an inclusion score of virtually 0 in the set of “tall” people (that is, will be practically out of it).

The opposite situation can be seen in plot 4.6.b. Here it seems natural for tall people to have low inclusion scores in the set of “short” people, so the function decreases from 1 in the upper left side to 0 in the lower right side (an inverted s-shape function). A person 2m tall would have an inclusion score of virtually 0 in the set of “short” people (that is, would be practically out of it).
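The two plots can be checked numerically with a minimal sketch of logistic calibration in base R, anticipating the mechanics explained in section 4.2.3. The function logistic_calibrate() is hypothetical and is not the internal code of calibrate(). It assumes the default full-inclusion membership of 0.95, and cr stands for the crossover c. The sketch shows that the extreme heights receive scores very close to 0 and 1, but never exactly those values. It also shows that, with these mirrored thresholds, the set of “short” people is the negation of the set of “tall” people.

```r
# Hypothetical sketch of logistic ("s-shaped") calibration, assuming
# idm = 0.95; this is NOT the internal code of QCA::calibrate().
# e = full exclusion, cr = crossover (c in the text), i = full inclusion
logistic_calibrate <- function(x, e, cr, i, idm = 0.95) {
  k <- qlogis(idm)                 # log odds of full inclusion (about 2.944)
  d <- x - cr                      # deviations from the crossover
  # scale deviations so that x = i maps to log odds k and x = e maps to -k;
  # the same formula also works for decreasing functions (i < cr < e)
  logodds <- ifelse(d >= 0, d * k / (i - cr), -d * k / (e - cr))
  plogis(logodds)                  # from log odds back to membership
}

heights <- c(150, 165, 175, 185, 200)
tall  <- logistic_calibrate(heights, e = 165, cr = 175, i = 185)
short <- logistic_calibrate(heights, e = 185, cr = 175, i = 165)
round(tall, 4)                # 0.0006 0.0500 0.5000 0.9500 0.9994
all.equal(short, 1 - tall)    # TRUE: "short" is the negation of "tall" here
```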

Before going too deep into the specificities of how the logistic function is used to obtain these membership scores, let us return to the original discussion about the differences between cultures. The exclusion, crossover and inclusion thresholds of 1.65, 1.75 and 1.85 are valid for the Western context, but would not produce the entire range of values between 0 and 1 for an Eastern context.

Suppose there is a hypothesis relating height to labor market outcomes (Persico, Postlewaite, and Silverman 2004), or to general wellness and quality of life (Bolton-Smith et al. 2000). It would not make any sense to use the same threshold values for all countries in a comparative study. The average height in a country like Korea is much lower than in a Western country like France (not to mention Sweden). Using the same thresholds would therefore cluster most of the calibrated membership scores in the lower left part of the plot.

The goal of the calibration process is to obtain membership values covering the full range from 0 to 1 for each and every compared country, according to the meaning of the concept in each particular country or culture. “Tall” has a very different meaning for tribes in Guatemala (Bogin 1998), where the average height of Mayan men in the 1970s was reported to be about 1.575 meters (5 ft 2 in). To obtain the full range between 0 and 1, it is necessary to use a different set of threshold values for exclusion, crossover and inclusion in the set.

As each country has a different set of thresholds, establishing their exact values has nothing to do with mathematics or statistics. Rather, it has everything to do with the researcher’s familiarity with each studied country, which is a qualitative rather than quantitative approach to research. The researcher has to know and understand the “meaning” of each concept in each particular country. This cannot be mathematically derived from separate random samples drawn in every studied country.

In this example the individual cases represent people’s heights, but the same line of thought can be applied in the situation when cases represent whole countries, and the dataset is a collection of values for each individual country. Ragin’s calibrated values (rounded to 2 decimals) can be obtained easily via:

inc <- c(40110, 34400, 25200, 24920, 20060, 17090, 15320, 13680, 11720,
         11290, 10940, 9800, 7470, 4670, 4100, 4070, 3740, 3690, 3590,
         2980, 1000, 650, 450, 110)
incal <- round(calibrate(inc, thresholds = c(2500, 5000, 20000)), 2)

The calibrated values have been rounded to two decimals for easier comparison with Ragin’s (2008b, 89) table and with the fs/QCA software, which also rounds to two decimals by default. For the purposes of QCA minimization, however, it is good practice to keep all decimals of the calibrated values intact.

Table 4.1 displays the raw values and their calibrated counterparts side by side for each country. There are very minor differences from Ragin’s values (for example, for Israel), which are explained in section 4.2.3.

Table 4.1: Per capita income (INC) calibrated (INCAL).

Country          INC (USD)   INCAL
Switzerland          40110    1.00
United States        34400    1.00
Netherlands          25200    0.98
Finland              24920    0.98
Australia            20060    0.95
Israel               17090    0.91
Spain                15320    0.88
New Zealand          13680    0.85
Cyprus               11720    0.79
Greece               11290    0.77
Portugal             10940    0.76
Korea, Rep.           9800    0.72
Argentina             7470    0.62
Hungary               4670    0.40
Venezuela             4100    0.26
Estonia               4070    0.25
Panama                3740    0.18
Mauritius             3690    0.18
Brazil                3590    0.16
Turkey                2980    0.08
Bolivia               1000    0.01
Cote d’Ivoire          650    0.01
Senegal                450    0.00
Burundi                110    0.00

Figure 4.7 shows the corresponding dialog for fuzzy calibration in the graphical user interface, using the raw values from the Lipset dataset LR. It is the same dialog, but it displays only the controls specific to fuzzy calibration. This demonstrates the logical sequence of relations between the arguments of the calibrate() function. The arguments specific to the crisp version have no effect when calibrating to fuzzy sets, so they are hidden. The thresholds setter used to be specific to crisp calibration only, but starting with version 2.5 this reactive area is displayed for fuzzy calibration as well.

In this respect, the graphical user interface has an advantage over the command line interface. Certain clicks automatically trigger the logical sequence of argument relations, which saves the user from constantly checking them. Moreover, clicks and their logical implications are immediately translated into the command constructor (the user interface is a reactive Shiny app, after all). Users therefore learn about these implications by seeing what the correct command looks like after each click.


Figure 4.7: The fuzzy “Calibrate” dialog

In the fuzzy version of the calibrate dialog, the s-shaped type of function is selected by default, increasing from left to right. Selecting “decreasing” in the radio button automatically changes the labels of the thresholds and, once all values are provided, the corresponding command. The logistic function is also selected by default (the second link in the logical sequence of fuzzy calibration). With it comes the text box specifying the degree of membership, which depends on both fuzzy and logistic being checked (the third link in the logical sequence of argument relations).

The selector for the number of thresholds was removed from the dialog for fuzzy calibration. In the crisp version the user can specify any number of thresholds, but in fuzzy calibration this number is restricted to either 3 (for the s-shaped functions) or 6 (for the bell-shaped functions). Removing the selector leaves the user no room for mistakes.

The command constructor introduces the thresholds argument in the written command only when all threshold values have been specified. These values can be copied from the crisp version, if the thresholds setter was first used to determine them. The thresholds carry the associated labels “e” for exclusion, “c” for crossover and “i” for inclusion.

The last checkbox in the list of dialog options is ecdf, which stands for the empirical cumulative distribution function. This is a different way to generate degrees of membership in particular sets, useful whenever the researcher does not have a clear indication that the logistic function provides the best possible fit to the original raw values.

There are many other types of functions that can be employed besides the logistic function (Thiem and Dușa 2013; Thiem 2014), which are all special types of CDFs (cumulative distribution functions) that will be discussed later, but sometimes researchers don’t have a clear preference for a certain type of fit function and prefer to let the raw data speak for themselves.

This is the case with the ECDF (empirical CDF) which uses the data itself to incrementally derive a cumulative function in a stepwise procedure. In the written command, although arguments logistic and ecdf are mutually exclusive, it is important to explicitly set logistic = FALSE before making use of ecdf = TRUE (otherwise the default activated logistic function will have precedence).

ech <- calibrate(height, thresholds = "e=155, c=175, i=195",
                 logistic = FALSE, ecdf = TRUE)

This setup is also important, as will be shown later, because a different set of calibration functions comes into operation when both logistic and ecdf are deactivated. To minimize the number of arguments and reduce complexity, every possible logical relation between arguments has been exploited in this function. This makes it all the more important to understand these relations.

The object ech contains the calibrated values using the ECDF - empirical cumulative distribution function, which are bounded in the interval between 0 and 1 just like all the other CDFs, with the difference that all values below the exclusion threshold are automatically allocated to a set membership score of 0, and all values above the inclusion threshold are allocated to a set membership score of 1 (unlike the logistic function, where by default “full membership” is considered to be any membership score above 0.95).
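The behaviour described above can be approximated with a short base-R sketch built on stats::ecdf(). The helper ecdf_calibrate() is hypothetical. It also rests on a simplifying assumption that may not match calibrate() exactly: the empirical CDF is estimated only from the raw values lying strictly between the exclusion and inclusion thresholds, and the crossover plays no role.

```r
# Hypothetical sketch of ECDF-based calibration, NOT QCA's own code:
# raw values at or below e get 0, at or above i get 1, and values in
# between get the empirical CDF of the raw values strictly between e and i.
ecdf_calibrate <- function(x, e, i) {
  inside <- x[x > e & x < i]
  if (length(inside) == 0) stop("no raw values between the thresholds")
  Fn <- stats::ecdf(inside)          # stepwise cumulative function of the data
  ifelse(x <= e, 0, ifelse(x >= i, 1, Fn(x)))
}

ecdf_calibrate(c(150, 160, 170, 180, 190, 200), e = 155, i = 195)
# 0.00 0.25 0.50 0.75 1.00 1.00
```

Because the steps come from the observed values, the resulting points follow the data rather than a predefined curve. This is why the points in the next figure do not lie on a smooth line.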

The following command produces figure 4.8.

plot(height, ech, xlab = "Raw data", ylab = "Calibrated data", cex = 0.8)

The points do not follow a clear mathematical function line, unlike the perfect shape of the logistic function in figure 4.6. They nevertheless come remarkably close to one, considering that the “shape” of the distribution (if it can be called that) has been determined from the raw, observed data.


Figure 4.8: Calibration in the set of tall people, using the ECDF

In the graphical user interface, this seemingly complex relation between arguments is simplified by the automatic activation or deactivation of the related arguments once the starting arguments have been activated. In figure 4.7, if the user clicks on the ecdf checkbox, the logistic counterpart is automatically deactivated and the command constructor displays the written command above. Moreover, the degree of membership text box only makes sense in relation to the logistic checkbox, so it disappears from the dialog when ecdf is activated. This way, the user can visually understand which arguments are related, in a logical sequence.

When both logistic and ecdf are set to FALSE, neither of these two CDFs is used. The calibrate() function then employs the following mathematical transformation (adapted from Thiem and Dușa 2013, 55):

\[\begin{equation} dm_{x} = \begin{cases} 0 & \text{if }x \leq e,\\ \frac{1}{2} \left( \frac{ e \text{ } - \text{ } x }{ e \text{ } - \text{ } c } \right)^b & \text{if } e < x \leq c,\\ 1 - \frac{1}{2} \left (\frac{ i \text{ } - \text{ } x }{ i \text{ } - \text{ } c } \right)^a & \text{if }c < x \leq i,\\ 1 & \text{if }x > i. \end{cases} \tag{4.1} \end{equation}\]

where:

  • \(e\) is the threshold for full exclusion
  • \(c\) is the crossover
  • \(i\) is the threshold for full inclusion
  • \(x\) is the raw value to be calibrated
  • \(b\) determines the shape below the crossover (equivalent to argument below)
  • \(a\) determines the shape above the crossover (equivalent to argument above)

If a raw value is smaller than or equal to the threshold for full exclusion, it is assigned a degree of membership of 0. At the other end, it is assigned a value of 1 if it is greater than the threshold for full inclusion. The interesting part happens in the two areas between the thresholds, where the shape of the function is determined by the values of \(a\) and \(b\):

  • if left at their default values of 1, the function is a straight, increasing line between the thresholds
  • when positive but smaller than 1, they dilate the distribution: the curve is concave below the crossover (controlled by \(b\)) and, as a mirror image, convex above it (controlled by \(a\))
  • when greater than 1, they concentrate the distribution into the opposite shapes: convex below the crossover and concave above it, approaching the s-shape of the logistic function
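Equation (4.1) translates directly into a vectorised base-R function. The sketch below is meant to show the arithmetic, not to reproduce calibrate(). The name power_calibrate() is hypothetical, cr stands for the crossover \(c\), and missing values are not handled.

```r
# Hypothetical sketch of equation (4.1), increasing direction; NOT QCA's code.
# e = full exclusion, cr = crossover (c in the text), i = full inclusion,
# below = b (shape under the crossover), above = a (shape over the crossover)
power_calibrate <- function(x, e, cr, i, below = 1, above = 1) {
  dm <- numeric(length(x))                 # values x <= e stay at 0
  lower <- x > e & x <= cr                 # between exclusion and crossover
  upper <- x > cr & x <= i                 # between crossover and inclusion
  dm[lower] <- 0.5 * ((e - x[lower]) / (e - cr))^below
  dm[upper] <- 1 - 0.5 * ((i - x[upper]) / (i - cr))^above
  dm[x > i] <- 1                           # everything above i is fully in
  dm
}

x <- c(150, 165, 175, 185, 200)
power_calibrate(x, e = 155, cr = 175, i = 195)
# 0.00 0.25 0.50 0.75 1.00   (linear between the thresholds)
power_calibrate(x, e = 155, cr = 175, i = 195, below = 2, above = 2)
# 0.000 0.125 0.500 0.875 1.000
```

With both parameters equal to 2, the points between the thresholds move towards 0 below the crossover and towards 1 above it. Raw values outside the outer thresholds still receive exactly 0 and 1.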

The need for such alternative functions arose for methodological reasons (CDFs cannot produce bell-shaped curves, for example). It must also be said that researchers commonly misunderstand how the logistic function operates (the default, and the only calibration function in the fs/QCA software). When calibrating to crisp sets, one expects everything to the left of a threshold to be allocated one value and everything to the right of it a different value, and this is indeed what happens.

The “threshold setter” acts as a first visual tool that, in my opinion, conditions expectations about how fuzzy calibration should work. One natural expectation concerns the two outer thresholds. When a threshold for full set exclusion is established, everything below it should be fully excluded from the set and allocated a value of 0. When a threshold for full set inclusion is established, everything above it should be fully included in the set and allocated a value of 1.

Contrary to this expectation, Ragin’s procedure considers “fully out” everything with a membership value up to 0.05 and “fully in” everything with a membership value of at least 0.95. Thus, by using the logistic function, it is common to have calibrated values that never reach the maximum inclusion of 1, despite the existence of raw values above the inclusion threshold. That is a bit counter-intuitive, but it makes sense when following Ragin’s logic.

The family of functions from equation (4.1), whatever values are chosen for parameters \(a\) and \(b\), makes sure that everything outside the full exclusion and full inclusion thresholds is calibrated accordingly, to exactly 0 and 1 respectively. This reasoning holds true if the researcher expects a linear shape. In that case, all calibrated values line up along a straight line, with bends at the outer thresholds.

Figure 4.9 shows three possible calibration functions, all using the same full exclusion threshold e = 155, crossover c = 175 and full inclusion threshold i = 195.

c1h <- calibrate(height, thresholds = "e=155, c=175, i=195",
                 logistic = FALSE) # by default below = 1 and above = 1
c2h <- calibrate(height, thresholds = "e=155, c=175, i=195",
                 logistic = FALSE, below = 2, above = 2)
c3h <- calibrate(height, thresholds = "e=155, c=175, i=195",
                 logistic = FALSE, below = 3.5, above = 3.5)

plot(height, c3h, cex = 0.6, col = "gray80", main = "",
     xlab = "Raw data", ylab = "Calibrated data")
points(height, c2h, cex = 0.6, col = "gray50")
points(height, c1h, cex = 0.6)

Figure 4.9: Three alternative calibration functions

The black points have both arguments below and above at their default values of 1, producing a linear shape. Changing the values of the two arguments progressively curves the distribution: first to 2 (gray points in the plot), then to 3.5 (light gray points). At the value of 3.5, the distribution is remarkably similar to the logistic shape. The only difference is that raw values outside the outer thresholds have been allocated membership scores of exactly 0 and 1, respectively.

For the decreasing type of functions (fully including the low values in the raw data, and excluding the large ones), the order of the calculations simply reverses, as shown in equation (4.2):

\[\begin{equation} dm_{x} = \begin{cases} 1 & \text{if }x \leq i,\\ 1 - \frac{1}{2} \left (\frac{ i \text{ } - \text{ } x }{ i \text{ } - \text{ } c } \right)^a & \text{if }i < x \leq c,\\ \frac{1}{2} \left( \frac{ e \text{ } - \text{ } x }{ e \text{ } - \text{ } c } \right)^b & \text{if } c < x \leq e,\\ 0 & \text{if }x > e. \end{cases} \tag{4.2} \end{equation}\]

When the parameters \(a\) and \(b\) are equal, a “decreasing” function is the same as the negation of an increasing function. The scores from equation (4.2) can then also be obtained by negating (subtracting from 1) the scores from equation (4.1), computed with the same threshold values.
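This negation property can be verified numerically. The base-R sketch below implements both equations (4.1) and (4.2). The function names calib_incr() and calib_decr() are hypothetical, and cr stands for the crossover. The sketch compares the decreasing scores with the negated increasing ones, first with equal and then with unequal values for \(a\) and \(b\).

```r
# Equation (4.1): increasing, thresholds e < cr < i (hypothetical sketch)
calib_incr <- function(x, e, cr, i, below = 1, above = 1) {
  dm <- as.numeric(x > i)                    # 1 above i, 0 elsewhere for now
  lo <- x > e & x <= cr
  up <- x > cr & x <= i
  dm[lo] <- 0.5 * ((e - x[lo]) / (e - cr))^below
  dm[up] <- 1 - 0.5 * ((i - x[up]) / (i - cr))^above
  dm
}

# Equation (4.2): decreasing, thresholds i < cr < e (hypothetical sketch)
calib_decr <- function(x, i, cr, e, below = 1, above = 1) {
  dm <- as.numeric(x <= i)                   # 1 up to i, 0 elsewhere for now
  up <- x > i & x <= cr                      # between inclusion and crossover
  lo <- x > cr & x <= e                      # between crossover and exclusion
  dm[up] <- 1 - 0.5 * ((i - x[up]) / (i - cr))^above
  dm[lo] <- 0.5 * ((e - x[lo]) / (e - cr))^below
  dm
}

x <- seq(150, 200, by = 2.5)
# equal parameters: decreasing scores equal the negated increasing ones
all.equal(calib_decr(x, 155, 175, 195, below = 2, above = 2),
          1 - calib_incr(x, 155, 175, 195, below = 2, above = 2))   # TRUE
# unequal parameters: the two are no longer the same
isTRUE(all.equal(calib_decr(x, 155, 175, 195, below = 1, above = 3),
                 1 - calib_incr(x, 155, 175, 195, below = 1, above = 3)))  # FALSE
```

The equality breaks when \(a \neq b\) because, for the same raw value, the decreasing function uses \(a\) in the region where the increasing one uses \(b\), and vice versa.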

Obtaining a linear calibrated condition in the graphical user interface is a matter of deactivating both the logistic and ecdf checkboxes. The argument idm and its text box depend on the logistic checkbox being activated, so they are removed from the dialog, as shown in figure 4.10. Two new controls appear instead, which determine the shape of the calibration function: above and below. These are the equivalents of the \(a\) and \(b\) parameters from equations (4.1) and (4.2).

They determine whether the shape between the thresholds is linear (when they are equal to 1) or gains a certain degree of curvature, either above or below the crossover threshold. When below has a value between 0 and 1, the curve dilates into a concave shape below the crossover. When it has values above 1, the curve concentrates into a convex shape below the crossover. The other control, above, works the same way above the crossover, producing the mirror-image shapes.

Mathematically, both can have negative values, but the results are unexpected and the shape is meaningless, therefore in package QCA they have been restricted to positive values.


Figure 4.10: Linear fuzzy calibration dialog

In theory, the choice of calibration function should not be a matter of personal taste or aesthetics, and Thiem (2014) speculates that it might affect the coverage of the calibrated causal condition on the outcome. In practice, however, there is no empirical evidence that calibration functions dramatically alter the final minimization results in QCA when the same set of calibration thresholds is used. If the final solutions are identical irrespective of the calibration function, the default logistic one should be the simplest to use for most QCA applications.

4.2.3 How does it work: the logistic function

Ragin’s idea to use the logistic function for calculating the degree of membership is a very clever one. He adapted a mathematical transformation to set theory, starting from the observation that both probability (specific to mathematics and statistics) and the degree of membership (specific to set theory) range over the bounded interval from 0 to 1.

Probability and set membership are very different things, as Ragin (2008b, 88) rightly points out. To demonstrate how this adaptation works, I will start from the definition of a simple mathematical transformation of probability, called the “odds”. In the binomial distribution, two simple concepts contribute to the calculation of the odds: the probability of success (p) and the probability of failure (q), such that p + q = 1. Its counterpart is equally intuitive: p = 1 - q.

In plain language, the probability of success p is the complement of (1 minus) the probability of failure q, and vice-versa. The odds are simply the ratio between the two probabilities:

\[\frac{p}{q} = \frac{p}{1 - p}\]

One particularity of the odds is that they are non-negative. They range from a minimum value of 0, when p = 0, to plus infinity, when p = 1. To allow the full range from minus to plus infinity, statistics uses the so-called “log odds”, which is the natural logarithm of the odds (footnote 6):

\[\ln{\Big(\frac{p}{1 - p}\Big)}\]

When p = 0 the log odds is equal to \(-\infty\); when p = 1 the log odds is equal to \(+\infty\), and when p = q = 0.5 the log odds will be equal to zero. This observation will play an important role in the process of calibration to set membership scores.
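These limits are easy to check in R. The sketch below writes the log odds and its inverse explicitly. The function names are hypothetical; base R's qlogis() and plogis() compute the same two transformations.

```r
# Hypothetical helper names; base R's qlogis() / plogis() do the same.
log_odds <- function(p) log(p / (1 - p))          # probability -> log odds
inv_log_odds <- function(z) 1 / (1 + exp(-z))     # log odds -> probability

p <- c(0, 0.05, 0.5, 0.95, 1)
log_odds(p)
# -Inf, -2.944439, 0, 2.944439, Inf
inv_log_odds(log_odds(p))   # recovers p, including the limits 0 and 1
```

The inverse is written as 1 / (1 + exp(-z)), the same single-step form used later for Israel, so that it also returns exactly 0 and 1 for infinite log odds.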

Ragin adapted the log odds to set theory by replacing the probability of success with the degree of membership, which also ranges from 0 to 1. In function calibrate(), the degree of membership corresponding to full inclusion is set by the argument idm. The log odds now becomes:

\[\ln{\Big(\frac{dm}{1 - dm}\Big)}\]

As in the probability model, there is a “direct” mathematical relation between any degree of membership in a set and its associated log odds (likely the reason why Ragin called this transformation the “direct method”). Whenever the degree of membership is known, its associated log odds can be calculated by a direct transformation. Most importantly, knowing the value of the log odds allows calculating the degree of membership, which is the purpose of the calibration process.

It all boils down to calculating the equivalent log odds for any particular raw value, from which the degree of membership can be mathematically derived. This is precisely what fuzzy calibration does, using the logistic function. In his example with per capita income, Ragin employs the following procedure:

  1. establish the thresholds: 20000 for full inclusion, 5000 for crossover and 2500 for the full exclusion from the set of developed countries
  2. calculate the deviation of each raw value from the crossover, where values above that threshold will be positive, and values below will be negative.
  3. calculate the ratio between the log odds associated with full membership (3) and the distance between the full inclusion and crossover thresholds (15000), for the positive deviations from the crossover: 3/15000 = 0.0002
  4. calculate the ratio between the log odds associated with full exclusion (-3) and the distance between the full exclusion and crossover thresholds (-2500), for the negative deviations from the crossover: -3/-2500 = 0.0012
  5. calculate the equivalent log odds for each raw value of per capita income, multiplying each deviation from the crossover by the scalar resulting from step 3 or step 4, depending on whether the deviation is positive or negative
  6. mathematically derive the degree of membership out of the calculated log odds.
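The six steps can be followed one by one in base R, using the per capita income values from Table 4.1. This sketch keeps Ragin's rounded log odds of 3 and -3. It therefore reproduces his approximate scores, not those of calibrate().

```r
# Per capita income values (USD) from Table 4.1, in the same order
inc <- c(40110, 34400, 25200, 24920, 20060, 17090, 15320, 13680, 11720,
         11290, 10940, 9800, 7470, 4670, 4100, 4070, 3740, 3690, 3590,
         2980, 1000, 650, 450, 110)

# Step 1: thresholds for the set of developed countries
thr_e <- 2500; thr_c <- 5000; thr_i <- 20000

# Step 2: deviations from the crossover
dev <- inc - thr_c

# Steps 3 and 4: scalars for positive and negative deviations
scal_pos <- 3 / (thr_i - thr_c)      #  3 / 15000  = 0.0002
scal_neg <- -3 / (thr_e - thr_c)     # -3 / -2500  = 0.0012

# Step 5: equivalent log odds for each raw value
logodds <- dev * ifelse(dev > 0, scal_pos, scal_neg)

# Step 6: derive the degree of membership from the log odds
dm_ragin <- 1 / (1 + exp(-logodds))

round(dm_ragin[6], 2)   # Israel (17090 USD): 0.92
```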

Take, for example, the case of Israel, which has a per capita income of 17090 USD and a deviation from the crossover of 12090 USD. This deviation is multiplied by 0.0002 to obtain the associated log odds of 2.42 (2.418, rounded). The degree of membership can then be mathematically derived by cancelling out the effect of the logarithm with the exponential function:

\[\ln{\Big(\frac{dm}{1 - dm}\Big)} = 2.42\] that is equivalent to: \[\frac{dm}{1 - dm} = e^{2.42}\] and finally the degree of membership is trivially extracted: \[dm = \frac{e^{2.42}}{1 + e^{2.42}} \approx 0.92\]

The value of 0.92 (in fact 0.9183397 rounded to two decimals) is what the calculation arrives at, and is exactly what the fs/QCA software outputs for Israel. Using some mathematical tricks specific to the logistic function combining the natural logarithm and the exponential function, the same value can be obtained in a single step calculation:

\[dm = \frac{1}{1 + e^{-\frac{12090 \times 3}{15000}}} = \frac{1}{1 + e^{-2.42}} \approx 0.92\]

However, Ragin (2008b, 90) introduces a very slight approximation in his calculations, when referring to:

“… cases with per capita income of $20,000 or greater (i.e. deviation scores of $15,000 or greater) are considered fully in the target set, with set membership scores >= 0.95 and log odds of membership >= 3.0”

Actually, for a set membership score of 0.95 the exact value of the log odds is 2.944439. Conversely, for a log odds value of 3 the exact set membership score is 0.9525741. The log odds value of 3 is therefore only a rounded approximation, not the exact mathematical counterpart of the set membership score of 0.95. Keeping 0.95 as the fixed set membership score for full set inclusion, the calculation becomes:

\[dm = \frac{1}{1 + e^{- \frac{12090 \times 2.944439}{15000}}} = \frac{1}{1 + e^{-2.373218}} \approx 0.91\]

The function calibrate() in package QCA offers the argument idm (inclusion degree of membership), which has a default value of 0.95 and is used to derive the exact value of the log odds for each particular raw value. In the vast majority of situations, fs/QCA and the R package QCA produce identical results. In a few situations, the use of approximate versus exact values leads to very slight differences. For example, for Israel the exact calculated value is 0.9147621, which rounds to 0.91, as opposed to the value of 0.92 presented by Ragin. Such differences, however, are insignificant with respect to the overall calibrated set membership scores.
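The effect of using the exact log odds instead of the rounded value of 3 can be seen by running both versions side by side. This is a base-R sketch that assumes idm = 0.95, the default of calibrate(). It is not the package's own code, and the function name membership() is hypothetical.

```r
inc <- c(40110, 34400, 25200, 24920, 20060, 17090, 15320, 13680, 11720,
         11290, 10940, 9800, 7470, 4670, 4100, 4070, 3740, 3690, 3590,
         2980, 1000, 650, 450, 110)
dev <- inc - 5000                       # deviations from the crossover

# lo = log odds assigned to full inclusion (and -lo to full exclusion)
membership <- function(dev, lo, e = 2500, cr = 5000, i = 20000) {
  scal <- ifelse(dev > 0, lo / (i - cr), -lo / (e - cr))
  1 / (1 + exp(-dev * scal))
}

dm_approx <- membership(dev, 3)              # Ragin's rounded log odds
dm_exact  <- membership(dev, qlogis(0.95))   # exact log odds, about 2.944439

c(approx = dm_approx[6], exact = dm_exact[6])   # Israel: about 0.918 vs 0.915
which(round(dm_approx, 2) != round(dm_exact, 2))  # positions that differ
```

The exact version reproduces the INCAL column of Table 4.1, including the 0.91 for Israel.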

4.2.4 Direct method, the “bell-shape” functions

A cumulative distribution function is not the universal answer to all calibration questions. In some situations, the problem is less about which calibration function to choose and more about the fact that a CDF like the logistic function is simply unable to calibrate certain types of concepts.

All previous examples refer to calibration functions which always end in the corner diagonally opposite to the one where they started. A function starting from the lower left part of the plot is increasing: it excludes the low values and includes the large ones from the raw data distribution. A function starting from the upper left corner is decreasing: it includes the low values and excludes the large ones from the set. These types of functions are called monotonic. They are either always increasing or always decreasing, possibly flattening out beyond the outer thresholds.

But there are situations where the concept to be calibrated does not refer to the extreme points of the raw data distribution. Taking Ragin’s example of “developed” countries, another possible concept is that of “moderately developed” countries. In this definition, Burundi (110 USD) is clearly out of the set. Interestingly, Switzerland (40110 USD) should also be out of this set, because it is a highly developed country, not a “moderately developed” one.

Both extremes of very poor and very rich countries should be excluded from the set of moderately developed ones. The calibration function should therefore include only the countries in the middle of the data distribution (the mid points), excluding the extremes (the end points). Clearly, this sort of calibration cannot be performed by a monotonic function, which is built to include points only at one of the extremes. Rather, it should be a non-monotonic function that changes direction in the middle of the distribution, decreasing after having increased, or increasing after having decreased.

This is the very description of a “bell-shaped” function curve, as depicted in figure 4.11. Part a. displays two types of linear calibrations that both resemble a “bell”-like shape: trapezoidal in gray and triangular in black. Part b. displays their “curved” counterparts, using the same sets of thresholds but different values for the parameters above and below.

Plotting the objects produced by the code below against the raw values in height produces figure 4.11, using a plot matrix of 1 row and 2 columns, as in the similar looking figure 4.6. These commands only create the objects; to actually produce the plots, readers should use the plot() and points() functions.

triang <- calibrate(height, thresholds = "e1=155, c1=165, i1=175, i2=175,
                    c2=185, e2=195")
trapez <- calibrate(height, thresholds = "e1=155, c1=164, i1=173, i2=177,
                    c2=186, e2=195")
bellsh <- calibrate(height, thresholds = "e1=155, c1=165, i1=175, i2=175,
                    c2=185, e2=195", below = 3, above = 3)
trabel <- calibrate(height, thresholds = "e1=155, c1=164, i1=173, i2=177,
                    c2=186, e2=195", below = 3, above = 3)

Figure 4.11: Fuzzy calibrations in the set of “average height”

Unlike the s-shaped functions, which need three thresholds to define their specific shape, the bell-shaped functions need six. Since the inclusion happens in the middle of the distribution, there are two excluded areas, hence two exclusion points and two associated crossovers. Finally, the two inclusion points define the exact range which is going to be included. In the code above, the two inclusion points of the triangular shape are equal, which explains why that shape has a single point at the top.

Based on the (non)equality of the inclusion thresholds, there are only two main types of bell-shaped functions: triangular, with a single point at the top, and trapezoidal, with a range of points at the top. All other shapes belong to this family, with different degrees of curvature depending on the values of the above and below arguments.

Both the s-shaped and the bell-shaped families of functions share a feature that makes defining and identifying the thresholds easier: their thresholds are always specified in ascending order, from left to right. It does not matter whether the function is increasing or decreasing; the thresholds follow the same logic as a plot, whose points are always drawn from left to right along the x axis, in ascending order.

The key to telling an increasing function from a decreasing one lies in the names of the thresholds: if the sequence begins with an exclusion threshold, the function is increasing (first excluding the smaller values on the left, then gradually increasing towards the larger values on the right), and if it begins with an inclusion threshold, the function is decreasing.

This holds for s-shapes and bell-shapes alike. In the examples above, the thresholds all start from the left with an e, excluding the points on the left, then increase to include the middle points, and finally decrease to exclude the larger values on the right again.
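The naming rule can be illustrated with a small base R sketch. The helper threshold_direction() is hypothetical and not part of the QCA package: it only parses a thresholds string written in the same style as the calibrate() examples, checks that the values are in ascending order, and reads the first letter of the first threshold name.

```r
# Hypothetical helper (not a QCA function): infer the direction of a
# calibration function from the name of its first threshold
threshold_direction <- function(thresholds) {
  parts <- trimws(strsplit(thresholds, ",")[[1]])   # e.g. "e1=155"
  nms <- sub("=.*$", "", parts)                      # names: "e1", "c1", ...
  vals <- as.numeric(sub("^.*=", "", parts))         # values: 155, 165, ...
  # values must be ascending (ties allowed, as in a triangular bell shape)
  stopifnot(!anyNA(vals), !is.unsorted(vals))
  first <- substr(nms[1], 1, 1)
  if (first == "e") {
    "increasing"   # starts by excluding the small values on the left
  } else if (first == "i") {
    "decreasing"   # starts by including the small values on the left
  } else {
    stop("the first threshold name should start with 'e' or 'i'")
  }
}

threshold_direction("e1=155, c1=165, i1=175, i2=175, c2=185, e2=195")
threshold_direction("i1=155, c1=165, e1=175, e2=175, c2=185, i2=195")
```

The first call returns "increasing" (the “average height” set), the second "decreasing" (“not average height”), even though both strings list the same values in the same ascending order.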

The mathematical transformations that define these families of curves are similar to those of the s-shaped functions, combining the increasing and decreasing equations.

\[\begin{equation} dm_{x} = \begin{cases} 0 & \text{if }x \leq e_{1},\\ \frac{1}{2} \left( \frac{ e_{1} \text{ } - \text{ } x }{ e_{1} \text{ } - \text{ } c_{1} } \right)^b & \text{if } e_{1} < x \leq c_{1},\\ 1 - \frac{1}{2} \left (\frac{ i_{1} \text{ } - \text{ } x }{ i_{1} \text{ } - \text{ } c_{1} } \right)^a & \text{if }c_{1} < x \leq i_{1},\\ 1 & \text{if }i_{1} < x \leq i_{2},\\ 1 - \frac{1}{2} \left (\frac{ i_{2} \text{ } - \text{ } x }{ i_{2} \text{ } - \text{ } c_{2} } \right)^a & \text{if }i_{2} < x \leq c_{2},\\ \frac{1}{2} \left( \frac{ e_{2} \text{ } - \text{ } x }{ e_{2} \text{ } - \text{ } c_{2} } \right)^b & \text{if } c_{2} < x \leq e_{2},\\ 0 & \text{if }x > e_{2}. \end{cases} \tag{4.3} \end{equation}\]

It is easy to see that equation (4.3), specific to the increasing bell-shaped function, is not much different from equations (4.1) and (4.2).
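As a hedged illustration, equation (4.3) can be transcribed almost literally into base R. The helper bell_increasing() is hypothetical, written only for this sketch and not the code used internally by calibrate(); its arguments a and b play the roles of above and below. It assumes a numeric vector without missing values.

```r
# Hypothetical transcription of equation (4.3): increasing bell shape
bell_increasing <- function(x, e1, c1, i1, i2, c2, e2, a = 1, b = 1) {
  stopifnot(e1 < c1, c1 < i1, i1 <= i2, i2 < c2, c2 < e2)
  dm <- numeric(length(x))            # 0 covers x <= e1 and x > e2
  r1 <- x > e1 & x <= c1              # rising, below the first crossover
  dm[r1] <- 0.5 * ((e1 - x[r1]) / (e1 - c1))^b
  r2 <- x > c1 & x <= i1              # rising, above the first crossover
  dm[r2] <- 1 - 0.5 * ((i1 - x[r2]) / (i1 - c1))^a
  dm[x > i1 & x <= i2] <- 1           # plateau (empty when i1 == i2)
  r4 <- x > i2 & x <= c2              # falling, above the second crossover
  dm[r4] <- 1 - 0.5 * ((i2 - x[r4]) / (i2 - c2))^a
  r5 <- x > c2 & x <= e2              # falling, below the second crossover
  dm[r5] <- 0.5 * ((e2 - x[r5]) / (e2 - c2))^b
  dm
}

x <- c(150, 160, 165, 170, 175, 180, 185, 190, 200)
bell_increasing(x, 155, 165, 175, 175, 185, 195)   # linear triangle
bell_increasing(x, 155, 165, 175, 175, 185, 195, a = 3, b = 3)
```

With the triangular thresholds and a = b = 1 the scores rise linearly from 0 to 1 at 175 and fall back symmetrically (0.25 at 160, 0.5 at both crossovers); with a = b = 3 the same thresholds give the curved version, for example 0.0625 at 160.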

The equation for the decreasing bell shape simply reverses the order of the transformations used for the increasing one. As already mentioned in the s-shape section, when above is equal to below, a decreasing function is equivalent to the negation of the increasing one, both logically (“short” really means “not tall”) and mathematically, since a decreasing calibration gives the same result as subtracting the membership scores of the increasing function from 1. In this example, the decreasing bell-shaped function means “not average height”, which includes both short and tall people and excludes the middle heights.

If the parameters \(a\) and \(b\) are not equal, a simple negation is not possible, because the curve has different shapes below and above the crossover. In this situation, and in fact in all situations, the best approach is to apply the mathematical transformation from equation (4.4).

\[\begin{equation} dm_{x} = \begin{cases} 1 & \text{if }x \leq i_{1},\\ 1 - \frac{1}{2} \left (\frac{ i_{1} \text{ } - \text{ } x }{ i_{1} \text{ } - \text{ } c_{1} } \right)^a & \text{if }i_{1} < x \leq c_{1},\\ \frac{1}{2} \left( \frac{ e_{1} \text{ } - \text{ } x }{ e_{1} \text{ } - \text{ } c_{1} } \right)^b & \text{if } c_{1} < x \leq e_{1},\\ 0 & \text{if }e_{1} < x \leq e_{2},\\ \frac{1}{2} \left( \frac{ e_{2} \text{ } - \text{ } x }{ e_{2} \text{ } - \text{ } c_{2} } \right)^b & \text{if } e_{2} < x \leq c_{2},\\ 1 - \frac{1}{2} \left (\frac{ i_{2} \text{ } - \text{ } x }{ i_{2} \text{ } - \text{ } c_{2} } \right)^a & \text{if }c_{2} < x \leq i_{2},\\ 1 & \text{if }x > i_{2}. \end{cases} \tag{4.4} \end{equation}\]
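The decreasing bell shape can be sketched the same way: full membership at both ends, exclusion in the middle, and the same power-function pieces as in equation (4.4). The helper bell_decreasing() is hypothetical (not a QCA function), with a and b standing for above and below, and it assumes a numeric vector without missing values.

```r
# Hypothetical transcription of equation (4.4): decreasing bell shape
bell_decreasing <- function(x, i1, c1, e1, e2, c2, i2, a = 1, b = 1) {
  stopifnot(i1 < c1, c1 < e1, e1 <= e2, e2 < c2, c2 < i2)
  dm <- numeric(length(x))            # 0 covers the excluded middle
  dm[x <= i1] <- 1                    # fully in the set on the left
  r1 <- x > i1 & x <= c1              # falling, above the first crossover
  dm[r1] <- 1 - 0.5 * ((i1 - x[r1]) / (i1 - c1))^a
  r2 <- x > c1 & x <= e1              # falling, below the first crossover
  dm[r2] <- 0.5 * ((e1 - x[r2]) / (e1 - c1))^b
  r4 <- x > e2 & x <= c2              # rising, below the second crossover
  dm[r4] <- 0.5 * ((e2 - x[r4]) / (e2 - c2))^b
  r5 <- x > c2 & x <= i2              # rising, above the second crossover
  dm[r5] <- 1 - 0.5 * ((i2 - x[r5]) / (i2 - c2))^a
  dm[x > i2] <- 1                     # fully in the set on the right
  dm
}

x <- c(150, 160, 165, 170, 175, 180, 185, 190, 200)
not_avg <- bell_decreasing(x, 155, 165, 175, 175, 185, 195)
```

With a = b, the scores for “not average height” are exactly 1 minus those of the increasing triangle built on the same values (0.75 at 160, 0 at 175). With a = 2 and b = 1 the score at 160 becomes 0.875 instead of 0.75, which is why a plain negation no longer works in that case.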

The corresponding graphical user interface dialog, shown in figure 4.12, is similar to the previous one, but the radio button is now set to bell-shaped instead of s-shaped. The user interface automatically creates six text boxes where the threshold values are inserted (in ascending order, from left to right).


Figure 4.12: Calibration dialog for the fuzzy set of “average development”

For each value, a threshold is created in the thresholds setter area and, depending on their positions, the points display the familiar triangular shape, since the two inclusion thresholds in the middle are equal. As in the case of crisp calibration, thresholds can be dragged left and right (or modified directly in the text boxes), and the vertical position of the points (their associated degree of membership) is recalculated every time this action is performed.

Once generated, a threshold cannot be moved past the other thresholds, which keeps their associated values in strict ascending order. It should also be mentioned that the thresholds setter area is only informative: the dataset is not modified until the user presses the “Run” button.

This makes the thresholds setter highly interactive and more intuitive for users who, until now, were calibrating “blind”: the result was visible only in the dataset, whereas now it is possible to visualize the end result before creating a new calibrated condition.

4.2.5 The indirect method

The indirect method of calibration, although it produces fuzzy set scores for the target condition, has very little to do with the fuzzy calibration from the previous sections. It has no anchors, and it does not require specifying a calibration function to compute the scores.

It involves a rather simple, two-step procedure: first, a data recoding step, trivially done using the function recode(); second, a numerical transformation involving both the original, interval-level condition and the recoded values from the first step.

In the data recoding step, the first task is to establish the number of categories to recode to. Ragin (2008b) mentions six such qualitative categories, as seen in table 4.2, but this number is flexible, allowing more or sometimes even fewer categories.

Table 4.2: Indirect calibration categories.

Qualitative category                        Membership score
Completely in the set                       1.0
Mostly but not completely in the set        0.8
More in than out of the set                 0.6
More out than in the set                    0.4
Mostly but not completely out of the set    0.2
Completely out of the set                   0.0

Using Ragin’s per capita income inc object, the actual recoding is performed using:

incr <- recode(inc, cuts = "1000, 4000, 5000, 10000, 20000",
               values = seq(0, 1, by = 0.2))
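To see what the recoding step does, it can be mimicked in base R with cut() on a few hypothetical incomes (not Ragin's inc data). The boundary convention is an assumption of this sketch: with right = TRUE the intervals are closed on the right, so a case lying exactly on a cut point falls in the lower category. Whether recode() treats boundaries the same way should be checked in its documentation; the point is that this choice matters for such cases.

```r
# Hypothetical incomes, NOT Ragin's inc data
inc_demo <- c(500, 1000, 2500, 4500, 7000, 15000, 25000)
cuts <- c(1000, 4000, 5000, 10000, 20000)
scores <- seq(0, 1, by = 0.2)      # the six codes from table 4.2

# intervals (-Inf, 1000], (1000, 4000], ..., (20000, Inf);
# labels = FALSE returns the interval number (1 to 6)
category <- cut(inc_demo, breaks = c(-Inf, cuts, Inf),
                right = TRUE, labels = FALSE)
incr_demo <- scores[category]      # interval number -> qualitative code
incr_demo
```

Here the income of exactly 1000 receives the same code (0) as 500, while 2500 moves up to 0.2.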

The choice of recoding intervals is arbitrary, and is used here only to match Ragin’s qualitative codings, with the same end result.

The next step is to have the computer predict the qualitative codings, based on the researcher’s own codings. One of the statistical techniques that can be used for prediction is regression analysis: having calculated the intercept and the regression coefficients, it is possible to predict the value of Y (the dependent variable) for any given value of X (the independent variable). In this example inc is the independent variable, while the recoded object incr plays the role of the dependent variable.

There are many types of regression analyses, depending mostly on the structure of the dependent variable Y, but also on the particular distribution of points in the scatterplot. For example, if Y is a numerical (interval level) variable and the scatterplot is linear, then a linear regression would suffice.

But in our example the dependent variable is hardly a genuine interval-level variable: first because it only has 6 unique values (not enough variation), but also, and perhaps most importantly, because it ranges from 0 to 1, which would suggest something closer to a logistic regression. A standard logistic regression is not possible either, because it requires a dependent variable with only two binary values, 0 and 1, whereas incr also has fractional values in between. Therefore, the most appropriate method to use is a so-called fractional polynomial function.

There is a series of papers discussing these kinds of models (Royston and Altman 1994; Sauerbrei and Royston 1999), which demonstrate very good fitting capabilities, especially useful for this example. For a single independent variable (covariate), the fractional polynomial of degree \(m > 0\) for an argument \(X > 0\) with powers \(p_{1} < \dots < p_{m}\) is defined as the function:

\[\begin{equation} \phi_{m}(X; \beta, p) = \beta_{0} + \sum_{j = 1}^{m} \beta_{j}X^{(p_{j})} \tag{4.5} \end{equation}\]

In a regular polynomial function, \(X\) is raised to successive whole-number powers up to the degree \(m\), and the resulting terms are added together. Fractional polynomials differ in two ways:

  1. the vector of powers is not restricted to whole numbers, but it is rather taken from a small predefined vector {-2, -1, -0.5, 0, 0.5, 1, 2, 3}.
  2. the round brackets notation \((p_{j})\) signals a Box-Tidwell transformation when \(p = 0\), using the natural logarithm \(\ln(X)\) instead of \(X^{0}\).
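The two differences can be made concrete with a short base R sketch of equation (4.5). Both helpers, fp_term() and fp_eval(), are hypothetical names introduced for illustration; the sketch assumes distinct powers, as in the definition \(p_{1} < \dots < p_{m}\), and strictly positive values of \(X\).

```r
# X^(p) with the Box-Tidwell convention: log(X) when p == 0, else X^p
fp_term <- function(x, p) {
  stopifnot(all(x > 0))
  if (p == 0) log(x) else x^p
}

# phi_m(X; beta, p) from equation (4.5); beta[1] is the intercept beta_0
fp_eval <- function(x, beta, powers) {
  stopifnot(length(beta) == length(powers) + 1)
  out <- rep(beta[1], length(x))
  for (j in seq_along(powers)) {
    out <- out + beta[j + 1] * fp_term(x, powers[j])
  }
  out
}

# beta_0 + beta_1 * ln(X) + beta_2 * X^0.5, with beta = (1, 2, 3)
fp_eval(c(1, 4), beta = c(1, 2, 3), powers = c(0, 0.5))
```

At \(X = 1\) the logarithm vanishes and the result is \(1 + 3 \cdot 1 = 4\); at \(X = 4\) it is \(1 + 2\ln(4) + 3 \cdot 2\).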

In R, there are many ways to calculate fractional polynomials, most notably with the package mfp, designed for multiple covariates. For a single covariate there is a simpler alternative: the glm() function, which gives access to the binomial logit family.

Given that all raw values of income per capita are positive, we are only interested in the subset of the special vector {0, 0.5, 1, 2, 3}. But Royston and Altman also demonstrated that a fractional polynomial of degree 2 is more than enough to fit the vast majority of data, therefore we are in fact interested in the subset of the vector of powers {0, 0.5, 1, 2}.

The \(\beta_{j}\) are the regression parameters, so that a fractional polynomial of second degree with powers \(p =\) {0, 0.5, 1, 2} has a very similar, regression-like form:

\(\beta_{0} + \beta_{1}\ln(X) + \beta_{2}X^{0.5} + \beta_{3}X + \beta_{4}X^{2}\)

This equation is specified in the fracpol model:

fracpol <- glm(incr ~ log(inc) + I(inc^(1/2)) + I(inc^1) + I(inc^2),
               family = quasibinomial(logit))

This command executes a (quasi)binomial logistic regression with a fractional polynomial equation, calculating the intercept and the four regression coefficients, which can be inspected using summary(fracpol).

The rest is a simple calculation of the predicted values, based on the glm model, using the command predict(fracpol, type = "response"). This entire procedure is already encapsulated in the function calibrate():

cinc <- calibrate(inc, method = "indirect",
                  thresholds = "1000, 4000, 5000, 10000, 20000")
round(cinc, 3)
 [1] 1.000 1.000 0.999 0.999 0.963 0.886 0.829 0.782 0.741 0.734 0.728
[12] 0.709 0.649 0.417 0.328 0.323 0.267 0.259 0.242 0.142 0.002 0.001
[23] 0.000 0.000

Here, the first threshold is set to 999 instead of 1000, to make a decision about Bolivia, which has a GDP of exactly 1000. The resulting values are not an exact match with Ragin’s indirectly calibrated ones, but that is understandable: Ragin fitted his model using all 136 countries, while here we only use a subset of 24, and the implementation of the fracpoly function in Stata might differ from this attempt.

However, these values, even for a small subset of 24 countries, are remarkably similar to the directly calibrated values and seem to solve some problems. For example, they faithfully implement $20000 as the threshold for full membership (0.95) in the set of developed countries, correctly separating the 5th country, Australia (0.963), from the 6th, Israel (0.886).
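For readers who want to see the mechanics outside calibrate(), the recode, fit and predict sequence can be sketched in base R on hypothetical incomes. This is neither Ragin's data nor the internal code of calibrate(); the income grid, the right-closed intervals of cut() and the rescaling to thousands are assumptions of this example.

```r
# Hypothetical incomes (NOT Ragin's data): 500, 1000, ..., 30000
inc_demo <- seq(500, 30000, by = 500)

# Step 1: recode into the six qualitative codes (right-closed intervals)
cuts <- c(1000, 4000, 5000, 10000, 20000)
incr_demo <- seq(0, 1, by = 0.2)[cut(inc_demo, c(-Inf, cuts, Inf),
                                     labels = FALSE)]

# Step 2: fractional polynomial terms with powers {0, 0.5, 1, 2}.
# Income is expressed in thousands only for numerical conditioning:
# each term is just rescaled (or, for the log, shifted into the
# intercept), so the fitted values are the same as with raw incomes.
k <- inc_demo / 1000
fit <- glm(incr_demo ~ log(k) + I(k^0.5) + k + I(k^2),
           family = quasibinomial(link = "logit"))

# Step 3: predicted membership scores on the response (0 to 1) scale
cinc_demo <- predict(fit, type = "response")
round(range(cinc_demo), 3)
```

The quasi-binomial family is used because the recoded values are fractional: the plain binomial family warns about non-integer successes, while the quasi-likelihood version accepts proportions between 0 and 1 directly.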

4.3 Calibrating categorical data

When introducing the concept of calibration in QCA, Ragin (2008a, 2008b) writes solely about “transforming interval-scale variables into fuzzy sets”, and the different methods to obtain the equivalent fuzzy set scores.

While Ragin offers plenty of examples but no formal definition of the calibration process, Schneider and Wagemann (2012, 23) are more general and define calibration as the process of how:

“… set membership scores are derived from empirical and conceptual knowledge.”

This definition is so broad that it can incorporate basically anything, because empirical and conceptual knowledge is not limited to interval-level data: all sorts of knowledge accumulate from both quantitative and, especially, qualitative research strategies (after all, the “Q” in QCA stands for qualitative, not quantitative, comparative analysis).

As we have seen, social sciences present two main classes of data: categorical (nominal and ordinal variables) and numeric (interval and ratio levels of measurement). Calibrating numerical data was covered extensively in the previous sections, but there is also a trend to transform categorical data into fuzzy sets, and this is often called calibration as well.

The final outcome of the calibration process is a vector of fuzzy, numeric scores. It is important to focus attention on the word “numeric”, because fuzzy scores are proper numbers between 0 and 1.

Nominal variables (pure, unordered categorical data) can only be calibrated to crisp sets. It would be impossible to derive continuous, fuzzy numeric scores for categories such as “urban” and “rural”. These types of data can be transformed (“calibrated”) into either binary crisp sets, if there are only two categories, or multi-value crisp sets, if there are multiple categories belonging to a certain causal condition.

Binary crisp sets are the most general type of calibrated data, and virtually any kind of raw data (qualitative and quantitative alike) can be transformed into this type of crisp set. Whether nominal, ordinal, interval or ratio, all of them can be coerced to the values 1 and 0, characteristic of binary crisp sets.

Interval and ratio data have already been discussed: with proper care when choosing the inclusion thresholds, numeric raw data can be transformed into either crisp or fuzzy sets, using the direct or the indirect methods of calibration.

The remaining level of measurement still open to discussion is the ordinal one. Ordinal variables are categorical, with the further restriction that categories must be arranged in a certain order.

Some types of ordinal variables, especially those with a very limited number of values, are also very easy to calibrate. A variable called “traffic lights”, with three categories “red”, “yellow” and “green” (in this particular order or reversed, but with yellow always in the middle), is still a very clear categorical variable, which can only be calibrated to a multi-value crisp set with three values, 0, 1 and 2: 0 could mean a complete stop of vehicle movement (“red”), 1 could mean slowing down and preparing to stop (“yellow”), and 2 could mean moving ahead freely (“green”).

It would not make any sense, and would not serve any practical purpose, to transform this type of raw ordinal variable into a pure fuzzy set with the values 0, 0.5 and 1 (for many reasons, including the fact that calibration should always avoid the value 0.5, as detailed in the next chapter).

Values 0, 1 and 2 are just as good, and one can find all sorts of necessity and sufficiency claims with respect to one of these values and a given outcome, for example producing an accident or not.
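In R, such a transformation needs nothing more than a factor with an explicit order of levels. The observations below are hypothetical; the sketch simply maps “red”, “yellow” and “green” to the multi-value crisp values 0, 1 and 2.

```r
# Hypothetical observations of the "traffic lights" condition
lights <- c("green", "red", "yellow", "green", "red")

# Fixing the order of the levels makes the ordinal structure explicit;
# the integer codes 1, 2, 3 minus 1 give the mv crisp values 0, 1, 2.
# Any value not listed among the levels would become NA.
lights_mv <- as.integer(factor(lights,
                               levels = c("red", "yellow", "green"))) - 1L
lights_mv
```

The result is 2, 0, 1, 2, 0. Reversing the order of the levels would produce the reversed coding, with “green” as 0.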

The only type of ordinal data that can potentially be confusing is the Likert-type response scale. These scales are also categorical, but many researchers seem to have few reservations about calculating statistics such as the mean or the standard deviation, typically used for numeric data only.

Although categorical, response values from Likert-type scales are often treated as if they were interval numbers. While I believe this is a mistake, it is a completely different topic from calibration. Especially for a low number of values (there are Likert-type response scales with as few as 4 categories), treating the values as numbers is difficult in general, and it represents an even bigger difficulty for calibration purposes.

But there are two other major reasons why calibrating Likert scales is difficult. The first is related to the bipolar nature of Likert response scales. While fuzzy sets are unipolar (for example, satisfaction), Likert response scales are usually constructed from a negative end (1. Very unsatisfied) to a positive end (5. Very satisfied).

A mechanical transformation of a bipolar scale into a unidimensional set is likely to raise serious conceptual questions, as there is no logical reason why the end “very unsatisfied” should be treated as the zero point in the set of “satisfaction”. Of course, a very unsatisfied person is certainly outside the set of satisfied people, but the same can be argued about the midpoint (3. Neither, nor) on a 5-point Likert scale, where only the last two values refer to satisfaction: 4. Satisfied and 5. Very satisfied.

In set theory, satisfaction and dissatisfaction can be treated as two separate sets, rather than two ends of the same set. A person can be both satisfied and unsatisfied at the same time, despite the fact that a single response is given on a Likert-type scale.

Exploring the robustness of QCA results applied to large-N data, Emmenegger, Schraff, and Walter (2014) used the European Social Survey data, wave 2002/03, to study attitudes towards immigration in Germany. They proposed a method to transform a 5-point Likert-type response scale into fuzzy set scores, using exactly this technique: values 1, 2 and 3 are considered more out of the set, and values 4 and 5 more in the set (defining in between a region of indifference, as a qualitative anchor point), thus exposing the analysis to the arguments above.

A second, perhaps even more important reason why Likert-type data are difficult to calibrate as if they were interval is that responses are often skewed or clustered. Whether respondents avoid the extremes (concentrating responses around the midpoint) or avoid the negative end (concentrating responses towards the positive half of the scale), it is common to find most responses clustered in a certain area of the scale. Rarely, if ever, are responses uniformly distributed across all response values.

When responses are mostly clustered (skewed) around 3 values instead of 5, they can easily be calibrated to a multi-value crisp set, and even with 5 evenly distributed values it is still possible to construct such a multi-value crisp set. More challenging are Likert-type response scales with more than 5 values, most often 7 values and up, because at this number it becomes increasingly difficult for respondents to think in terms of categories, as the distances between values appear to become equal.

Skewness is a serious issue that needs to be addressed, even for a 7-point Likert response scale. Theoretical knowledge does not help very much, first because such response scales are highly dependent on the wording of the extreme ends (for example, “Extremely satisfied” is surely different from “Very satisfied”), and second because there is no real guarantee that a particular choice of wording has the same meaning in different cultures.

In any situation, transforming ordinal data into numeric fuzzy scores is highly problematic, and enters challenging territory even for quantitative research. To claim that qualitative research can do a better job than its quantitative counterpart in creating numeric scores is questionable, to say the least.

That being said, when the Likert response scale is large enough (at least 7 points) and the responses are more or less evenly distributed across all values, there might be a straightforward method to obtain fuzzy scores from these categorical response values, borrowing ideas from the quantitative research strategy that have been insufficiently explored for QCA purposes.

In a study addressing fuzzy and relative poverty measures, Cheli and Lemmi (1995) analyze poverty from a multidimensional perspective. Their work is relevant because poverty studies have to categorize respondents into poor and non-poor, an approach very similar to fuzzy calibration. For this purpose, they propose a method called TFR (totally fuzzy and relative), based on rank orders and thus applicable to both ordinal and interval levels of measurement.

The TFR technique uses an empirical cumulative distribution function on the observed data, and it is best suited to interval level data (a situation already covered by the function calibrate(), activating the argument ecdf = TRUE). However, when data is categorical (even skewed), they propose a normalized version by applying a simple transformation to create a membership function that outputs scores between 0 and 1.

The formula below is an adaptation of their function, restricted to values equal to 0 or above, to make sure it can never output negative values (a safety measure also employed by Verkuilen 2005):

\[ TFR = \max\left(0, \frac{E(x) - E(1)}{1 - E(1)}\right) \]

E() is the empirical cumulative distribution function of the observed data. The formula calculates the distance between the CDF value of each response and the CDF value of the first category (1) of the Likert response scale, and divides it by the distance between 1 (the maximum possible fuzzy score) and that same CDF value of the first category.

To demonstrate this with R code, I will first generate an artificial sample of 100 responses on a 7-point response scale, then calibrate it to fuzzy sets using the method just described.

# generate artificial data
values <- sample(1:7, 100, replace = TRUE)
E <- ecdf(values)
TFR <- pmax(0, (E(values) - E(1)) / (1 - E(1)))

# the same values can be obtained via the embedded method:
TFR <- calibrate(values, method = "TFR")

The object TFR contains the fuzzy values obtained via this transformation:

table(round(TFR, 3))

    0 0.193 0.398  0.58 0.682 0.807     1 
   12    17    18    16     9    11    17 

The fuzzy values resulting from this transformation are not mechanically spaced equally between 0 and 1, because they depend on the particular distribution of the observed data. This is very helpful, as it yields suitable fuzzy scores even for highly skewed data coming from ordinal scales.
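The effect of skewness is easier to see on a deterministic example than on a random sample. The sketch below mirrors the TFR formula above through a hypothetical helper, tfr_scores(), which is not necessarily how calibrate(method = "TFR") is implemented internally; the response counts are invented to be strongly skewed towards the positive end of a 7-point scale.

```r
# Hypothetical helper: TFR scores following the formula above.
# `lowest` is the first category of the response scale (1 on a 1-7 scale).
tfr_scores <- function(x, lowest = 1) {
  E <- ecdf(x)                      # empirical CDF of the observed responses
  base <- E(lowest)                 # E(1) in the formula
  stopifnot(base < 1)               # undefined if everyone chose the lowest category
  pmax(0, (E(x) - base) / (1 - base))
}

# 100 hypothetical responses, skewed towards the positive end
skewed <- rep(1:7, times = c(2, 3, 5, 10, 20, 35, 25))
scores <- tfr_scores(skewed)

# one score per category: the first occurrence of each response value
round(scores[match(1:7, skewed)], 3)
# approx. 0, 0.031, 0.082, 0.184, 0.388, 0.745, 1
```

Because few respondents chose the low categories, values 1 to 4 all stay well below 0.5, and the largest jump happens between 5 and 6, where most of the responses are concentrated.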

4.4 The zoom factor

Chances are that the most misunderstood part of the calibration procedure is how the external criteria are used.

In his excellent description of the calibration process, Mello (2021) presents three such criteria:

  - undisputed facts
  - generally accepted conceptions
  - individual expertise


Bogin, Barry. 1998. “The Tall and the Short of It.” Discover 19 (2): 40–44.
Bolton-Smith, Caroline, Mark Woodward, Hugh Tunstall-Pedoe, and Caroline Morrison. 2000. “Accuracy of the Estimated Prevalence of Obesity from Self Reported Height and Weight in an Adult Scottish Population.” Journal of Epidemiology and Community Health 54: 143–48.
Cheli, Bruno, and Achille Lemmi. 1995. “A ‘Totally’ Fuzzy and Relative Approach to the Multidimensional Analysis of Poverty.” Economic Notes 1: 115–34.
Cronqvist, Lasse, and Dirk Berg-Schlosser. 2009. “Multi-Value QCA (mvQCA).” In Configurational Comparative Methods: Qualitative Comparative Analysis (QCA) and Related Techniques, edited by Benoît Rihoux and Charles Ragin, 69–86. London: Sage Publications.
Emmenegger, Patrick, Dominik Schraff, and Andre Walter. 2014. “QCA, the Truth Table Analysis and Large-N Survey Data: The Benefits of Calibration and the Importance of Robustness Tests.”
———. 2006b. Social Science Concepts. A User’s Guide. Princeton; Oxford: Princeton University Press.
Lipset, Seymour Martin. 1959. “Some Social Requisites of Democracy: Economic Development and Political Legitimacy.” American Political Science Review 53 (1): 69–105.
Mello, Patrick. 2021. Qualitative Comparative Analysis. An Introduction to Research Design and Application. Washington, DC: Georgetown University Press.
Neuman, Lawrence W. 2003. Social Research Methods. Qualitative and Quantitative Approaches. 5th ed. Boston: Allyn & Bacon.
Persico, Nicola, Andrew Postlewaite, and Dan Silverman. 2004. “The Effect of Adolescent Experience on Labor Market Outcomes: The Case of Height.” Journal of Political Economy 112 (5): 1019–53.
———. 2000. Fuzzy Set Social Science. Chicago; London: University of Chicago Press.
———. 2008a. “Measurement Versus Calibration: A Set Theoretic Approach.” In The Oxford Handbook of Political Methodology, edited by Janet Box-Steffensmeier, Henry E. Brady, and David Collier, 174–98. Oxford: Oxford University Press.
———. 2008b. Redesigning Social Inquiry. Fuzzy Sets and Beyond. Chicago; London: University of Chicago Press.
Rihoux, Benoît, and Gisèle De Meur. 2009. “Crisp-Set Qualitative Comparative Analysis (csQCA).” In Configurational Comparative Methods: Qualitative Comparative Analysis (QCA) and Related Techniques, edited by Benoît Rihoux and Charles Ragin, 33–68. London: Sage Publications.
Royston, Patrick, and Douglas G. Altman. 1994. “Regression Using Fractional Polynomials of Continuous Covariates: Parsimonious Parametric Modelling.” Journal of the Royal Statistical Society. Series C 43 (3): 429–67.
Sauerbrei, William, and Patrick Royston. 1999. “Building Multivariable Prognostic and Diagnostic Models: Transformation of the Predictors by Using Fractional Polynomials.” Journal of the Royal Statistical Society. Series A 162 (1): 71–94.
Schneider, Carsten, and Claudius Wagemann. 2012. Set-Theoretic Methods for the Social Sciences. A Guide to Qualitative Comparative Analysis. Cambridge: Cambridge University Press.
Smithson, Michael, and Jay Verkuilen. 2006. Fuzzy Set Theory. Applications in the Social Sciences. Thousand Oaks: Sage.
Thiem, Alrik. 2014. “Membership Function Sensitivity of Descriptive Statistics in Fuzzy-Set Relations.” International Journal of Social Research Methodology 17 (6): 625–42.
Thiem, Alrik, and Adrian Dușa. 2013. Qualitative Comparative Analysis with R. A User’s Guide. New York; Heidelberg; Dordrecht; London: Springer.
Verkuilen, Jay. 2005. “Assigning Membership in a Fuzzy Set Analysis.” Sociological Methods and Research 33 (4): 462–96.

  1. this term should not be confused with the log odds in the logistic regression (aka “logit”), that is, the natural logarithm of the “odds”.