# Chapter 4 Calibration

The usual QCA data is numeric, and has specific formats for each flavour: when crisp (either binary or multi-value) the data consists of integers starting from the value of 0, and when fuzzy the values span over a continuous range, anywhere between 0 and 1.

Not all data from scientific research conform to these formats. In fact, one can expect the contrary: most of the time the raw data has a different shape than the one QCA expects. There are numeric variables of all types and, most importantly, of all ranges, and there are qualitative variables that separate cases into categories.

Calibration is a fundamental operation in Qualitative Comparative Analysis. It is a transformational process from the raw numerical data to set membership scores, based on a certain number of qualitative anchors or thresholds. The process is far from a mechanical transformation, because the choice of the calibration thresholds is a theoretically informed one and dramatically changes the result of the calibration process.

Although this book is more about the practical details on how to perform QCA specific operations with R, it is still important to cover (at least in a brief way) the theoretical concepts underpinning these operations.

Typically, social science research data are separated into four levels of measurement: nominal, ordinal, interval and ratio. Virtually all of them can be calibrated to binary crisp sets, almost all (but not all) can be calibrated to multi-value crisp sets, while in the case of fuzzy sets only the “interval” and “ratio” levels of measurement can be used. However, the concept of “calibration” is something different from the concept of “measurement”.

When thinking about “measurement”, the social science researcher arranges the values of a variable in a certain order (where some of the values are smaller / higher than others), or calculates their standardized scores by comparing all of them with the average of the variable: some will have positive scores (with values above the average), and some will have negative scores (values below the average).

While this approach makes perfect sense from a mathematical and statistical point of view, it tells almost nothing in terms of set theory. In the classical example of temperature, values can be arranged in ascending order, and any computer will tell which of the values are higher than others. But no computer in the world would be able to tell which of the temperature values are “hot”, and which are “cold”, only by analysing those values.

Mathematics and statistics usually work with a sample, and the derived measurement values are sample specific. The interpretation of the values, however, needs more information which is not found in the sample, but in the body of theory: “cold” means the temperature approaches zero degrees Celsius (the point where water turns into ice) and “hot” means the temperature approaches 100 degrees Celsius (the point where water turns into vapour). This kind of information is not found in the sample, but outside it.

The same can be said about any other numerical variable. Considering people’s heights, for instance, a computer can tell which person is taller than the other(s), but it cannot tell what it means to be “tall” and what it means to be “short”. These are human interpreted, abstract concepts which are qualitative in nature, not quantitative, and most importantly they are culturally dependent (a tall person in China means a very different thing than a tall person in Sweden).
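The contrast between sample-specific measurement and external anchors can be illustrated with a minimal base R sketch. The two samples are made-up values, the anchors of 0 and 100 degrees Celsius are the ones mentioned above, and the helper `closer_to_hot()` is a hypothetical name introduced only for this illustration:

```r
# Two hypothetical samples of water temperatures, in degrees Celsius
cool_sample <- c(2, 5, 10, 15, 20)
warm_sample <- c(60, 70, 80, 90, 98)

# Standardized scores compare each value with the mean of its own sample
z_cool <- as.numeric(scale(cool_sample))
z_warm <- as.numeric(scale(warm_sample))

# The highest value of each sample gets a positive score,
# although 20 degrees and 98 degrees are very different temperatures
z_cool[5] > 0
z_warm[5] > 0

# External anchors taken from theory, not from the samples
cold_anchor <- 0
hot_anchor <- 100

# Hypothetical helper: is a value closer to the "hot" anchor than to the "cold" one?
closer_to_hot <- function(x) abs(x - hot_anchor) < abs(x - cold_anchor)

closer_to_hot(cool_sample)
closer_to_hot(warm_sample)
```

The standardized scores alone cannot separate the two situations, because they only describe positions relative to each sample; the comparison with the anchors can.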

In the absence of these humanly assessed qualitative anchors, it is impossible for a computer (or a researcher with no other prior information) to derive qualitative conclusions about the values in the sample.

The first section in this chapter is dedicated to crisp calibrations, showing that it is actually improper to use the term “calibration” for crisp sets, since they involve a simple recoding of the raw data instead of a seemingly complicated calibration process.

The second section is devoted to the proper fuzzy calibration, including various types of “direct” calibration and also the “indirect” calibration. Multiple methods are available for the direct method of calibration, depending on the definition of the concept to be calibrated, and package QCA is well equipped with a complete set of tools to deal with each such situation. The indirect method of calibration is very often confused with a direct assignment of fuzzy scores, but it is much more than that.

The final section is dedicated to calibrating categorical data, as an attempt to clarify some aspects regarding the meaning of the word calibration and how the final outcome of the calibration process depends heavily on the type of input data.

## 4.1 Calibrating to crisp sets

When using only one threshold, the procedure produces the so called “binary” crisp sets dividing the cases into two groups, and if using two or more thresholds the procedure produces the “multi-value” crisp sets. In all situations, the number of groups being obtained is equal to the number of thresholds plus 1.

All previously given examples (temperature, height) imply numerical variables, which means the concept of calibration is most often associated with fuzzy sets. That is a correct observation, because the process of calibration to crisp sets is essentially a process of data recoding. As the final result is “crisp”, for binary crisp sets the goal is to recode all values below a certain threshold to 0, and all values above that threshold to 1 (for multi-value crisp sets, it is possible to add new values for ranges between two consecutive thresholds).
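The relation between the number of thresholds and the number of groups can be sketched in base R. The helper `crisp_groups()` and the raw values are hypothetical, and the choice to place a value equal to a threshold in the lower group is an assumption made for this sketch:

```r
# Hypothetical helper: recode a numeric vector into crisp values 0, 1, 2, ...
# A value equal to a threshold falls in the lower group (left.open = TRUE).
crisp_groups <- function(x, thresholds) {
  findInterval(x, sort(thresholds), left.open = TRUE)
}

raw <- c(12, 25, 40, 58, 71, 90)   # illustrative raw values

# One threshold: two groups (binary crisp set)
crisp_groups(raw, 50)              # 0 0 0 1 1 1

# Two thresholds: three groups (multi-value crisp set)
crisp_groups(raw, c(30, 60))       # 0 0 1 1 2 2
```

With n thresholds the function can return at most n + 1 distinct values, which is the recoding logic described above.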

As in all other sections and chapters, wherever there are options for both command line and graphical user interface, the demonstration starts with the command line and the user interface will follow in close synchronization.

To exemplify this type of calibration, a well known dataset from the Lipset (1959) study is loaded into the working space via this command:

data(LR)

There are four versions of the Lipset dataset included in the QCA package: LR (the raw data), LC (calibrated to binary crisp sets), LM (calibrated to multi-value crisp sets) and LF (calibrated to fuzzy sets). The description of all columns, including the outcome, can be found via the command ?LR.

This example concentrates on the column DEV, referring to the level of development as GNP per capita, measured in US dollars (values for 1930), which spans from a minimum value of 320 to a maximum value of 1098:

sort(LR$DEV)

 [1]  320  331  350  367  390  424  468  517  586  590  662  720  795
[14]  897  983 1008 1038 1098

Before determining an appropriate threshold to separate these values into two groups (0 as not developed, and 1 as developed, to create a binary crisp set), it is always a good idea to graphically inspect the distribution of points on a horizontal axis. The graphical user interface has an embedded threshold setter in the dialog for the calibration menu, and there are various ways to create a similar plot via command line, using for example the function plot(), but the simplest is the dedicated function Xplot(), which inspects just one variable and produces a result similar to the threshold setter area in the user interface:

Xplot(LR$DEV, at = pretty(LR$DEV), cex = 0.8)

This particular plot has sufficiently few points that don’t overlap much, but if there are many overlapping points the function has an argument called jitter which can be activated via jitter = TRUE.

In the absence of any theoretical information about what “development” means (more exactly, what determines “high” or “low” development), one approach is to inspect the plot and determine whether the points are grouped in naturally occurring clusters. This is not the case with this distribution, therefore users can either resort to finding a threshold using a statistical clustering technique, or search for a relevant theory.

In the QCA package there is a function called findTh() which employs a cluster analysis to establish which threshold values best separate the points into a certain number of groups. To separate into two groups, as explained in section 2.3.2, no additional parameters are needed because the number of thresholds (argument n) is by default set equal to 1. The command is:

findTh(LR$DEV)

[1] 626

The value of 626 was found by a complete hierarchical clustering, using the euclidean distance (see the default values of arguments hclustm and distm). However, Rihoux and De Meur (2009) have decided to use a close but different threshold value of 550 USD for their binary crisp calibration. Initially, they have used a threshold of 600 USD but upon closer inspection during their analysis, they have found that a value of 550 USD accounts for a natural ‘gap’ in the distribution of values and better “differentiates between Finland (590 USD) and Estonia (468 USD)”.

If not for the threshold value of 550 USD, Finland and Estonia would fall in the same category of development, and there is clearly a difference between the two neighbouring countries. This is a fine example of how theory and practical experience are used as a guide to establish the best separating line(s) which define the crisp groupings of cases. This is a qualitative assessment originating from outside the dataset: it is not something derived from the raw data values, but from external sources of knowledge, and makes a perfect example of the difference between “calibration” and raw data “measurement”.
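To make the clustering-based suggestion less of a black box, here is a minimal base R sketch of one way such a threshold could be derived: cut a complete-linkage hierarchical clustering tree into n + 1 groups and take the midpoint between neighbouring groups. The helper `cluster_thresholds()` is a hypothetical name and is not the package's own code; it only assumes the clustering method and distance named above.

```r
# DEV values as listed in the sorted output earlier in this section
dev <- c(320, 331, 350, 367, 390, 424, 468, 517, 586, 590,
         662, 720, 795, 897, 983, 1008, 1038, 1098)

# Hypothetical helper: suggest n thresholds from a hierarchical clustering
cluster_thresholds <- function(x, n = 1, hclustm = "complete", distm = "euclidean") {
  x <- sort(x)
  tree <- hclust(dist(x, method = distm), method = hclustm)
  # cut the tree into n + 1 groups of (sorted) values
  groups <- cutree(tree, k = n + 1)
  # positions where group membership changes between consecutive values
  breaks <- which(diff(groups) != 0)
  # midpoint between the last value of one group and the first of the next
  (x[breaks] + x[breaks + 1]) / 2
}

cluster_thresholds(dev)   # 626, halfway between 590 and 662
```

Under these assumptions the sketch arrives at the same value of 626, which sits in the widest separation the clustering finds, not in the gap between 517 and 586 that the theory-driven threshold of 550 exploits.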

The final, calibrated values can be obtained with two different methods. The first is to use the calibrate() function and choose the type = "crisp" argument (the default is "fuzzy"):

calibrate(LR$DEV, type = "crisp", thresholds = 550)

 [1] 1 1 1 0 1 1 1 0 0 1 0 1 0 0 0 0 1 1

There are other arguments in the function calibrate(), but all of those refer to the fuzzy type calibration. In the crisp version, these two arguments are the only ones necessary to obtain a calibrated condition.

The crisp version of calibration is essentially equivalent to recoding the original raw data to a finite (and usually very low) number of crisp scores. Therefore a second method, which gives exactly the same results, is to use the recode() function from package QCA, using the following command:

recode(LR$DEV, rules = "lo:550 = 0; else = 1")

 [1] 1 1 1 0 1 1 1 0 0 1 0 1 0 0 0 0 1 1

The syntax of the recode() function is very simple, having only two formal arguments: x and rules, where the first is the initial raw vector of data to be recoded, while the second is a string determining the recoding rules to be used. In this example, it can be translated as: all values between the lowest lo and 550 (inclusive) should be recoded to 0, and everything else should be recoded to 1.
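The inclusive reading of the rule can be checked with a short base R sketch. It uses plain comparisons in place of the recode() function itself, and the exclusive variant is shown only as a contrasting assumption:

```r
# DEV values as listed in the sorted output earlier in this section
dev <- c(320, 331, 350, 367, 390, 424, 468, 517, 586, 590,
         662, 720, 795, 897, 983, 1008, 1038, 1098)

# Reading of "lo:550 = 0; else = 1": 550 itself belongs to the lower group
inclusive <- ifelse(dev <= 550, 0, 1)

# Contrasting convention: 550 itself belongs to the upper group
exclusive <- ifelse(dev < 550, 0, 1)

# No case has exactly 550 USD, so both conventions agree on these data
identical(inclusive, exclusive)   # TRUE

# A hypothetical case sitting exactly on the threshold shows where they differ
ifelse(550 <= 550, 0, 1)          # 0 under the inclusive reading
ifelse(550 < 550, 0, 1)           # 1 under the exclusive reading

# Eight cases end up as 0 and ten as 1, as in the output above
table(inclusive)
```

The boundary convention only matters when a raw value coincides with a threshold, which is worth checking before comparing results across functions.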

Calibrating to multi-value crisp sets is just as simple, the only difference being the number of thresholds n that divide the cases into n + 1 groups:

findTh(LR$DEV, n = 2)

[1] 626 940

The clustering method finds 626 and 940 as the two thresholds, while Cronqvist and Berg-Schlosser (2009) used the values of 550 and 850 USD to derive a multi-value causal condition with three values: 0, 1 and 2:

calibrate(LR$DEV, type = "crisp", thresholds = "550, 850")

 [1] 1 2 1 0 1 2 1 0 0 1 0 2 0 0 0 0 2 2

The argument thresholds can in fact be specified as a numerical vector such as c(550, 850), but as will be shown in the next section, when calibrating to fuzzy sets this argument is best specified as a named vector, and its simplest form is a single string written between double quotes. This is an improvement over the former specification of this argument, but both are accepted for backwards compatibility.

Using the recode() function gives the same results:

recode(LR$DEV, rules = "lo:550 = 0; 551:850 = 1; else = 2")

 [1] 1 2 1 0 1 2 1 0 0 1 0 2 0 0 0 0 2 2

This specification of the argument rules assumes the raw data are discrete integers, but in fact the recode() function has another specification inspired by the function cut(), which works with both discrete and continuous data. This other method uses a different argument named cuts (similar to the argument breaks from function cut(), and also similar to the argument thresholds from function calibrate()) to define the cut points where the original values will be recoded, and a related argument named values to specify the output values:

recode(LR$DEV, cuts = "550, 850", values = 0:2)

 [1] 1 2 1 0 1 2 1 0 0 1 0 2 0 0 0 0 2 2
attr(,"labels")
0 1 2
0 1 2
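The difference between integer-style ranges and cut points becomes visible as soon as the data contain non-integer values. The base R sketch below mimics the two readings with plain comparisons and with cut(); it does not call recode(), and the values are hypothetical:

```r
# Hypothetical continuous values, some falling between two consecutive integers
x <- c(320.0, 550.0, 550.5, 850.0, 850.4, 1098.0)

# Integer-style ranges, mimicking "lo:550 = 0; 551:850 = 1; else = 2":
# 550.5 is covered by neither of the first two ranges, so it falls into "else"
by_ranges <- ifelse(x <= 550, 0, ifelse(x >= 551 & x <= 850, 1, 2))
by_ranges   # 0 0 2 1 2 2

# Cut points in the spirit of cut(): every value lands in exactly one interval
by_cuts <- cut(x, breaks = c(-Inf, 550, 850, Inf), labels = FALSE) - 1
by_cuts     # 0 0 1 1 2 2
```

The third value is the telling one: with ranges written for integers it is pushed into the highest category, while with cut points it stays in the middle one.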

As mentioned, there are various dialogs in the graphical user interface to match these commands. The calibration dialog is one of the most complex in the entire user interface, and figure 4.2 shows the one appearing after selecting the menu:

Data / Calibrate:

The procedure to use this dialog is very straightforward and involves a number of intuitive steps:

1. select the dataset from the list in the Dataset area: in this example a single dataset is loaded, but R can work with any number of datasets at the same time (note that if a single dataset is loaded in R, the interface selects it automatically)
2. select the condition from the list under the Choose condition area, with the immediate effect of the distribution of values appearing in the threshold setter area
3. choose crisp from the radio button pairing with fuzzy (this is the equivalent of the argument type in the written command)
4. if threshold values are to be suggested by the computer, check the find thresholds checkbox; it has no direct equivalent with the arguments of the calibrate() function, but it is using the findTh() function behind the scenes
5. if points are too densely clustered, check on the jitter points checkbox to scatter the points vertically with small random values
6. adjust the number of thresholds via the down or up buttons
7. whether or not asking the computer for thresholds, their values can be manually (over)written in the text boxes right above the plot area
8. independently of manual or automatic specification of the thresholds values, their correspondent vertical red bars in the plot area can be manually dragged left or right, and the text boxes from step 7 will change accordingly
9. if the calibrated values should be saved as a different variable in the same dataset, check the calibrate into new condition and specify the new (calibrated) condition name (otherwise it will overwrite the same condition with the new values)
10. click the Run button and the new column should appear in the dataset, visible either in the console or in the data editor.

Figure 4.2 presents a situation where the condition DEV from the dataset LR is calibrated to multi-value crisp sets using two thresholds (550 and 850) which are manually set (the “find thresholds” option is unchecked), with points jittered vertically to avoid overlapping.

Since the user interface is developed into a webpage, it makes sense to use all the advantages of this environment. The points have a “mouse-over” property, and respond with the label of the point (the row name of that particular case), in this example displaying EE (Estonia), a country from the dataset LR.

The dialog allows up to six thresholds for the crisp type, dividing a causal condition into at most seven groups. This is a limitation due to the lack of space in the dialog, but otherwise the command line can specify any number of thresholds. Cronqvist and Berg-Schlosser (2009) have given a fair amount of practical advice on how many thresholds should be set. Apart from the already mentioned guides (looking for naturally occurring clusters of points, and employing theoretically based decisions), another very good piece of advice is to avoid creating large imbalances in the group sizes, otherwise solutions will possibly be too case specific (finding solutions that explain exactly 1 or 2 cases, whereas a scientifically acceptable result should allow more general solutions, at least to some degree).

The data points in this particular distribution are rather clearly scattered, but other datasets can have hundreds of overlapping points, a situation in which the area dedicated to the threshold setter will prove to be too small even if points are jittered. However, this is not a problem for an interface designed as a webpage: unlike traditional user interfaces where dialogs are fixed, this particular interface is designed to be responsive, reactive and above all interactive.
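A quick way to check the balance of the resulting groups is to tabulate them for each candidate set of thresholds. This base R sketch uses the hypothetical helper `group_sizes()` and compares the two pairs of thresholds discussed in this section:

```r
# DEV values as listed in the sorted output earlier in this section
dev <- c(320, 331, 350, 367, 390, 424, 468, 517, 586, 590,
         662, 720, 795, 897, 983, 1008, 1038, 1098)

# Hypothetical helper: number of cases in each crisp group
group_sizes <- function(x, thresholds) {
  table(findInterval(x, sort(thresholds), left.open = TRUE))
}

group_sizes(dev, c(550, 850))   # theory-based thresholds:     8  5  5
group_sizes(dev, c(626, 940))   # clustering-based thresholds: 10 4  4
```

Neither split produces groups of only one or two cases, although the clustering-based thresholds concentrate more than half of the cases in the lowest group.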

Notice the small handling sign in the bottom right corner of the dialog (all resizable dialogs have it), which can be used to enlarge the dialog’s width to virtually any dimension until the points will become more clearly scattered for the human eye. The result of such dialog enlargement can be seen in figure 4.3 below:

Many dialogs are allowed to be resized (for example the plot window), and the content inside is automatically recalculated to the new dialog dimensions. In this particular example, only the threshold setter area was redrawn and the bottom controls (including the Run button) have been repositioned. All other buttons and controls have been left to their original position.

As shown in section 2.4, each click and every action, like dragging thresholds left or right, triggers a modification of the “Command constructor” dialog. From the second step, where the condition is selected, the command constructor starts to display the equivalent written command which, upon the click of the Run button, will be sent to the R console. There are so many features in the graphical user interface that a thorough description of every single one of them would require too much book space and distract the user’s attention from the important topics. To mention just one such “hidden” feature, when multiple thresholds are present in the threshold setter area, they are automatically sorted and displayed in increasing order, and their drag range is bounded between the minimum and maximum of the values found in the data.

The “Command constructor” dialog is refreshed on every click, with the aim of helping the user construct the command itself, rather than clicking through the dialogs. This will pay off, since written commands are always better than a point-and-click approach. While users can easily forget what they clicked to obtain a particular result, commands saved in dedicated script files are available at any time, and this is helpful for replication purposes: it is more complicated to replicate clicks than to run a script file.

Since there are two functions that accomplish the same goal of calibrating to crisp sets, there are also two separate dialogs. The next to be presented refers to a menu which is not so much related to QCA per se but it accomplishes a more general data transformation process which is present in almost any other software:

Data / Recode

The design of this dialog is largely inspired by the recoding menu of the SPSS software, to which many social science users are very accustomed. Figure 4.4 presents the same areas for Dataset and Choose condition as in the calibration dialog, but otherwise it has the familiar “Old values” and “New values” sections which are found in the other software.

This dialog also has a rather straightforward procedure, with the same first two steps as in the calibrate dialog (selecting the dataset and the condition), so the list below continues from the third step:

3. select the radio button or the relevant text box(es) and insert the threshold(s), in the Old value(s) part of the dialog
4. insert the new value or select from the other options in the New value side of the dialog
5. press the Add button to construct the rule
6. repeat steps 3 to 5 for each recoding rule
7. if the recoded values should be saved as a different variable in the same dataset, check the recode into new condition and specify the new (recoded) condition name (otherwise it will overwrite the same condition with the new values)
8. click the Run button and the new column should appear in the dataset, visible either in the console or in the data editor.

There are two additional buttons on the right side of the dialog: Remove erases any selected rule(s) from the recoding rules area, and Clear erases all rules at once. When any of the rules is selected, the corresponding radio buttons and text boxes are filled in with the rule values, both in the old and new parts. This allows modifications to any of the rules, and a second press on the Add button brings those modifications into the rules area.

As long as the recoding rules do not overlap (an ‘old’ value should be covered by only one of the recodings), the order of the rules doesn’t matter. But if several recoding rules cover the same old values, then the last specified rule takes precedence (overwriting the recodings made by the previously specified rules). As always, a few toy examples in the command line with only a handful of values will show the user how the command works for every scenario.
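Such a toy example can be sketched in base R. The mini-recoder `apply_rules()` below is hypothetical: it is not the package's implementation, and it simply assumes the precedence behaviour described above by applying the rules in order, so that a later rule overwrites an earlier one:

```r
# Hypothetical mini-recoder: each rule is list(from, to, new), applied in order
apply_rules <- function(x, rules) {
  out <- rep(NA_real_, length(x))
  for (r in rules) {
    hit <- x >= r$from & x <= r$to
    out[hit] <- r$new   # a later rule overwrites earlier assignments
  }
  out
}

x <- c(1, 3, 5, 7, 9)

# The two rules overlap for the old value 5
overlapping <- list(
  list(from = 1, to = 6, new = 0),
  list(from = 5, to = 9, new = 1)
)

apply_rules(x, overlapping)        # 0 0 1 1 1  (second rule wins for 5)
apply_rules(x, rev(overlapping))   # 0 0 0 1 1  (order reversed, other rule wins)
```

Only the overlapping value changes between the two runs, which is exactly why the order is irrelevant for non-overlapping rules.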

This section ends with the conclusion that calibration to crisp sets is essentially equivalent to recoding the initial raw causal condition with a set of discrete new values. The next section demonstrates what the “real” calibration is all about, applied to fuzzy sets. It is the main reason why the default value of the type argument has been changed to "fuzzy", despite its long lasting traditional default value of "crisp" in all previous versions of the QCA package.

## 4.2 Calibrating to fuzzy sets

Social science concepts are inherently difficult to measure. Unlike physical sciences where things can be directly observed and measured, in the social sciences things are not directly observable, hence their measurement is always challenging and often problematic. Some of the concepts from the social world are more easily observable (sex, age, race etc.), but the bulk of the social science concepts are highly abstract and need substantially more effort to be measured, or at least to attempt a measurement model. These are complex, multi-dimensional concepts which require the use of yet another (set of) concepts just to obtain a definition, and those concepts need a definition of their own, and so on.

Neuman (2003) presents an in-depth discussion about the role of concepts in social research, for both quantitative (positivist) and qualitative (interpretive) approaches. The measurement process is different: quantitative research defines concepts before data collection and produces numerical, empirical information about the concepts, while in qualitative research concepts can be produced during the data collection process itself.

Both approaches use conceptualization and operationalization in the measurement process, in a tight connection with the concept definition, although Goertz (2006b) has an ontological perspective arguing there is more about concepts than a mere definition, because researchers need to first establish what is “important” about the entity in order to arrive at a proper definition.

Concepts have a tight interconnection with theory: sometimes the concept formation process leads to new theories, while established theories always use accepted definitions of their concepts. These definitions can change depending on the theoretical perspective employed in the research process. Although not directly concerning QCA, it is nevertheless an important discussion for calibration purposes.

From yet another point of view, concepts have a cultural background just as much as they have a theoretical one. This cultural dependence can happen in at least two different ways:

1. concepts have different meanings in different cultures: altruism in Hungary is probably something different from altruism in Korea (to compare very different countries), or the well known continuum for left and right political positioning doesn’t have the same meaning in Japan, where political participation bears very little resemblance, if any at all, to the Western concept of “participation”
2. even if concepts have the same meaning, their level of concentration can dramatically differ in different cultural and/or historical contexts, for example citizenship or public participation which has very high levels in a country like Belgium, and very low levels in a post-communist country like Romania.

### 4.2.1 Direct assignment

The method of direct assignment is the simplest possible way to obtain a (seemingly) fuzzy calibrated condition from some raw numerical data. The term “direct assignment” has been introduced by Verkuilen (2005), while something similar was briefly mentioned by Ragin (2000).

It is likely a method that is tributary to Verkuilen’s formal training in experimental psychology, where expert knowledge is both studied and employed in conjunction with various scales. In the direct assignment, the fuzzy scores are allocated by experts as they see fit, according to their expertise. There can be some form of theoretical justification for the various thresholds separating the fuzzy scores, but in the end this is a highly subjective method and it is likely that no two experts will reach exactly the same values.

To avoid the point of maximum ambiguity 0.5, the experts typically choose four, and sometimes even six fuzzy scores to transform the raw data into a fuzzy set. This procedure is extremely similar to the recoding operation when calibrating to crisp sets, with the only exception that the final values are not crisp, but fuzzy between 0 and 1.

To exemplify, we can recode the same condition DEV from the raw version of the Lipset data:

recode(LR$DEV, cuts = "350, 550, 850", values = "0, 0.33, 0.66, 1")

 [1] 0.66 1.00 0.66 0.33 0.66 1.00 0.66 0.33 0.33 0.66 0.33 1.00 0.00
[14] 0.00 0.00 0.33 1.00 1.00
attr(,"labels")
   0 0.33 0.66    1
0.00 0.33 0.66 1.00

All values between 0 and 350 are recoded to 0, the ones between 351 and 550 to 0.33, the ones between 551 and 850 to 0.66 and the rest are recoded to 1. Supposing the thresholds (in this case, the cuts) have some theoretical meaning, this is a very simple and rudimentary way to obtain a seemingly fuzzy calibrated condition.

Arguably, the end result is by no means different from a calibration to crisp sets, obtaining a new condition with four levels:

recode(LR$DEV, cuts = "350, 550, 850", values = "0, 1, 2, 3")

 [1] 2 3 2 1 2 3 2 1 1 2 1 3 0 0 0 1 3 3
attr(,"labels")
0 1 2 3
0 1 2 3
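That the two recodings partition the cases identically can be verified with a base R sketch, using findInterval() in place of recode() and the same three cut points:

```r
# DEV values as listed in the sorted output earlier in this chapter
dev <- c(320, 331, 350, 367, 390, 424, 468, 517, 586, 590,
         662, 720, 795, 897, 983, 1008, 1038, 1098)

cuts <- c(350, 550, 850)

# Group index 0..3; a value equal to a cut point falls in the lower group
level <- findInterval(dev, cuts, left.open = TRUE)

# Directly assigned, seemingly fuzzy scores: only the labels change
fuzzy_like <- c(0, 0.33, 0.66, 1)[level + 1]

# Each fuzzy-looking score corresponds to exactly one crisp level
table(fuzzy_like, level)

# Group sizes: 3, 5, 5 and 5 cases
table(level)
```

The cross-tabulation has a single non-empty cell in every row, so the "fuzzy" scores carry exactly the same information as the four crisp levels.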

Naturally, more levels and generally more multi-value conditions in a dataset expand the analysis with even more possible causal configurations, and from this point of view a fuzzy set (even one having four fuzzy categories) is preferable because it is at least confined between 0 and 1. But while fuzzy sets are notoriously averse to the middle point 0.5, crisp sets are more than willing to accommodate it in a middle level, for instance by creating a multi-value crisp set with three levels.

## 4.3 Calibrating categorical data

When introducing the concept of calibration in QCA, Ragin (2008a, 2008b) writes solely about “transforming interval-scale variables into fuzzy sets”, and the different methods to obtain the equivalent fuzzy set scores.

While Ragin offers plenty of examples but not a formal definition of the calibration process, Schneider & Wagemann (2012, 23) are more general and define calibration as the process of how:

“… set membership scores are derived from empirical and conceptual knowledge.”

This definition is so broad that it can incorporate basically anything, because empirical and conceptual knowledge is not limited strictly to interval level data: all sorts of knowledge are accumulated from both quantitative and, especially, qualitative research strategies (after all, the “Q” from QCA comes from qualitative, not quantitative comparative analysis).

As we have seen, social sciences present two main classes of data: categorical (composed of nominal and ordinal variables), and numeric (interval and ratio level of measurement). Calibrating numerical data was covered extensively in the previous sections, but there is another trend to transform categorical data into fuzzy sets, and this is often called calibration as well.

The final outcome of the calibration process is a vector of fuzzy, numeric scores. It is important to focus attention on the word “numeric”, because fuzzy scores are proper numbers between 0 and 1.

With respect to nominal variables (pure, unordered categorical data), they can only be calibrated to crisp sets. It would be impossible to derive continuous, fuzzy numeric scores for categories such as “urban” and “rural”. These types of data can be transformed (“calibrated”) into either binary crisp sets, if there are only two categories, or multi-value crisp sets if there are multiple categories belonging to a certain causal condition.

Binary crisp sets are the most general type of calibrated data, and virtually any kind of raw data (qualitative and quantitative alike) can be transformed to this type of crisp sets. Whether nominal, ordinal, interval or ratio, all of them can be coerced to 1 and 0 values, characteristic for binary crisp sets.

Interval and ratio type of data have already been discussed: with proper care when choosing the inclusion thresholds, numeric raw data can be transformed into either crisp, or fuzzy data using the direct or the indirect methods of calibration.

The other level of measurement which is still open to discussion, is the ordinal level of measurement for the raw data. Ordinal variables are categorical, with a further restriction that categories must be arranged in a certain order.

Some types of ordinal variables, especially those with a very limited number of values, are also very easy to calibrate. A variable called “traffic lights” having three categories “red”, “yellow” and “green” (in that particular order, or reversed, but yellow is always in the middle) is still a very clear categorical variable which can only be calibrated to a multi-value crisp set having three values: 0, 1 and 2, where 0 could mean a complete stop of vehicle movement (“red”), 1 could mean to prepare to stop and slow down speed (“yellow”) and 2 could mean move ahead freely (“green”).

It would not make any sense, and would not serve any practical purpose, to transform this type of raw ordinal variable into a pure fuzzy set having values 0, 0.5 and 1 (for many reasons, including the fact that calibration should always avoid the value of 0.5, with more details in the next chapter).

Values 0, 1 and 2 are just as good, and one can find all sorts of necessity and sufficiency claims with respect to one of these values and a given outcome, for example producing an accident or not.

The only type of ordinal data which can potentially be confusing is the Likert-type response scale. These scales are also categorical, but many researchers seem to have few reservations about calculating summary measures such as the mean or the standard deviation, typically used for numeric data only.

Although categorical, response values from Likert type scales are often treated as if they were interval numbers. While I believe this is a mistake, it is a completely different topic than calibration. Especially for a low number of values (there are Likert type response scales with as few as 4 categories), treating the values as numbers is difficult in general, and it represents an even bigger difficulty for calibration purposes.
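The traffic lights example can be written down as a short base R sketch, with hypothetical observations and an ordered factor that fixes the order of the categories:

```r
# Hypothetical observations of the "traffic lights" variable
lights <- c("green", "red", "yellow", "red", "green")

# Declare the order of the categories explicitly: red < yellow < green
ordered_lights <- factor(lights, levels = c("red", "yellow", "green"),
                         ordered = TRUE)

# Multi-value crisp set: 0 = red, 1 = yellow, 2 = green
mv_lights <- as.integer(ordered_lights) - 1
mv_lights   # 2 0 1 0 2
```

The resulting values are labels for ordered categories and nothing more: the distance between 0 and 1 is not claimed to be the same as the distance between 1 and 2.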

But there are two other, major reasons for which calibrating Likert scales is difficult. The first one is related to the bipolar nature of the Likert response scales. While fuzzy sets are unipolar, for example satisfaction, in the case of Likert response scales they are usually constructed from a negative end (1. Very unsatisfied) to a positive end (5. Very satisfied).

A mechanical transformation of a bipolar scale into a uni-dimensional set is likely to raise serious conceptual questions, as there is no logical reason why the end “very unsatisfied” should be treated as the zero point in the set of “satisfaction”. Of course, a very unsatisfied person is certainly outside the set of satisfied people, but the same can be argued about the mid point (3. Neither, nor) on a 5-value Likert scale, where only the last two values refer to satisfaction: 4. Satisfied and 5. Very satisfied.

In set theory, satisfaction and dissatisfaction can be treated as two separate sets, rather than two ends of the same set. A person can be both satisfied and unsatisfied at the same time, despite the fact that a single response is given on a Likert-type scale.
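A minimal sketch of this idea, using hypothetical responses and hypothetical object names, treats satisfaction and dissatisfaction as two separate crisp sets derived from the same 5-value scale. The cut-offs (values 4 and 5 in the set of satisfied people, values 1 and 2 in the set of unsatisfied people) are assumptions made for illustration only.

```r
# Hypothetical responses on a 5-value satisfaction scale
# (1 = Very unsatisfied ... 5 = Very satisfied)
responses <- c(1, 2, 3, 3, 4, 5, 5, 2, 4, 3)

# Crisp set of satisfied respondents: only values 4 and 5 are in
satisfied <- as.integer(responses >= 4)

# Crisp set of unsatisfied respondents: only values 1 and 2 are in
unsatisfied <- as.integer(responses <= 2)

# The mid point (3) is outside both sets
data.frame(responses, satisfied, unsatisfied)
```

With this coding, the set of unsatisfied people is not simply the negation of the set of satisfied people: respondents choosing the mid point are out of both.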

Exploring the robustness of QCA results applied to large-N data, Emmenegger, Schraff, and Walter (2014) used data from the 2002/03 wave of the European Social Survey to study attitudes towards immigration in Germany. They proposed a method to transform a 5-point Likert-type response scale into fuzzy set scores, using exactly this technique: values 1, 2 and 3 are considered more out of the set and values 4 and 5 more in the set (with a region of indifference defined in between, as a qualitative anchor point), thus exposing the analysis to the arguments above.

A second, perhaps even more important reason why Likert-type data are difficult to calibrate as if they were interval is the fact that responses are often skewed towards one of the ends. Either in situations where respondents avoid extremes (thus concentrating responses around the mid point), or in situations where people avoid the negative end (thus concentrating responses towards the positive half of the scale), it is possible to find that most responses are clustered around a certain area of the scale. Rarely, if ever, are responses uniformly distributed across all response values.

When responses are mostly clustered (skewed) around 3 values instead of 5, they can easily be calibrated to a multi-value crisp set, but even with 5 evenly distributed values it is still possible to construct such a multi-value crisp set. More challenging are Likert-type response scales with more than 5 values, most often from 7 values and up, because at this number it gets increasingly difficult for respondents to think in terms of categories, the distances between values appearing to be equal.
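As a rough sketch of the clustered situation, the base R code below inspects a hypothetical, heavily skewed set of 5-value responses and collapses it into a multi-value crisp set with three values. The data and the cut points (values 1 to 3 recoded to 0, value 4 to 1, value 5 to 2) are assumptions chosen purely for illustration; in real applications the thresholds should be theoretically justified.

```r
# Hypothetical 5-value responses, heavily clustered around 3, 4 and 5
responses <- rep(1:5, times = c(1, 2, 30, 40, 27))

# Inspect the distribution before deciding on any thresholds
table(responses)

# Assumed cut points: values 1 to 3 -> 0, value 4 -> 1, value 5 -> 2
mv <- findInterval(responses, c(4, 5))

# Cross-check the raw responses against the multi-value crisp set
table(responses, mv)
```

The function `findInterval()` counts how many of the cut points are less than or equal to each response, which directly gives integer values starting from 0.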

Skewness is a serious issue that needs to be addressed, even for a 7-value Likert response scale. Theoretical knowledge does not help very much, first because such response scales are highly dependent on the specification of the extreme ends (for example, something like “Extremely satisfied” is surely different from “Very satisfied”), and second because there is no real guarantee that a particular choice of wording has the same meaning in different cultures.

In any situation, transforming ordinal data into numeric fuzzy scores is highly problematic, and enters challenging territory even for quantitative research. To claim that qualitative research can do a better job than its quantitative counterpart in creating numeric scores is questionable, to say the least.

That being said, in a situation where the Likert response scale is large enough (at least 7 points) and the responses are more or less evenly distributed across all values, there might be a straightforward method to obtain fuzzy scores from these categorical response values, drawing on ideas from the quantitative research strategy that have been insufficiently explored for QCA purposes.

In a study addressing the fuzzy and relative poverty measures, Cheli and Lemmi (1995) seek to analyze poverty in a multidimensional perspective. Their work is relevant because poverty studies have to categorize respondents into poor and non-poor, which is a very similar approach to the fuzzy calibration. For this objective, they propose a method called TFR (totally fuzzy and relative) based on rank orders, thus applicable to both ordinal and interval levels of measurement.

The TFR technique uses an empirical cumulative distribution function on the observed data, and it is best suited to interval level data (a situation already covered by the function calibrate(), activating the argument ecdf = TRUE). However, when data is categorical (even skewed), they propose a normalized version by applying a simple transformation to create a membership function that outputs scores between 0 and 1.

The formula below is an adaptation of their function, restricted to values equal to 0 or above, to make sure it can never output negative values (a safety measure also employed by Verkuilen 2005):

$TFR = \max\left(0, \frac{E(x) - E(1)}{1 - E(1)}\right)$

E() is the empirical cumulative distribution function (CDF) of the observed data. The formula calculates the distance from each CDF value to the CDF of the first value (1) in the Likert response scale, and divides it by the distance between 1 (the maximum possible fuzzy score) and the same CDF of the first value in the response scale.
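As a small worked illustration, assume a hypothetical 5-value response scale with cumulative proportions $E(1) = 0.10$, $E(2) = 0.25$, $E(3) = 0.50$, $E(4) = 0.80$ and $E(5) = 1$ (these numbers are invented for the example). Applying the formula to each response value gives:

$$
\begin{aligned}
TFR(1) &= \max\left(0, \frac{0.10 - 0.10}{1 - 0.10}\right) = 0 \\
TFR(2) &= \max\left(0, \frac{0.25 - 0.10}{1 - 0.10}\right) = \frac{0.15}{0.90} \approx 0.167 \\
TFR(3) &= \max\left(0, \frac{0.50 - 0.10}{1 - 0.10}\right) = \frac{0.40}{0.90} \approx 0.444 \\
TFR(4) &= \max\left(0, \frac{0.80 - 0.10}{1 - 0.10}\right) = \frac{0.70}{0.90} \approx 0.778 \\
TFR(5) &= \max\left(0, \frac{1 - 0.10}{1 - 0.10}\right) = 1
\end{aligned}
$$

The lowest response value always receives a score of 0 and the highest a score of 1, while the scores in between follow the cumulative distribution of the responses.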

To demonstrate this with R code, I will first generate an artificial sample of 100 responses on a 7-point response scale, then calibrate it to fuzzy sets using the method just described.

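# generate artificial data: 100 responses on a 7-point scale
set.seed(12345)
values <- sample(1:7, 100, replace = TRUE)

# empirical cumulative distribution function of the observed values
E <- ecdf(values)

# TFR scores, truncated at 0 so they can never be negative
TFR <- pmax(0, (E(values) - E(1)) / (1 - E(1)))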

# the same values can be obtained via the embedded method:
TFR <- calibrate(values, method = "TFR")

The object TFR contains the fuzzy values obtained via this transformation:

table(round(TFR, 3))

    0 0.193 0.398  0.58 0.682 0.807     1 
   12    17    18    16     9    11    17 

The fuzzy values resulting from this transformation are not mechanically spaced at equal intervals between 0 and 1, because they depend on the particular distribution of the observed data. This is very helpful, because it yields suitable fuzzy scores even for highly skewed data coming from ordinal scales.
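To see how the transformation behaves with skewed data, the sketch below applies the same formula, in base R only, to a hypothetical set of 100 responses that are heavily concentrated at the positive end of a 7-point scale. The data and the object names (`skewed`, `E_skewed`, `tfr_skewed`, `scores`) are assumptions made for this illustration.

```r
# Hypothetical, heavily skewed responses on a 7-point scale (100 cases)
skewed <- rep(1:7, times = c(2, 3, 5, 10, 30, 30, 20))

# Empirical cumulative distribution function of the observed responses
E_skewed <- ecdf(skewed)

# TFR scores, truncated at 0 as in the formula above
tfr_skewed <- pmax(0, (E_skewed(skewed) - E_skewed(1)) / (1 - E_skewed(1)))

# One fuzzy score per response value
scores <- round(tapply(tfr_skewed, skewed, unique), 3)
scores
```

In this artificial example the response values 1 to 4 all receive scores below 0.2, while most of the range between 0 and 1 is used to separate the values 5, 6 and 7, where the bulk of the responses are located.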

## 4.4 The zoom factor

Probably the most misunderstood part of the calibration procedure is how the external criteria are used.

In his excellent description of the calibration process, Mello (2021) presents three such criteria:

- undisputed facts
- generally accepted conceptions
- individual expertise

### References

Bogin, Barry. 1998. “The Tall and the Short of It.” Discover 19 (2): 40–44.
Bolton-Smith, Caroline, Mark Woodward, Hugh Tunstall-Pedoe, and Caroline Morrison. 2000. “Accuracy of the Estimated Prevalence of Obesity from Self Reported Height and Weight in an Adult Scottish Population.” Journal of Epidemiology and Community Health 54: 143–48.
Cheli, Bruno, and Achille Lemmi. 1995. “A ‘Totally’ Fuzzy and Relative Approach to the Multidimensional Analysis of Poverty.” Economic Notes 1: 115–34.
Cronqvist, Lasse, and Dirk Berg-Schlosser. 2009. “Multi-Value QCA (mvQCA).” In Configurational Comparative Methods: Qualitative Comparative Analysis (QCA) and Related Techniques, edited by Benoît Rihoux and Charles Ragin, 69–86. London: Sage Publications.
Emmenegger, Patrick, Dominik Schraff, and Andre Walter. 2014. “QCA, the Truth Table Analysis and Large-N Survey Data: The Benefits of Calibration and the Importance of Robustness Tests.” http://www.compasss.org/wpseries/EmmeneggerSchraffWalter2014.pdf.
Goertz, Gary. 2006b. Social Science Concepts. A User’s Guide. Princeton; Oxford: Princeton University Press.
Lipset, Martin Seymour. 1959. “Some Social Requisites of Democracy: Economic Development and Political Legitimacy.” American Political Science Review 53 (1): 69–105.
Mello, Patrick. 2021. Qualitative Comparative Analysis. An Introduction to Research Design and Application. Washington, DC: Georgetown University Press.
Neuman, Lawrence W. 2003. Social Research Methods. Qualitative and Quantitative Approaches. 5th ed. Boston: Allyn Bacon.
Persico, Nicola, Andrew Postlewaite, and Dan Silverman. 2004. “The Effect of Adolescent Experience on Labor Market Outcomes: The Case of Height.” Journal of Political Economy 112 (5): 1019–53.
Ragin, Charles C. 2000. Fuzzy Set Social Science. Chicago; London: University of Chicago Press.
———. 2008a. “Measurement Versus Calibration: A Set Theoretic Approach.” In The Oxford Handbook of Political Methodology, edited by Janet Box-Steffensmeier, Henry E. Brady, and David Collier, 174–98. Oxford: Oxford University Press.
———. 2008b. Redesigning Social Inquiry. Fuzzy Sets and Beyond. Chicago; London: University of Chicago Press.
Rihoux, Benoît, and Gisèle De Meur. 2009. “Crisp-Set Qualitative Comparative Analysis (csQCA).” In Configurational Comparative Methods: Qualitative Comparative Analysis (QCA) and Related Techniques, edited by Benoît Rihoux and Charles Ragin, 33–68. London: Sage Publications.
Royston, Patrick, and Douglas G. Altman. 1994. “Regression Using Fractional Polynomials of Continuous Covariates: Parsimonious Parametric Modelling.” Journal of the Royal Statistical Society. Series C 43 (3): 429–67.
Sauerbrei, William, and Patrick Royston. 1999. “Building Multivariable Prognostic and Diagnostic Models: Transformation of the Predictors by Using Fractional Polynomials.” Journal of the Royal Statistical Society. Series A 162 (1): 71–94.
Schneider, Carsten, and Claudius Wagemann. 2012. Set-Theoretic Methods for the Social Sciences. A Guide to Qualitative Comparative Analysis. Cambridge: Cambridge University Press.
Smithson, Michael, and Jay Verkuilen. 2006. Fuzzy Set Theory. Applications in the Social Sciences. Thousand Oaks: Sage.
Thiem, Alrik. 2014. “Membership Function Sensitivity of Descriptive Statistics in Fuzzy-Set Relations.” International Journal of Social Research Methodology 17 (6): 625–42.
Thiem, Alrik, and Adrian Dușa. 2013. Qualitative Comparative Analysis with R. A User’s Guide. New York; Heidelberg; Dordrecht; London: Springer.
Verkuilen, Jay. 2005. “Assigning Membership in a Fuzzy Set Analysis.” Sociological Methods and Research 33 (4): 462–96. https://doi.org/10.1177/0049124105274498.

1. This term should not be confused with the log odds in logistic regression (also known as the “logit”), which is the natural logarithm of the odds.