Chapter 1 Summary of NCA
NCA is based on necessity causal logic. If a certain level of the condition is not present, a certain level of the outcome will not be present. Other factors cannot compensate for the missing condition. The necessary condition allows the outcome to exist, but does not produce it. This is different from sufficiency causal logic where the condition produces the outcome.
Conventional theories and methods are based on additive logic. This logic assumes that several factors contribute to the outcome and can compensate for each other. For example, conventional quantitative methods like multiple regression analysis and structural equation modeling describe the complexity of different contributing factors producing the outcome. In these models a low value of one factor can be compensated by changing another factor. When making causal interpretations with these models, factors are usually interpreted as ‘generic causes’ that can change the probability of the outcome.
Conventional quantitative methods focus on generic causes and do not cover necessity logic; they are therefore unable to identify necessary conditions in data sets. This was the main reason for developing NCA. NCA ensures theory-method fit when the theory includes necessity relations between factors and an outcome, and these relations need to be evaluated empirically.
Conducting NCA consists of four stages:
Formulate the necessary condition hypothesis.
Collect the data.
Analyse the data.
Report the results.
Each stage is explained in more detail in sections 1.3, 1.4, 1.5, and 1.6, respectively.
In research projects and publications, NCA can be used as a stand-alone method or together with conventional methods such as multiple regression analysis, structural equation modeling, or Qualitative Comparative Analysis (QCA). Whether NCA is applied as a stand-alone method or as a complementary method depends on the goal of the research.
1.1 NCA as stand-alone method
A researcher may have two reasons to use NCA as a stand-alone method in a particular study. First, the researcher may want to employ just a necessity view on the phenomenon of interest. He may have formulated a parsimonious ‘pure necessity theory’ (see Chapter 2) consisting of one or more necessity relations (e.g., Karwowski et al., 2016; Knol et al., 2018). To ensure theory-method fit, NCA is used for testing pure necessity theories; a regression-based method is not appropriate for this purpose. [Similarly, when a researcher wants to test a theory consisting only of generic causal relations (most current theories), a conventional probability-based method such as regression analysis should be selected, and NCA is not appropriate.] A second reason for using NCA as a stand-alone method is that the researcher may want to add a necessity view to an existing generic causal theory from the literature. Without (re)testing the generic causal relationships, she may want to test whether some concepts of the theory are (also) necessary conditions.
The advantage of using NCA as a stand-alone method is that the study and its theoretical reasoning can focus on the individual necessary concepts. There is no need to include other concepts (e.g., contributing factors, control variables) in the reasoning and analysis. This allows a clear storyline and efficient data collection and analysis.
1.2 NCA as a complementary method
A researcher may also have reasons to use NCA in combination with other methods in a single study. The researcher may want to add a necessity view to a generic causal theory and analyse necessity and generic relationships in combination. This can be done in two ways. First, the generic causal theory is leading, and some concepts from this theory are tested for necessity. Second, a combined necessity-generic causal theory is developed with both necessity and generic causal relationships between concepts.
In the first option, an existing or new generic causal theory has concepts that are considered to be contributing factors that can change the probability of the outcome (including control variables). From these concepts, potential necessary conditions are selected. The necessity and generic causal relationships of these concepts with the outcome are tested with the appropriate methods, which are conducted successively. The integration occurs when the results are discussed. Several NCA multimethod studies that start with a generic causal theory have been reported in the literature. For example, when NCA is used in combination with (multiple) regression (e.g., Jain et al., 2022; Klimas et al., 2022; Stek & Schiele, 2021) or structural equation modeling (e.g., Della Corte et al., 2021; W. Lee & Jeong, 2021; Renner et al., 2022; Richter, Schubring, et al., 2020), “important” factors are identified with a regression-based method, and these and other factors are analysed with NCA to identify whether they are necessary (see section 4.4). When NCA is used in combination with Qualitative Comparative Analysis (QCA) (e.g., C. S. Kopplin & Rösch, 2021; Torres & Godinho, 2022), the results of NCA are compared with the sufficient configurations identified by QCA (see section 4.5).
In the second option, the researcher starts the study with a combined necessity and generic causal theory. This is a complex ‘embedded necessity theory’ (see Chapter 2) that consists of both necessity and generic causal relations from the start. Some concepts may only have a generic causal but not a necessity relationship with the outcome; other concepts may be a necessary cause but not a generic cause, and yet others may be both a necessary cause and a generic cause. Embedded necessity theories that combine necessity and generic causal theorizing for describing relations are still rare. In QCA, most studies theorize from the perspective of sufficiency and test the potential necessity of single factors, but do not include necessity theorizing from the start. Also, most studies that combine NCA with multiple regression or structural equation modeling do not theorize about necessity and generic causation in combination (for an exception see Dul, 2019). Testing embedded necessity theories requires a multimethod approach with both a regression-based method and a necessity-based method.
When NCA is combined with regression analysis or QCA, the order in which NCA and the other method are conducted does not matter. When NCA is used in combination with structural equation modeling (SEM), the order matters: SEM is applied first, followed by NCA. The reason is that the outcome of the SEM measurement model is used to define the constructs to be tested for necessity with NCA.
1.3 Formulate the necessary condition hypothesis
NCA starts with a theoretical notion that a necessity relation may exist between \(X\) (the potential condition) and \(Y\) (the outcome). This is usually done by formulating and justifying a hypothesis that is part of a theory (see Chapter 2).
NCA is mainly used in theory-testing research, which starts with formulating the theory and proceeds with empirically testing the theory with data. Hence, theory formulation comes before data collection and analysis. This is also the focus of this book. However, NCA can also be used in theory-building and exploratory research. Then the theory is formulated based on the results of the data analysis, and thus after it (e.g., Stek & Schiele, 2021).
1.4 Collect the data
Collecting data in NCA is not different from collecting data in general. The goal of data collection is to have scores (values, levels) for the condition \(X\) and the outcome \(Y\) for each case. The selected research design (e.g., ‘experiment’, ‘survey’, ‘case study’) must meet common quality standards. Also the selection of cases for measurement and data analysis must fit the goal of the research (e.g., random sampling, purposive sampling for specific reasons). The data must be ‘good’. This means that the data must be valid (the measurement scores reflect what they are intended to reflect) and reliable (when measurement is repeated, the results are the same).
NCA places no new requirements on data collection, with a few exceptions. First, the setup of a ‘necessity experiment’ is different from the setup of a ‘sufficiency experiment’ or a common ‘average treatment effect experiment’ (see section 3.2). Second, in certain situations it is possible to sample just a single case to perform NCA (see section 3.3). Third, the way NCA is conducted may differ depending on the type of data used: quantitative data (section 3.4), qualitative data (section 3.5), longitudinal data (section 3.6), and set membership scores (section 3.7). Furthermore, the identification of potential outliers partly differs from the common way of identifying outliers (see section 3.8).
1.5 Analyse the data
Data analysis is at the core of NCA. NCA’s data analysis assumes that it makes theoretical sense to analyse the data with necessity logic (see section 1.3) and that the data are meaningful (see section 1.4). For a quantitative necessary condition analysis, the ‘scatter plot’ approach can be used. The scatter plot maps cases in the XY plane, and NCA conducts a bivariate analysis on the scatter plot. Figure 1.1 shows an example of a scatter plot with \(X\) = Contractual detail and \(Y\) = Innovation of 48 buyer-supplier relationships for evaluating the hypothesis that Contractual detail is necessary for Innovation (Van der Valk et al., 2016).
According to this hypothesis, it is not possible to have cases with a low level of Contractual detail (\(X\)) and a high level of Innovation (\(Y\)). This means that the upper left corner of the scatter plot remains empty. The space without cases is called the empty space or ceiling zone. NCA draws a border line, called the ceiling line, between the space without cases and the space with cases. The two default ceiling lines are the Ceiling Envelopment - Free Disposal Hull (CE-FDH), which is a step function that can be used when \(X\) or \(Y\) are discrete with a limited number of levels or when the border is irregular, and the Ceiling Regression - Free Disposal Hull (CR-FDH), which is a straight trend line through the upper left corner points of the CE-FDH line. When a few cases are present in the otherwise empty space, the ceiling line is not entirely accurate. The ceiling accuracy (c-accuracy) is the percentage of cases on or below the ceiling line. By definition, the CE-FDH line is 100% accurate, whereas the CR-FDH line is usually not 100% accurate. A low c-accuracy indicates that the ceiling line may not properly represent the border between the empty and full space, and another ceiling line may be selected. The scope (S) is the area of the total space where cases can appear given the minimum and maximum possible values of \(X\) and \(Y\). The effect size (d) is the area of the ceiling zone (C) divided by the scope: \(d = C/S\). The effect size can have values between 0 and 1. NCA estimates the ceiling line and its effect size from the sampled data. The statistical test of NCA consists of estimating the p value of the effect size. These NCA parameters can be computed with the NCA software.
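The CE-FDH ceiling and the effect size \(d = C/S\) can be illustrated with a few lines of base R. This is a simplified sketch, not the NCA package's implementation: it assumes the expected empty space is the upper left corner, it uses the empirical scope (the observed minima and maxima of \(X\) and \(Y\)), and the function name ce_fdh_effect and the data are made up. For each observed \(X\) value, the CE-FDH ceiling is taken as the highest \(Y\) among cases with an \(X\) value at or below it; the ceiling zone is the area between this step function and the maximum \(Y\).

```r
# Hypothetical helper (not part of the NCA package): CE-FDH ceiling zone, empirical
# scope, effect size d = C/S and c-accuracy for an expected empty upper-left corner.
ce_fdh_effect <- function(x, y) {
  xs <- sort(unique(x))
  # CE-FDH ceiling at each distinct X value: highest Y among cases with X at or below it
  h <- vapply(xs, function(v) max(y[x <= v]), numeric(1))
  ymax <- max(y)
  # Ceiling zone C: area between the step function and Ymax, summed step by step
  C <- sum(diff(xs) * (ymax - h[-length(h)]))
  # Empirical scope S: rectangle spanned by the observed extremes of X and Y
  S <- (max(x) - min(x)) * (ymax - min(y))
  # c-accuracy: percentage of cases on or below the ceiling (100% for CE-FDH)
  h_case <- vapply(x, function(v) max(y[x <= v]), numeric(1))
  list(C = C, S = S, d = C / S, c_accuracy = 100 * mean(y <= h_case))
}

# Made-up data for illustration
x <- c(1, 2, 3, 4)
y <- c(1, 3, 2, 4)
res <- ce_fdh_effect(x, y)
res$C  # 5
res$S  # 9
res$d  # 5/9, about 0.556
```

Because the step function is drawn through the highest observed cases, no case lies above it, which is why the sketch returns a c-accuracy of 100% for CE-FDH.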
The remaining part of this section demonstrates NCA’s data analysis by using the NCA software.
1.5.1 Prepare the analysis
For performing a (quantitative) NCA, the NCA software can be used. This is a free R package, and the researcher must have access to R and RStudio. Researchers who are not familiar with R and RStudio can consult the NCA Quick Start Guide on the NCA website (https://www.erim.eur.nl). This guide explains how R and RStudio can be downloaded from the internet and gives further details about the software. This demonstration uses the RStudio interface for conducting the analysis. After R and RStudio are installed and RStudio is opened, the script window is used to type and run the instructions. The first instructions for this demonstration are as follows:
#Demonstration NCA
#Install and load the NCA package
install.packages("NCA") # install the NCA package (only once)
library(NCA) # load the NCA package (in each new session)
The script starts with a remark after the hashtag (#), followed by instructions to install (download) the NCA package with the install.packages function. Installing the NCA package must be done only once. Afterwards, for each new NCA session, the NCA package must be loaded (activated) using the library function.
1.5.2 Load the data
Next, in this demonstration three datasets are loaded. A dataset must be organised such that rows are cases and columns are variables (condition(s) and outcome(s)). Data files often have the .csv extension, but other data file formats can also be loaded, for example .xls (Excel), .sav (SPSS), or .dta (Stata). The researcher may conduct NCA on new or existing datasets, including publicly available archived datasets.
#Load the data
#Data on own computer:
data1 <- read.csv("myData.csv", row.names = 1) # load and rename my dataset
#Data on internet (example Worldbank):
install.packages("WDI") #install package for loading Worldbank data
library(WDI)
data2 <- WDI (indicator = c("SH.IMM.IDPT","SP.DYN.LE00.IN"))
#Data in the NCA software:
data(nca.example2)
data3 <- nca.example2 #load the example data from the NCA package
The first dataset is a .csv file that is assumed to be stored in the working directory on the user’s computer. The argument row.names = 1 indicates that the first column of the file contains the case names. After loading, the dataset is stored in R as ‘data1’.
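Column names matter when conditions and outcomes are later referred to by name. By default, read.csv() passes the header names through make.names(), so a name with a space such as ‘Contractual detail’ becomes ‘Contractual.detail’; setting check.names = FALSE keeps the names as written in the file. The sketch below writes a small made-up file to a temporary location to show the difference; the file content is illustrative only.

```r
# Write a tiny, made-up CSV file: the first column holds case names,
# and one header contains a space
tmp <- tempfile(fileext = ".csv")
writeLines(c(",Innovation,Contractual detail",
             "1,3.57,3.24",
             "2,3.57,2.71"), tmp)

# Default: header names are made syntactically valid (space becomes a dot)
d_default <- read.csv(tmp, row.names = 1)
names(d_default)  # "Innovation" "Contractual.detail"

# check.names = FALSE keeps the names exactly as written in the file
d_keep <- read.csv(tmp, row.names = 1, check.names = FALSE)
names(d_keep)     # "Innovation" "Contractual detail"
```

Inspecting names() right after loading is a quick way to confirm the exact variable names to use in later analyses.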
The second dataset is obtained from the internet. It is a dataset from the Worldbank website, which can be extracted with the WDI package. The first variable (indicator) that is extracted refers to the vaccination level of a country and the second to the country’s life expectancy. The vaccination level is the percentage of children aged 12-23 months who received vaccination against diphtheria, pertussis (or whooping cough), and tetanus (DPT). Life expectancy is the number of years a newborn infant would live if prevailing patterns of mortality at the time of its birth were to stay the same throughout its life.
The third dataset is part of the NCA package (from version 3.1.2 onward) and contains data on 48 buyer-supplier relationships as the rows. This dataset has three conditions (Contractual detail, Goodwill trust, and Competence trust) and one outcome (Innovation). The dataset is published as Table 2 in an article entitled “When are contracts and trust necessary for innovation in buyer-supplier relationships? A necessary condition analysis” by Van der Valk et al. (2016).
The instruction head(data3) shows the top rows of the last dataset. The first row has the variable names and the first column the names of the cases.
## Innovation Contractual detail Goodwill trust Competence trust
## 1 3.57 3.24 2.71 4.0
## 2 3.57 2.71 2.43 3.0
## 3 1.29 2.29 4.00 4.0
## 4 2.14 4.14 3.71 4.0
## 5 1.00 2.43 3.29 3.5
## 6 3.43 1.86 3.86 4.5
1.5.3 Estimate effect size and the p value with nca_analysis
After the data are loaded, the NCA analysis starts with estimating the size of the space that is expected to be empty given the hypothesis. By default, the software assumes that the expected empty space is in the upper left corner, representing the hypothesis that the presence or a high value of \(X\) is necessary for the presence or a high value of \(Y\). The software can also analyse other corners, depending on whether the presence or absence of \(X\) is necessary for the presence or absence of \(Y\) (see section 1.3), by using the corner argument in nca_analysis. By default, the software uses the CE-FDH and CR-FDH ceiling techniques to draw the ceiling line and to calculate the effect size. Other ceiling techniques can be selected using the ceilings argument in the nca_analysis function (see section 6.3.3).
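Geometrically, any other corner can be turned into the default upper left case by mirroring \(X\), \(Y\), or both within their observed ranges. The base R sketch below illustrates that idea on made-up data. It is an assumption-based illustration, not a description of how the NCA package implements its corner argument, and the function corner_effect and its corner labels are hypothetical.

```r
# Hypothetical sketch: effect size for any corner by mirroring the axes so that the
# expected empty corner becomes the upper-left one (not the NCA package's code).
ce_fdh_d <- function(x, y) {
  xs <- sort(unique(x))
  h <- vapply(xs, function(v) max(y[x <= v]), numeric(1))  # CE-FDH step heights
  C <- sum(diff(xs) * (max(y) - h[-length(h)]))            # ceiling zone
  C / ((max(x) - min(x)) * (max(y) - min(y)))              # d = C/S (empirical scope)
}

reflect <- function(v) max(v) + min(v) - v  # mirror values within their observed range

corner_effect <- function(x, y, corner = c("upper_left", "upper_right",
                                           "lower_left", "lower_right")) {
  corner <- match.arg(corner)
  if (corner %in% c("upper_right", "lower_right")) x <- reflect(x)  # empty corner on the right
  if (corner %in% c("lower_left", "lower_right"))  y <- reflect(y)  # empty corner at the bottom
  ce_fdh_d(x, y)
}

# Made-up data in which the upper right corner (high X with high Y) is empty,
# i.e., a low level of X appears necessary for a high level of Y
x <- c(1, 2, 3, 4)
y <- c(4, 2, 3, 1)
corner_effect(x, y, "upper_left")   # 0: the upper left corner is occupied
corner_effect(x, y, "upper_right")  # 5/9, about 0.556
```

Because mirroring keeps the width and height of the observed range unchanged, the scope stays the same, so the effect sizes of different corners remain directly comparable in this sketch.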
The effect size for data3 is estimated with the function nca_analysis. The first analysis, named ‘model1’, evaluates the hypothesis that Contractual detail is necessary for Innovation.
The nca_analysis instruction first specifies the dataset and then the names of the condition and the outcome. After running this instruction the analysis is done, but the output is not yet shown.
A short summary of the results can be obtained by calling model1. This prints the effect sizes of the condition for the two default ceiling lines in the console window of RStudio.
##
## --------------------------------------------------------------------------------
## ce_fdh cr_fdh
## Contractual detail 0.24 0.19
## --------------------------------------------------------------------------------
The second analysis (‘model2’) shows that the condition and outcome can also be specified by their column numbers in the dataset. The results are the same; only the number of displayed decimals differs.
##
## --------------------------------------------------------------------------------
## ce_fdh cr_fdh
## Contractual detail 0.237 0.188
## --------------------------------------------------------------------------------
The third analysis (‘model3’) shows that multiple bivariate analyses can be done in a single run.
##
## --------------------------------------------------------------------------------
## ce_fdh cr_fdh
## Contractual detail 0.237 0.188
## Goodwill trust 0.307 0.255
## Competence trust 0.321 0.214
## --------------------------------------------------------------------------------
In this example three conditions are included, each representing a different necessary condition hypothesis. An analysis with multiple conditions in a single run always uses one outcome.
The fourth analysis (‘model4’) shows that the ceiling line can be selected. In this example the CR-FDH line is selected.
#Conduct NCA with selected ceiling line
model4 <- nca_analysis(data3, 2:4, 1, ceilings = 'cr_fdh')
model4
##
## --------------------------------------------------------------------------------
## cr_fdh
## Contractual detail 0.19
## Goodwill trust 0.26
## Competence trust 0.21
## --------------------------------------------------------------------------------
The fifth analysis (‘model5’) also includes NCA’s statistical test to calculate the p value. An empty space could be a random result of variables that are actually unrelated. The p value protects the researcher from concluding that the space is empty because of necessity when it is actually a random result of unrelated variables. The statistical test is activated by specifying the number of permutations for estimating the p value in the test.rep argument of nca_analysis.
## Preparing the analysis, this might take a few seconds...
## Do test for : ce_fdh - Contractual detail
## Do test for : cr_fdh - Contractual detail
## Do test for : ce_fdh - Goodwill trust
## Do test for : cr_fdh - Goodwill trust
## Do test for : ce_fdh - Competence trust
## Do test for : cr_fdh - Competence trust
##
## --------------------------------------------------------------------------------
## ce_fdh p cr_fdh p
## Contractual detail 0.24 0.008 0.19 0.009
## Goodwill trust 0.31 0.002 0.26 0.004
## Competence trust 0.32 0.002 0.21 0.008
## --------------------------------------------------------------------------------
The summary output now includes the p value for each effect size. Selecting the number of permutations is a balancing act between p value accuracy and computation time: more permutations give a more accurate estimate of the p value but take longer to compute.
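The idea behind a permutation test can be sketched in base R. Randomly re-pairing the \(Y\) values with the \(X\) values removes any real relation between the variables, so the effect sizes of many shuffled datasets show what empty spaces arise by chance alone. The p value is then estimated as the share of permuted effect sizes that are at least as large as the observed one. This is a simplified illustration under stated assumptions (CE-FDH ceiling, upper left corner, empirical scope, made-up data, hypothetical function names). It is not the NCA package's implementation, which may differ in details such as exactly how the p value is computed.

```r
# Simplified sketch of a permutation test for the CE-FDH effect size
# (illustrative only; not the NCA package's implementation).
ce_fdh_d <- function(x, y) {
  xs <- sort(unique(x))
  h <- vapply(xs, function(v) max(y[x <= v]), numeric(1))  # CE-FDH step heights
  C <- sum(diff(xs) * (max(y) - h[-length(h)]))            # ceiling zone
  C / ((max(x) - min(x)) * (max(y) - min(y)))              # d = C/S
}

perm_test <- function(x, y, reps = 1000) {
  d_obs <- ce_fdh_d(x, y)
  # Effect sizes after randomly re-pairing X and Y (no real relation left)
  d_perm <- replicate(reps, ce_fdh_d(x, sample(y)))
  # Estimated p value: share of permuted effect sizes at least as large as observed
  list(d = d_obs, p = mean(d_perm >= d_obs))
}

set.seed(123)
x <- runif(100)
y <- runif(100, min = 0, max = x)  # made-up data: Y never exceeds X, so the upper left is empty
res <- perm_test(x, y, reps = 1000)
res$d
res$p
```

Increasing reps makes the estimated p value more precise at the cost of more computation, which is the trade-off described above.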
1.5.4 Create output with nca_output
More details of the results can be obtained with the nca_output function. The output is shown for analysis model6, which includes only Contractual detail and the CR-FDH ceiling line.
## Preparing the analysis, this might take a few seconds...
## Do test for : cr_fdh - Contractual detail
##
## --------------------------------------------------------------------------------
## --------------------------------------------------------------------------------
##
## Number of observations 48
## Scope 15.40
## Xmin 1.86
## Xmax 5.71
## Ymin 1.00
## Ymax 5.00
##
## cr_fdh
## Ceiling zone 2.893
## Effect size 0.188
## # above 2
## c-accuracy 95.8%
## Fit 79.2%
## p-value 0.009
## p-accuracy 0.002
##
## Slope 0.544
## Intercept 2.214
## Abs. ineff. 9.614
## Rel. ineff. 62.428
## Condition ineff. 15.298
## Outcome ineff. 55.642
The output first displays descriptive information about the sample, such as the number of cases (‘observations’), the scope, and the observed extreme values of \(X\) and \(Y\). Next, the NCA parameters for the selected ceiling line are displayed, including the effect size, the number of cases above the ceiling line, and the corresponding c-accuracy. The fit measure indicates how well the ceiling line represents the border line between the space with and without cases. In this example the fit of the CR-FDH ceiling line is 79.2% (maximum fit is 100%), suggesting that the border is not very regular. Next, the p value and the accuracy of the estimated p value (p-accuracy), which depends on the selected number of permutations in nca_analysis, are displayed. If the ceiling line is a straight line, the output also gives the slope and intercept of the ceiling line. The final four parameters are related to ‘inefficiency’, which is discussed in section 5.1.
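As a check on the definitions in section 1.5, several of the reported values can be reproduced by hand from the printed output for model6. The scope follows from the observed extremes of \(X\) and \(Y\), the effect size is the ceiling zone divided by the scope, and c-accuracy is the share of the 48 cases that are not above the ceiling line. The numbers below are copied from the printout, so small differences can arise because the printed values are rounded.

```r
# Values copied from the nca_output printout for model6 (rounded as printed)
x_min <- 1.86; x_max <- 5.71
y_min <- 1.00; y_max <- 5.00
ceiling_zone <- 2.893
n_cases <- 48; n_above <- 2

scope <- (x_max - x_min) * (y_max - y_min)          # empirical scope S
effect_size <- ceiling_zone / scope                 # d = C/S
c_accuracy <- 100 * (n_cases - n_above) / n_cases   # % of cases on or below the ceiling

round(scope, 2)        # 15.4
round(effect_size, 3)  # 0.188
round(c_accuracy, 1)   # 95.8
```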
Scatter plots can be displayed by adding the argument plots = TRUE or plotly = TRUE to the nca_output function. The regular scatter plot of ‘model1’ is obtained by calling nca_output on model1 with plots = TRUE.
The regular scatter plot shown in Figure 1.1 is displayed in the Plots window of RStudio.
An interactive scatter plot (‘plotly’) is obtained with plotly = TRUE and is displayed in the Viewer window of RStudio.
The non-interactive version of this plot is shown in Figure 1.2.
During inspection of the scatter plots, the researcher may observe potential outlier cases that may have a large effect on the estimated parameters. From version 4.0.0 onward, the NCA software can detect and evaluate potential outliers (see section 3.8).
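One simple way to see why a single case can matter is to recompute the effect size with each case left out in turn. The base R sketch below does this for the CE-FDH effect size on made-up data. It only illustrates the idea and is not the outlier procedure implemented in the NCA software (see section 3.8); the function name is hypothetical.

```r
# Illustrative leave-one-out check of the CE-FDH effect size (made-up data;
# not the NCA software's outlier procedure).
ce_fdh_d <- function(x, y) {
  xs <- sort(unique(x))
  h <- vapply(xs, function(v) max(y[x <= v]), numeric(1))  # CE-FDH step heights
  C <- sum(diff(xs) * (max(y) - h[-length(h)]))            # ceiling zone
  C / ((max(x) - min(x)) * (max(y) - min(y)))              # d = C/S
}

# Five cases on a diagonal plus one case (case 6) in the upper left corner
x <- c(1, 2, 3, 4, 5, 1)
y <- c(1, 2, 3, 4, 5, 5)

d_all <- ce_fdh_d(x, y)  # 0: case 6 fills the otherwise empty corner
# Change in effect size when each case is removed
d_change <- vapply(seq_along(x), function(i) ce_fdh_d(x[-i], y[-i]) - d_all, numeric(1))
round(d_change, 3)  # only removing case 6 changes d (from 0 to 0.625)
```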
The scatter plots can be graphically adapted in different ways as described in section 7.3.
All nca_output results can be saved as a PDF file with the argument pdf = TRUE in the nca_output function.
1.5.5 Perform the bottleneck analysis with nca_output
After the effect size and its p value are computed and the scatter plots are inspected, the researcher can judge whether a necessary condition in kind (\(X\) is necessary for \(Y\)) has been identified. Common reasons for not identifying a necessary condition are that theoretical support is lacking (no theoretically reasonable hypothesis can be formulated), that the effect size is practically irrelevant (e.g., d < 0.1), or that the empty space is likely a random result of two unrelated variables (e.g., p > 0.05). When all three criteria are satisfied (theoretical support, a practically relevant effect size, and a small p value), the researcher may judge that the present study provides empirical support for a necessity relationship. The researcher can then conduct a necessary condition analysis in degree for the conditions that meet the three criteria. This analysis can be done with the bottleneck table. In this demonstration, the three conditions and the CR-FDH ceiling line are selected for the bottleneck table analysis. The values of \(X\) and \(Y\) in the bottleneck table are ‘percentages of the range’: 0 corresponds to the minimum observed value, 100 to the maximum observed value, 50 to the middle value, and so on.
#Show the bottleneck table
model7 <- nca_analysis(data3, 2:4, 1, ceilings = 'cr_fdh')
nca_output(model7, bottlenecks = TRUE, summaries = FALSE)
##
## --------------------------------------------------------------------------------
## --------------------------------------------------------------------------------
## Y 1 2 3
## 0 NN NN NN
## 10 NN NN NN
## 20 NN NN NN
## 30 NN NN NN
## 40 NN NN NN
## 50 NN NN NN
## 60 8.3 2.2 NN
## 70 27.4 35.2 26.0
## 80 46.5 68.2 54.0
## 90 65.6 NA 81.9
## 100 84.7 NA NA
To show only the bottleneck table, the argument summaries = FALSE is added to the nca_output function to suppress the text output.
The first column of the bottleneck table is a set of values of the outcome \(Y\), and the next columns show the corresponding required levels of the conditions (here 1 = Contractual detail, 2 = Goodwill trust, and 3 = Competence trust, in the order in which they were specified in nca_analysis). The table can be read row by row to find the levels of the conditions that are required for a given level of the outcome. NN means that the condition is not necessary for the corresponding level of the outcome. In the default bottleneck table, \(X\) and \(Y\) values are expressed as percentages of the range. As discussed in section 4.3, the values can also be expressed as actual values, percentiles, or percentages of the maximum, which allows useful interpretations of the bottleneck table. That section also explains why there can be NA’s in the bottleneck table. In this example there are a few NA’s, which could be replaced by the highest observed value of 5 using the argument cutoff = 1 in the nca_analysis function.
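For a straight ceiling line, a bottleneck column can be derived from the slope and intercept reported by nca_output. The sketch below uses the CR-FDH slope (0.544) and intercept (2.214) and the observed extremes printed earlier for Contractual detail. It converts each outcome level from percentage of range to an actual value, solves the ceiling equation for the required \(X\), and converts that back to percentage of range. Labelling a required level at or below the observed minimum as NN and one above the observed maximum as NA is an interpretation assumed here, not a description of the package's code, and the function name bottleneck_pct is hypothetical.

```r
# Hypothetical helper: bottleneck column (percentage of range) for a straight ceiling
# line Y = intercept + slope * X, using values printed by nca_output for Contractual detail.
bottleneck_pct <- function(y_pct, slope, intercept, x_min, x_max, y_min, y_max) {
  y_val <- y_min + y_pct / 100 * (y_max - y_min)    # outcome level as actual value
  x_req <- (y_val - intercept) / slope              # X required by the ceiling line
  x_pct <- 100 * (x_req - x_min) / (x_max - x_min)  # required X as percentage of range
  # NN: required level at or below the observed minimum; NA: beyond the observed maximum
  ifelse(x_pct <= 0, "NN", ifelse(x_pct > 100, "NA", sprintf("%.1f", x_pct)))
}

y_levels <- seq(0, 100, by = 10)
bt <- bottleneck_pct(y_levels, slope = 0.544, intercept = 2.214,
                     x_min = 1.86, x_max = 5.71, y_min = 1.00, y_max = 5.00)
data.frame(Y = y_levels, Contractual_detail = bt)
```

The resulting column (NN up to 50, then 8.3, 27.4, 46.5, 65.6, and 84.7) matches the first column of the bottleneck table shown above, which makes it a useful check on how percentage-of-range values are read.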
1.6 Report the results
Two types of NCA reports exist: methodological reports and application reports. In methodological reports, the method or part of the method is introduced in a specific field, often illustrated with examples from that field (Dul, Karwowski, et al., 2020; Dul, Hauff, et al., 2021; Hauff et al., 2021; Richter & Hauff, 2022; Tóth et al., 2019; Tynan et al., 2020). Introducing NCA in a new field is particularly valuable if it can be explained:
Why necessity logic is particularly useful for the field in general or for specific topics and challenges (e.g., a list of potential topics/challenges that can benefit from NCA).
That necessity thinking already (often implicitly) exists in the literature of the field (e.g., a list of necessity statements from the field).
How NCA is different from conventional methods (e.g., by explaining the steps of NCA, namely theory, data, data analysis, and interpretation, or by referring to an example from the field).
How NCA can give new insights to the field (e.g., an example of applying NCA to a topic from the field).
In NCA application reports the focus is on better understanding a specific phenomenon, and necessity logic and NCA are used to serve that goal. The NCA-specific parts of these application publications are (for details see Dul, 2020):
Introduction/theory: introduction of necessity logic/theory.
Methods: Description of NCA’s data analysis approach.
Results: Presentation of scatter plots and NCA parameters (e.g., effect size, p value).
Discussion: Description of the importance of including identified necessity in theory and practice (e.g., to avoid failure of the outcome or waste of efforts).
In application publications NCA can be used as a stand-alone method (section 1.1) or in combination with other methods (section 1.2).
Although NCA has already been broadly applied, there are still many fields where it has not yet been introduced and applied. NCA could most likely be applied in any of the 249 research categories defined by the Journal Citation Reports, but currently only about a quarter of these categories are covered, with the largest number of publications in the two categories ‘management’ and ‘business’.
1.7 Basic guidelines for good NCA practice
The guidelines are also published in Dul (2023) and Dul et al. (2023).
Based on this summary and other publications about the NCA methodology (Dul, 2016b, 2020, 2023; Dul, Van der Laan, et al., 2020), basic guidelines for good NCA practice can be formulated (Table 1.1). These general guidelines can support researchers and reviewers of research in conducting, reporting and evaluating NCA studies, and can help to ensure good quality of NCA applications.
Table 1.1: Basic guidelines for good NCA practice

| Topic | Guideline | Chapter |
|---|---|---|
| Theoretical justification | Explain why X can be necessary for Y. | Ch. 2 |
| | Formulate the relationship between X and Y in terms of a necessity hypothesis (e.g., in terms of “X is necessary for Y”). | |
| | In explorative research, theoretically justify a necessary condition that is found ex post. | |
| Meaningful data | Use a good sample. | Ch. 3 |
| | Use valid and reliable scores of X and Y (e.g., by using common approaches for evaluation of validity and reliability). | |
| Scatter plot | Show the scatter plot (or contingency table) of all conditions that are evaluated for necessity. | Ch. 4 |
| | Visually inspect the scatter plot (e.g., pattern of the border, potential outliers). | |
| Ceiling line | Select the ceiling(s) based on the number of levels of X and Y and the expected or visually observed (non-)linearity of the border. | Ch. 4 |
| | Only show the selected ceiling lines in the scatter plot. Do not show the two (default) ceiling lines if these are not selected for the analysis. | |
| Effect size | Report the estimated effect size. | Ch. 4 |
| | Evaluate the practical relevance of the effect size (e.g., threshold level > 0.1). | |
| Statistical test | Report the estimated p-value. | Ch. 6 |
| | Evaluate the statistical relevance of the effect size (e.g., threshold level < 0.05). | |
| Bottleneck analysis (necessity in degree) | Present the bottleneck table for the non-rejected necessary conditions. | Ch. 4 |
| | Decide how to present the bottleneck table (e.g., using percentage of range, actual values, or percentiles). | |
| Descriptions of NCA | Refer to NCA as a method (including logic/theory, data analysis and statistical testing), not just as a statistical tool or data analysis technique. | |
| | Acknowledge that: | |
| | • NCA’s necessity analysis differs from fsQCA’s necessity analysis. | |
| | • NCA differs from a “moderation analysis” in regression analysis. | |
| | • NCA is not a robustness test for other methods. | |
| | Properly describe elements of NCA, e.g.: | |
| | • Use only necessity wordings to describe the necessity relationship between X and Y (avoid imprecise or general words like (cor)related, associated, and incorrect sufficiency-based words like produce, explain). | |
| | • Refer to NCA’s statistical test as a permutation test (avoid incorrect descriptions like bootstrapping, (Monte Carlo) simulation, robustness check, or t-test). | |
| | • Use the name ‘multiple NCA’ or ‘multiple bivariate NCA’ rather than ‘multivariate NCA’ when several conditions are analysed in one run. | |