Chapter 6 NCA’s statistical backgrounds

NCA’s statistical test is one of the three parts of NCA. The other parts are the use of necessity logic for formulating the causal assumptions (hypotheses), and the data analysis to calculate the NCA parameters (such as effect size). NCA’s statistical test is meant to evaluate these parameters, in particular the probability that the observed effect size could have resulted from variables that are unrelated.

6.1 NCA’s statistical expression

Based on NCA’s mathematical backgrounds (see Chapter 5), NCA employs a bivariate statistical data analysis for each condition separately (multiple bivariate analysis). The statistical model of NCA can be expressed by an equality equation as

\[\begin{equation} \tag{6.1}Y = f(X_i) - \epsilon_{X_i} \end{equation}\]

Here the function \(f(X_i)\) is the ceiling line of the i-th condition in the \(X_iY\) plane and \(\epsilon_{X_i}\) is a random variable that takes on non-negative values only. The NCA model differs from a regression model. The regression model is expressed as

\[\begin{equation} \tag{6.2}Y = f(X) + \epsilon_X \end{equation}\]

Here the function \(f(X)\) has additive terms consisting of (combinations of) \(X\)’s with coefficients. In both the NCA model and the regression model no measurement error in the variables \(X\) and \(Y\) is assumed. The error term \(\epsilon_X\) in regression is assumed to be independent of \(X\) and to have an average value of zero, so the regression line runs through the middle of the data, whereas the non-negative error term of the NCA model places the ceiling line on top of the data. Therefore, traditional regression techniques are not appropriate for modeling necessity relationships.

NCA is a non-parametric approach that does not require assumptions about the distributions of \(X\), \(Y\) and \(\epsilon_X\). NCA specifies only a part of the Data Generation Process (DGP): the ceiling function. It does not specify how the data below the ceiling are generated.

For estimating the ceiling line and effect size with empirical data for \(X\) and \(Y\), two default estimation techniques are available. The first default technique is Ceiling Envelopment - Free Disposal Hull (CE-FDH), which is a local support boundary curve estimation technique. The free disposal hull (FDH) constructs a space (the hull) encompassing the observations. The FDH is the smallest space encompassing the data points that has the free disposal property. A space is said to have the free disposal property (e.g., Simar & Wilson, 2008) if containing a particular point implies that it also contains all points to the lower right of that point. The boundary of the FDH hull is a non-decreasing step function, which can serve as an estimate of the ceiling line, in particular when \(X\) or \(Y\) are discrete, or when the ceiling line is not straight.
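To make this concrete, the sketch below constructs a CE-FDH-style step ceiling and the corresponding effect size (the empty space above the ceiling divided by the scope) from scratch in Python. It is a minimal illustration of the idea, not the implementation in the NCA software, and the function names are ours.

```python
import numpy as np

def ce_fdh_ceiling(x, y):
    """CE-FDH-style ceiling: a non-decreasing step function on top of the data.
    Returns the sorted x values and the step height (running maximum of y)."""
    order = np.argsort(x)
    xs = np.asarray(x, float)[order]
    ys = np.asarray(y, float)[order]
    return xs, np.maximum.accumulate(ys)

def effect_size_ce_fdh(x, y):
    """Effect size d = (empty space above the step ceiling) / scope."""
    xs, yc = ce_fdh_ceiling(x, y)
    x_min, x_max = xs[0], xs[-1]
    y_max = np.max(y)
    scope = (x_max - x_min) * (y_max - np.min(y))
    # area under the step ceiling; each step height holds until the next x value
    area_under_ceiling = np.sum(np.diff(xs) * yc[:-1])
    empty_space = (x_max - x_min) * y_max - area_under_ceiling
    return empty_space / scope
```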

The second default technique is Ceiling Regression - Free Disposal Hull (CR-FDH), which is a global support boundary and frontier function estimation technique. The CR-FDH ceiling line is a straight trend line fitted with OLS regression through the upper-left corner points of the CE-FDH line. Consequently, some cases lie above the CR-FDH ceiling line, in the otherwise empty space. Therefore, in contrast to CE-FDH, the ceiling accuracy of CR-FDH is usually below 100 percent.
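Continuing the sketch above, the CR-FDH line can be approximated by selecting the corner points where the CE-FDH step function increases and fitting an OLS line through them (again an illustrative sketch, not the package implementation):

```python
import numpy as np

def cr_fdh_ceiling(x, y):
    """CR-FDH-style ceiling: OLS trend line through the upper-left corner
    points of the CE-FDH step function. Returns (intercept, slope)."""
    xs, yc = ce_fdh_ceiling(x, y)                         # sketch from Section 6.1
    corners = np.concatenate(([True], np.diff(yc) > 0))   # points where the step rises
    slope, intercept = np.polyfit(xs[corners], yc[corners], 1)
    return intercept, slope                               # ceiling: y = intercept + slope * x
```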

In addition to the two default ceiling lines, another straight ceiling line technique is Ceiling - Linear Programming (C-LP). This estimation approach uses linear programming: a technique that optimizes a linear goal function under a number of linear constraints. When applied to NCA, the technique draws a straight line through two corner points of CE-FDH such that the area under the line is minimized.
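One way to set up such a linear program is sketched below: the goal function is the area under the line over the observed \(X\) range, and the constraints keep every observation on or below the line. The non-negative slope bound and the use of scipy.optimize.linprog are choices of this sketch, not necessarily those of the NCA software.

```python
import numpy as np
from scipy.optimize import linprog

def c_lp_ceiling(x, y):
    """C-LP-style ceiling y = a + b*x: minimize the area under the line on
    [x_min, x_max], subject to every point lying on or below the line."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    x_min, x_max = x.min(), x.max()
    # area under the line: a*(x_max - x_min) + b*(x_max^2 - x_min^2)/2, linear in (a, b)
    goal = [x_max - x_min, (x_max**2 - x_min**2) / 2]
    # constraints a + b*x_i >= y_i, rewritten as -a - b*x_i <= -y_i
    A_ub = np.column_stack([-np.ones_like(x), -x])
    res = linprog(goal, A_ub=A_ub, b_ub=-y,
                  bounds=[(None, None), (0, None)])   # intercept free, slope >= 0
    intercept, slope = res.x
    return intercept, slope
```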

6.2 The quality of NCA’s ceiling lines

6.2.1 Statistical quality criteria

When analysing sample data for statistical inference, the calculated ‘statistic’ (in NCA the ceiling line and its effect size) is an estimator of the ‘true’ parameter in the population. An estimator is considered to be a good estimator when for finite (small) samples it is unbiased and efficient, and for large samples it is consistent. Unbiasedness indicates that the mean of the estimator for repeated samples corresponds to the true parameter value; efficiency indicates that the variance of the estimator for repeated samples is small compared to other estimators of the population parameter. Consistency indicates that the mean of the estimator approaches the true value of the parameter and that the variance approaches zero when the sample size increases. When the latter applies, the estimator is called asymptotically unbiased, even if the estimator is biased in finite (small) samples. An estimator is asymptotically efficient if the estimate converges relatively fast to the true population value when the sample size increases.
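In symbols, for an estimator \(\hat{\theta}_n\) of a population parameter \(\theta\) based on a sample of size \(n\), these criteria can be summarized as follows (a compact restatement of the definitions above):

\[\begin{aligned}
\text{unbiasedness:}\quad & E(\hat{\theta}_n) = \theta\\
\text{efficiency:}\quad & \operatorname{Var}(\hat{\theta}_n) \le \operatorname{Var}(\tilde{\theta}_n)\ \text{for alternative estimators } \tilde{\theta}_n\\
\text{consistency:}\quad & E(\hat{\theta}_n) \rightarrow \theta \ \text{ and } \ \operatorname{Var}(\hat{\theta}_n) \rightarrow 0 \ \text{ as } n \rightarrow \infty
\end{aligned}\]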

There are two ways to determine the unbiasedness, efficiency and consistency of estimators: analytically and by simulation. In the analytic approach the properties are mathematically derived. This is possible when certain assumptions are made. For example, the OLS regression estimator of a linear relationship in the population is ‘BLUE’ (Best Linear Unbiased Estimator) when the Gauss-Markov assumptions hold (an unbounded dependent variable, homoskedasticity, an error term unrelated to the predictors, etc.).

In the simulation approach a Monte Carlo simulation is performed in which first the true parameters (of variables and distributions) in the population are defined, then repeated random samples are drawn from this population, and next the estimator is calculated for each sample. Finally, the sampling distribution of the estimator is determined and compared with the true parameter value in the population.

For both approaches the following ‘ideal’ situation is usually assumed: the population is infinite, the samples are drawn randomly, the distributions of the variables or estimators are known (e.g. a normal distribution), and there is no measurement error. Without these assumptions the analysis and simulations become too complex.

For NCA, no analytical approach exists (yet) to determine bias, efficiency and consistency of its estimators (ceiling lines and their effect sizes) so currently NCA relies on simulations to evaluate the quality of its estimates.

6.3 Simulation results for NCA’s effect size

Monte Carlo simulation is used to evaluate how well the three types of ceiling line (CE-FDH, CR-FDH and C-LP) estimate the true effect size in a population. In this population, variable \(X\) has a necessity relationship with \(Y\), represented by a straight ceiling line \(Y = 0.4 + X\). The corresponding true effect size is 0.18.
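Assuming that the population scope is the unit square (\(0 \le X \le 1\), \(0 \le Y \le 1\)), which is consistent with the reported true effect size, the ceiling line reaches \(Y = 1\) at \(X = 0.6\) and the true effect size follows from the area of the empty triangle above the ceiling:

\[d = \frac{\text{empty space}}{\text{scope}} = \frac{\int_0^{0.6}\bigl(1-(0.4+X)\bigr)\,dX}{1 \times 1} = \frac{\tfrac{1}{2}\times 0.6\times 0.6}{1} = 0.18\]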

In Monte Carlo simulation the full ‘Data Generation Process’ (DGP) needs to be specified. This means that assumptions must also be made about the distribution of the data under the ceiling line. In this simulation we assume a uniform distribution under the ceiling line, normalized for the vertical distance under the ceiling line. Additional simulations have been done with different distributions (Massu et al., 2020) but the results are not reported here. The simulation is done with 100 resamples per sample size. The seven sample sizes vary between 20 and 5000. This is repeated for each type of ceiling line.
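A minimal Python sketch of this simulation setup is shown below, using the CE-FDH effect-size sketch from Section 6.1. The unit-square scope, the specific sample sizes and the random seed are illustrative assumptions; the reported simulations may differ in these details.

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_sample(n, a_c=0.4, b_c=1.0):
    """Draw n cases with true ceiling Y = a_c + b_c*X and Y uniformly
    distributed between 0 and the ceiling (vertically normalized)."""
    x = rng.uniform(0, 1, n)
    ceiling = np.minimum(a_c + b_c * x, 1.0)   # clip the ceiling at the scope maximum
    y = rng.uniform(0, ceiling)
    return x, y

true_d = 0.18
for n in [20, 50, 100, 500, 1000, 2000, 5000]:                             # illustrative sample sizes
    d_hat = [effect_size_ce_fdh(*draw_sample(n)) for _ in range(100)]      # 100 resamples
    print(n, round(np.mean(d_hat) - true_d, 3), round(np.std(d_hat), 3))   # bias and spread
```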


Figure 6.1: Monte Carlo simulation results for the effect size of three ceiling lines. True ceiling line = \(Y = 0.4 + X\). True effect size is 0.18.

Figure 6.1 suggests that for small samples the three ceiling lines are upward biased (the true effect size is smaller). When the sample size increases, the estimated effect size approaches the true effect size (asymptotic unbiasedness) and the variance approaches zero. This means that the three estimators are consistent. C-LP seems more efficient than the other two lines (less variation). These results only apply to the investigated situation of a true ceiling line that is straight and no measurement error. Therefore, only when the true ceiling line is straight and there is no measurement error may the C-LP line be preferred. Such ‘ideal’ circumstances may apply in simulation studies, but seldom in reality. When the true ceiling line is straight but cases have measurement error, the CR-FDH line may perform better because measurement error usually reduces the effect size (the estimated ceiling line moves upwards). When the true ceiling line is not straight, CE-FDH may perform better because this line can better follow the non-linearity of the border (see Figure 3.1). Further simulations are needed to clarify the different statistical properties of the ceiling lines under different circumstances. In these simulations the effects of different population parameters (effect size, ceiling slope, ceiling intercept, non-linearity of the ceiling, distribution under the ceiling) and of measurement error (errors-in-variables models) on the quality of the estimates could be studied.

6.4 Interpretation of NCA’s statistical test and p-value

NCA’s statistical test is a null hypothesis test that estimates a p-value for the effect size using a permutation approach (Dul, 2020; Dul, Van der Laan, et al., 2020). The test is part of the entire NCA method consisting of (1) formulating necessity theory, (2) calculating the necessity effect size (the relative size of the empty space in the \(XY\) plot) and (3) performing a statistical test. Within this context the test has the specific purpose of testing whether the empty space could be a random result of unrelated \(X\) and \(Y\). Therefore, a test result with a large p-value (e.g., p > 0.05) means that the empty space is compatible with randomness of unrelated variables. A test result with a small p-value (e.g., p < 0.05) means that the empty space is not compatible with randomness of unrelated variables. However, falsifying H0 does not mean that H1 is accepted. The goal of a null-hypothesis test is to test the null (H0), not a specific alternative (H1). The p-value is only defined when the null is true. Mathematical proofs and simulations have shown (Dul, Van der Laan, et al., 2020) that in this case the estimated p-value of NCA’s statistical test is valid. Indeed, the validity of the permutation test on which NCA is based is a theorem (Hoeffding, 1952; Kennedy, 1995; Lehmann et al., 2005). However, the p-value is not defined when a specific H1 is true. Therefore, the p-value is not informative about which H1 applies. This is a general characteristic of the p-value that is often misunderstood. “A widespread misconception … is that rejecting H0 allows for accepting a specific H1. … This is what most practicing researchers do in practice when they reject H0 and argue for their specific H1 in turn” (Szucs & Ioannidis, 2017, p. 8).
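A minimal sketch of such a permutation test is shown below, using the from-scratch effect-size function sketched in Section 6.1. The function name and the number of permutations are illustrative; the NCA software has its own implementation.

```python
import numpy as np

def permutation_p_value(x, y, n_perm=10000, seed=0):
    """Approximate p-value of the observed effect size: the share of data sets
    with X and Y decoupled (Y randomly shuffled) whose effect size is at least
    as large as the observed one."""
    rng = np.random.default_rng(seed)
    d_obs = effect_size_ce_fdh(x, y)
    d_perm = np.array([effect_size_ce_fdh(x, rng.permutation(y))
                       for _ in range(n_perm)])
    # the +1 terms count the observed data set as one of the permutations
    return (np.sum(d_perm >= d_obs) + 1) / (n_perm + 1)
```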

NCA’s statistical test is only a part of the entire NCA method. With a high p-value, the test does not reject the null and thus does not support any H1 (including necessity). With a low p-value the test rejects the null, but does not accept any specific H1, thus also not necessity. It depends on the results of the entire NCA method (including theory and effect size) and on the researcher’s judgment whether or not necessity is plausible, considering all evidence. Therefore, a low p-value from NCA’s statistical test is a ‘necessary’ but not a ‘sufficient’ condition for concluding that an empty space is caused by necessity (Dul, Van der Laan, et al., 2020). The test protects the researcher from making a false positive conclusion, namely that necessity is supported when the empty space is likely a random result of two unrelated variables.

6.5 NCA and correlation


The correlation between two variables can be expressed as a number \(r\), the correlation coefficient. This coefficient can have a value between -1 and +1. This section shows by simulations that a correlation between \(X\) and \(Y\) can be produced not only by a sufficiency relationship between \(X\) and \(Y\), but also by a necessity relationship. Correlation is not causation, and therefore an observed correlation cannot automatically be interpreted as an indication of sufficiency or of necessity.

6.5.1 Correlation by sufficiency

For empirical testing of a sufficiency relationship (e.g. as expressed in a hypothesis), the relationship is usually modeled as a set of single factors (\(X\)’s), and possibly combinations of factors, that add up to produce the outcome (additive logic). Such a model captures a few factors of interest and assumes that all other factors together have on average no effect on the outcome. This sufficiency model corresponds to the well-known regression model with additive terms (with the factors of interest) and an error term (\(\epsilon\)) representing the other factors. Because the error term is assumed to have an average value of zero, the regression model with the factors of interest describes the average effect of these factors on the outcome. Note that \(\epsilon\) is often assumed to be normally distributed and thus can have any value between minus infinity and plus infinity. As a consequence, \(Y\) can also have any value between minus infinity and plus infinity. Moreover, the additive equation indicates that \(X\) is not a necessary cause of \(Y\), because a certain value of \(\epsilon\) can compensate for it.

A simple linear (average) sufficiency model is \(Y = a + bX\), in which \(a\) is the intercept and \(b\) is the slope, which take values of 0 and 1 respectively. The average sufficiency effect of \(X\) on \(Y\) can be modeled by the regression equation \(Y = a + bX + \epsilon\). Figure 6.2 shows an \(XY\) scatter plot of 100 cases drawn randomly from a population in which this linear relationship holds. \(X\) is a fixed variable between 0 and 1, and \(\epsilon\) is a normally distributed random variable with an average of zero and a standard deviation of 1, so that \(Y\) follows from the regression equation. The simulation shows that the sufficiency relationship results in a correlation of 0.34.


Figure 6.2: Correlation with coefficient 0.34 resulting from an average sufficiency relationship. The line through the middle is the regression line representing the sufficiency relationship.

The true (because induced) additive average sufficiency relationship in the population (in this case linear with parameters \(a = 0\) and \(b = 1\)) can be described with a regression line through the middle of the data (solid line). It would not be correct to draw a ceiling line on top of the data and interpret it as representing a necessity relationship between \(X\) and \(Y\) (dashed line).
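A short Python sketch of this simulation is given below. The random seed is arbitrary, so the resulting correlation will be close to, but not exactly, the 0.34 reported above.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = np.linspace(0, 1, n)              # fixed X between 0 and 1
eps = rng.normal(0, 1, n)             # error term with average 0 and sd 1
y = 0 + 1 * x + eps                   # Y = a + b*X + eps with a = 0, b = 1
r = np.corrcoef(x, y)[0, 1]           # correlation produced by the sufficiency model
b_hat = np.polyfit(x, y, 1)[0]        # OLS slope of the regression line (solid line)
print(round(r, 2), round(b_hat, 2))
```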

6.5.2 Correlation by necessity

A necessity causal relationship between \(X\) and \(Y\) can also produce a correlation between \(X\) and \(Y\). For empirical testing of a necessity relationship (e.g. as expressed in a hypothesis), the relationship is modeled as a single factor that enables the outcome (necessity logic). Such a model captures the single necessary factor independently of all other causal factors. This necessity model corresponds to the NCA ceiling line.

A simple linear necessity model (ceiling line) is \(Y = a_c + b_cX\), in which \(a_c\) is the intercept of the ceiling line and \(b_c\) is the slope of the ceiling line, which take values of 0.4 and 1 respectively. This can be represented by the ceiling inequality \(Y \leq a_c + b_cX\).

Figure 6.3 shows an \(XY\) scatter plot of 100 cases drawn randomly from a population in which this linear ceiling relationship holds. \(X\) is a random variable between 0 and 1, and \(Y\) is a uniformly distributed random variable bounded by the ceiling line. The simulation shows that the necessity relationship results in a correlation of 0.38.


Figure 6.3: Correlation with coefficient 0.38 resulting from a necessity relationship. The line on top of the data is the ceiling line representing the necessity relationship.

The true (because induced) necessity relationship in the population (in this case a linear ceiling with parameters \(a_c = 0.4\) and \(b_c = 1\)) can be described with a ceiling line on top of the data (solid line). It would not be correct to draw a regression line through the middle of the data and interpret it as representing a sufficiency relationship between \(X\) and \(Y\) (dashed line).
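The corresponding sketch for the necessity simulation is shown below, assuming a unit scope for \(Y\) and a uniform distribution of \(Y\) below the ceiling; again the exact correlation depends on the random seed.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x = rng.uniform(0, 1, n)                   # random X between 0 and 1
ceiling = np.minimum(0.4 + 1 * x, 1.0)     # ceiling Y <= a_c + b_c*X, clipped at 1
y = rng.uniform(0, ceiling)                # Y uniform below the ceiling line
r = np.corrcoef(x, y)[0, 1]                # a necessity structure also produces r > 0
print(round(r, 2))
```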

6.5.3 Interpretation of correlation when causality is unknown

When the underlying causality is unknown, the correlation coefficient cannot indicate which line best describes the data: a regression line or a ceiling line. A regression line can be added to the scatter plot when the underlying causality is assumed to follow additive, average sufficiency logic, and a ceiling line can be added when the underlying causality is assumed to follow necessity logic (or both lines when both logics are assumed).

The often implicit assumption that a correlation coefficient is caused by an underlying additive causal model may be wrong. This assumption may have two reasons. First, additive logic is the main paradigm of causality. Second, the correlation coefficient \(r\) and the regression coefficient \(b\) are closely related and can be expressed in a mathematical equation: \(r = b \cdot sd(X)/sd(Y)\), where \(sd\) is the standard deviation. A similar mathematical equation for the relationship between the necessity effect size and the correlation coefficient can be derived as well, but it is complex and not intuitive (please contact the author for this formula, which assumes a uniform distribution of the data below the ceiling line).
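For the bivariate case, this relationship follows directly from the definitions of the two coefficients:

\[r = \frac{\operatorname{cov}(X,Y)}{sd(X)\,sd(Y)}, \qquad b = \frac{\operatorname{cov}(X,Y)}{sd(X)^2} \quad\Rightarrow\quad r = b \cdot \frac{sd(X)}{sd(Y)}\]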

6.6 NCA’s Data Generation Process

under construction

6.7 The absence of assumptions about the distribution of the data

under construction

6.8 How to perform simulations with NCA

References

Dul, J. (2020). Conducting necessary condition analysis. Sage. https://uk.sagepub.com/en-gb/eur/conducting-necessary-condition-analysis-for-business-and-management-students/book262898
Dul, J., Van der Laan, E., & Kuik, R. (2020). A statistical significance test for necessary condition analysis. Organizational Research Methods, 23(2), 385–395. https://journals.sagepub.com/doi/full/10.1177/1094428118795272
Hoeffding, W. (1952). The large-sample power of tests based on permutations of observations. The Annals of Mathematical Statistics, 169–192. https://projecteuclid.org/journals/annals-of-mathematical-statistics/volume-23/issue-2/The-Large-Sample-Power-of-Tests-Based-on-Permutations-of/10.1214/aoms/1177729436.full
Kennedy, F. E. (1995). Randomization tests in econometrics. Journal of Business & Economic Statistics, 13(1), 85–94. https://www.jstor.org/stable/1392523?seq=1#metadata_info_tab_contents
Lehmann, E. L., Romano, J. P., & Casella, G. (2005). Testing statistical hypotheses (Vol. 3). Springer. https://www.springer.com/gp/book/9780387988641
Massu, J., Kuik, R., & Dul, J. (2020). Simulations with NCA. Rotterdam School of Management, Erasmus University.
Simar, L., & Wilson, P. W. (2008). Statistical inference in nonparametric frontier models: Recent developments and perspectives. In H. O. Fried, C. A. Knox Lovell, & S. S. Schmidt (Eds.), The measurement of productive efficiency and productivity growth. Oxford University Press.
Szucs, D., & Ioannidis, J. P. (2017). Empirical assessment of published effect sizes and power in the recent cognitive neuroscience and psychology literature. PLoS Biology, 15(3), e2000797. https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.2000797