Chapter 7 Miscellaneous

7.1 NCA software

7.1.1 Main NCA software: R package

The main NCA software is a free package in R called NCA(Dul & Buijs, 2023).The first version of this software was released in 2015 just before the online publication of NCA’s core paper (Dul, 2016b) on 15 July, 2015. Since then the software was updated several times to correct bugs, integrate new NCA developments, and make improvements based on feedback from users. The following versions have been released so far:

NCA_1.0 (2015-07-02)

NCA_1.1 (2015-10-10)

NCA_2.0 (2016-05-18)

NCA_3.0 (2018-08-01)

NCA_3.0.1 (2018-08-21)

NCA_3.0.2 (2019-11-22)

NCA_3.0.3 (2020-06-11)

NCA_3.1.0 (2021-03-02)

NCA_3.1.1 (2021-05-03)

NCA_3.2.0 (2022-04-05)

NCA_3.2.1 (2022-09-15)

NCA_3.3.0 (2023-02-06)

NCA_3.3.1 (2023-02-10)

NCA_3.3.2 (2023-06-27)

NCA_3.3.3 (2023-09-05)

NCA_4.0.0 (2024-02-16)

NCA_4.0.1 (2024-02-23)

NCA_4.0.2 (2024-11-09)

When the first digit of the version number increases, major changes were implemented. Version 1 was the first limited version, version 2 was the first comprehensive version, version 3 includes the statistical test for NCA, and version 4 introduced two new functions nca_random and nca_power. The second digit refers to larger changes and the third digit minor changes.

The currently installed version of the NCA software and the availability of a new version can be checked as follows:

packageVersion("NCA") # installed version
available.packages()["NCA",2] # available version

The cumulative number of downloads of the NCA software from the CRAN website are shown in Figure 7.1.

Cumulative number of downloads from the CRAN website of the NCA software since its launch).

Figure 7.1: Cumulative number of downloads from the CRAN website of the NCA software since its launch).

Until 31 December 2023 the software was downloaded more than 47,000 times.

The current cumulative number of downloads can be obtained by:

library("cranlogs")
x <- cran_downloads(packages = "NCA", from = "2015-07-02", to = Sys.Date())
cat("total downloads on")
## total downloads on
Sys.Date()
## [1] "2025-02-04"
tail(cumsum(x[,2]),n=1)
## [1] 58089

The Quick Start Guide (also here) helps novice users of R and NCA to get started, but the guide also includes more advanced features. The NCA package has a build-in help function with examples. Within R, the help function can be called as follows:

help(package = "NCA")

When help is needed for a specific function in NCA, for example nca_analysis the help for this function can be called as follows:

help("nca_analysis")

7.1.2 Other NCA software packages

Based on the R package a Stata module is available for conducting NCA (Spinelli et al., 2023). NCA is also part of the SmartPLS software for conducting NCA in combination with PLS-SEM (SmartPLS Development Team, 2023). These versions enable conducting a basic NCA in other software environments than R, but lack more advanced functions and recent developments of NCA. Currently, no NCA software is available for other platforms like SPSS or SAS.

7.2 Learning about NCA

Several possibilities exist to learn about NCA. Apart from reading books and publications about the NCA methodology (Table 0.2) and reading publications with examples of NCA applications (Table 0.1) several on site and online courses are available at the introduction and advanced levels. Major international conferences often host a professional development workshop on site to learn about NCA. An example is annual conference of the the Academy of Management. Also online short introduction courses exist for NCA. An example is a free on-demand introduction seminar hosted by Instats. A longer free introduction MOOC course is available from Coursera. At the advanced level, an on-demand course is hosted by Instats. Also an NCA paper development workshop is regularly organized. More information about NCA training and events can be found in the NCA website.

7.3 Producing high-quality NCA scatter plots

One of the primary results of NCA is the ‘NCA plot’, which is a scatter plot with ceiling line(s). Users often wish to save a high quality NCA plot for further use, for example for a publication. There are several ways to make and save a high quality NCA plot. In this section six ways are presented that have different levels of user flexibility:

  1. Standard plot.

  2. Standard plot with basic adaptations.

  3. Standard plot with advanced adaptations.

  4. Advanced plot with plotly.

  5. Advanced plot with ggplot.

  6. From within R script.

7.3.1 Standard plot

When NCA for R is used with RStudio, the standard NCA plot is displayed in the Plots window of RStudio. The plot can be produced with the argument plots = TRUE of the nca_output function. For the example dataset in NCA package, the standard plot can be obtained as follows:

library (NCA)
data(nca.example)
model <- nca_analysis(nca.example, 1, 3)
nca_output(model, plots = TRUE, summaries = FALSE)
Standard scatter plot of nca.example.

Figure 7.2: Standard scatter plot of nca.example.

7.3.2 Standard plot with basic adaptations

It may be desirable to change the appearance of the plot, such as the color of the points and the lines, the type of lines and the type of points. These characteristics can be changed with the graphical functions that are included in the NCA software. Before running nca_output, the NCA plot can be customized by changing the point type (for all points), the line types and line colors (for each ceiling line separately) and the line width (for all ceiling lines). For instance, a publisher may require a black and white or grayscale plot, with thicker lines and with solid points. Then the standard plot can be adapted as follows (changing of plot dimensions and the saving the plot is the same as for the standard plot):

line.colors['ce_fdh'] <- 'black'
line.colors['cr_fdh'] <- 'black'
line.colors['ols'] <- 'grey'
point.color <- "black"
point.type <- 20
line.width <- 2
nca_output(model, plots = TRUE, summaries = FALSE)
Standard scatter plot of nca.example in black/grey/white.

Figure 7.3: Standard scatter plot of nca.example in black/grey/white.

7.3.3 Standard plot with advanced adaptations

If further changes in the plot are needed, for example changes of the individual line types or the plot title, a special R script must be downloaded from: https://raw.githubusercontent.com/eur-rsm/nca_scripts/master/display_plot.R This script is used as an alternative for making the NCA plot with plots = TRUE in the nca_output function. The script can be copy/paste or otherwise loaded in your R session.

# Default values copied from p_constants
line_colors <- list(ols="green",        lh="red3",            cols="darkgreen",
                    qr="blue",          ce_vrs="orchid4",     cr_vrs="violet",
                    ce_fdh="red",       cr_fdh="orange",      sfa="darkgoldenrod",
                    c_lp="blue")
line_types  <- list(ols=3,              lh=2,                 cols=3,
                    qr=4,               ce_vrs=5,             cr_vrs=1,
                    ce_fdh=6,           cr_fdh=1,             sfa=7,
                    c_lp=2)
line_width  <- 1.5
point_type  <- 23
point_color <- 'red'
display_plot <-
  function (plot) {
    flip.x <- plot$flip.x
    flip.y <- plot$flip.y
    # Determine the bounds of the plot based on the scope
    xlim <- c(plot$scope.theo[1 + flip.x], plot$scope.theo[2 - flip.x])
    ylim <- c(plot$scope.theo[3 + flip.y], plot$scope.theo[4 - flip.y])
    # Reset/append colors etc. if needed
    for (method in names(line_types)) {
      line.types[[method]] <- line_types[[method]]
    }
    if (is.numeric(line_width)) {
      line.width <- line_width
    }
    if (is.numeric(point_type)) {
      point.type <- point_type
    }
    if (point_color %in% colors()) {
      point.color <- point_color
    }
    # Only needed until the next release (3.0.2)
    if (!exists("point.color")) {
      point.color <- "blue"
    }
    # Plot the data points
    plot (plot$x, plot$y, pch=point.type, col=point.color, bg=point.color,
          xlim=xlim, ylim=ylim, xlab=colnames(plot$x), ylab=tail(plot$names, n=1))
    # Plot the scope outline
    abline(v=plot$scope.theo[1], lty=2, col="grey")
    abline(v=plot$scope.theo[2], lty=2, col="grey")
    abline(h=plot$scope.theo[3], lty=2, col="grey")
    abline(h=plot$scope.theo[4], lty=2, col="grey")
    # Plot the legend before adding the clipping area
    legendParams = list()
    for (method in plot$methods) {
      line.color <- line.colors[[method]]
      line.type  <- line.types[[method]]
      name <- gsub("_", "-", toupper(method))
      legendParams$names  = append(legendParams$names, name)
      legendParams$types  = append(legendParams$types, line.type)
      legendParams$colors = append(legendParams$colors, line.color)
    }
    if (length(legendParams) > 0) {
      legend("topleft", cex=0.7, legendParams$names,
             lty=legendParams$types, col=legendParams$colors, bg=NA)
    }
    # Apply clipping to the lines
    clip(xlim[1], xlim[2], ylim[1], ylim[2])
    # Plot the lines
    for (method in plot$methods) {
      line <- plot$lines[[method]]
      line.color <- line.colors[[method]]
      line.type  <- line.types[[method]]
      if (method %in% c("lh", "ce_vrs", "ce_fdh")) {
        lines(line[[1]], line[[2]], type="l",
              lty=line.type, col=line.color, lwd=line.width)
      } else {
        abline(line, lty=line.type, col=line.color, lwd=line.width)
      }
    }
    # Plot the title
    title(paste0("Advanced Standard NCA Plot : ", plot$title), cex.main=1)
  }

The different default values of the plot can be changed as desired. For example, a researcher may wish to change the ols line into a dotted line (line.type to value 3), and to have a point type that is squared (point.type = 23) with red color of the points (point.color = 'red'). Also, the title of the plot can be changed at the end of the script into for example “Advanced NCA plot …”.
For producing the plot the final call is :

display_plot(model$plots[[1]])
Standard scatter plot of nca.example with advance adaptations.

Figure 7.4: Standard scatter plot of nca.example with advance adaptations.

The plot can be reset to default values by:

line.colors <- NULL
line.types <- NULL
line.width <- NULL
point.color <- "blue"
point.type <- 21

7.3.4 Plot with plotly

The NCA software can also produce a plot made by plotly. The main goal is to provide the user a possibility for graphical exploration of NCA results. However, it is also possible to save the plot. The plot with plotly can be produced with the nca_output function as follows:.

nca_output(model, plotly=TRUE, summaries = FALSE)

It is possible to label subgroups of points. For example, the continents of the countries in the nca.example dataset can be identified as follows (in the order of the rows in the dataset):

labels <- c("Australia", "Europe", "Europe", "North America", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Asia", "North America", "Europe", "Australia", "Europe", "Europe", "Europe", "Europe", "Asia", "Europe", "Europe", "Europe", "Europe", "Europe", "North America") 
nca_output(model, plotly=labels, summaries = FALSE)

The plot opens in the Viewer tab of the plots window (Figure 7.5). The interactive version can be approached here. The plot identifies the ‘peers’ in red. Peers are points near the ceiling line that are used for drawing the default and other ceiling lines. When moving in the computer screen the pointer over the top of the plot, a toolbar pops up. One of its functions is “Download plot as png”, which saves the plot in the download folder of the computer (not in the working directory). Saving is also possible with Export tab -> Save as Image.

Plotly scatter plot of nca.example. The interactive version can be approached [here](https://www.erim.eur.nl/fileadmin/user_upload/plotly.html).

Figure 7.5: Plotly scatter plot of nca.example. The interactive version can be approached here.

7.3.5 Plot with ggplot

The function ggplot is an advanced graphical function in R. Output of NCA can be used as input for ggplot. For example, when the ceiling line is a straight line (such as the default ceiling line cr_fdh) the intercept and slope of the ceiling line can be used in ggplot to plot the ceiling line. First, these NCA parameters must be extracted from the NCA results:

intercept <- model$plots$Individualism$lines$cr_fdh$coefficients[1]
slope <- model$plots$Individualism$lines$cr_fdh$coefficients[2]

Next the ggplot can be produced:

library (ggplot2)
ggplot(nca.example, aes(Individualism,`Innovation performance`))+
  geom_point() +
  geom_abline(intercept=intercept, slope=slope, col = "red")
ggplot scatter plot of nca.example.

Figure 7.6: ggplot scatter plot of nca.example.

The ggplot function gives many graphical opportunities (see its documentation) to make the plot as desired.
Changing of plot dimensions and the saving the plot is the same as for the standard plot.

7.3.6 Saving the plot from within the R script

All above options for saving an NCA plot are based on first displaying the plot in R and then saving it. It is also possible to save the plot from within the R script. The following script saves a standard NCA plot as a png file with the name “nca.example.png”) in the working directory with a given size and resolution:

png("nca_example.png",units="cm", 15,15, res=300) 
nca_output(model, plots = TRUE, summaries = FALSE)
dev.off() 

It is also possible to save a pdf file:

pdf("nca_example.pdf",5,5) 
nca_output(model, plots = TRUE, summaries = FALSE)
dev.off() 

7.4 Additional functions for use with the NCA softare in R (beta)

7.4.1 nca_robustness_table_general.R

# Function to produce the general robustness table
nca_robustness_table_general <- function(data,
                                      conditions,
                                      outcome, 
                                      ceiling = "ce_fdh",
                                      scope = NULL, 
                                      d_threshold = 0.1,
                                      p_threshold = 0.05,
                                      outliers = 0,
                                      bottleneck.y = "percentile",
                                      target_outcome = 75,
                                      plots = FALSE,
                                      checks = checks
) {
  
  # Define the configurations for each robustness check
  
  
  # Initialize an empty list to store results
  results_list <- list()
  
  # Perform each robustness check
  for (check in checks) {
    result <- nca_robustness_checks_general(
      data = data,
      conditions = conditions,
      outcome = outcome,
      ceiling = check$ceiling,
      scope = check$scope,
      d_threshold = check$d_threshold,
      p_threshold = check$p_threshold,
      outliers = check$outliers,
      bottleneck.y = bottleneck.y,
      target_outcome = check$target_outcome,
      plots = plots
    )
    result$`Robustness check` <- check$name
    results_list[[check$name]] <- result
  }
  
  # Combine all results into one data frame
  results_all <- do.call(rbind, results_list)
  
  # Reorder columns
  results_all <- results_all[c("Condition", "Robustness check", "Effect_size", "p_value", "Necessity", "Bottlenecks_percent", "Bottlenecks_count")]
  row.names(results_all) = NULL
  
  # Reorganize the dataframe into new blocks (one block per condition)
  n <- length(conditions) # Block size
  rows_per_block <- seq(1, n) # Sequence of row indices in one block
  
  # Create a new column to assign new block numbers
  results_all$NewBlock <- rep(rows_per_block, length.out = nrow(results_all))
  
  # Split the dataframe into blocks and combine the blocks sequentially
  reorganized_df <- do.call(rbind, lapply(rows_per_block, function(i) {
    results_all[results_all$NewBlock == i, ] # Subset rows for the current block
  }))
  
  # Remove the helper column and reset row names
  reorganized_df <- reorganized_df[, -ncol(reorganized_df)]
  row.names(reorganized_df) <- NULL
  robustness_table_general <- reorganized_df
  return(robustness_table_general)
}

7.4.2 nca_robustness_checks_general.R

# Function to conduct general robustness checks
nca_robustness_checks_general <- function(data, 
                           conditions, 
                           outcome, 
                           ceiling = "ce_fdh", 
                           d_threshold = 0.1, 
                           p_threshold = 0.05, 
                           scope = NULL,
                           outliers = 0, 
                           bottleneck.y = "percentile", 
                           target_outcome = 75, 
                           plots = FALSE) {
  
  # Remove outliers if needed
  if (outliers != 0) {
    source("get_outliers.R")
    out <- get_outliers(data, conditions, outcome, ceiling, outliers)
    all_outliers <- unname(unlist(out$outliers_df)[!is.na(out$outliers_df)])
    data <- data[-all_outliers, ]
  }
  
  # Run the NCA analysis
  model <- nca_analysis(
    data,
    conditions, 
    outcome, 
    ceilings = ceiling, 
    bottleneck.x = 'percentile',
    bottleneck.y = bottleneck.y, 
    steps = c(target_outcome, NA),
    test.rep = 1000,
    scope = scope
  )
  
  # View scatter plots
  if (plots) {
  nca_output(model, summaries = FALSE) #add pdf=plots to save pdf scatter plots
  }
  
  # Extract effect size and p value
  effect_size <- sapply(conditions, function(e) model$summaries[[e]][[2]][[2]])
  p_value <- sapply(conditions, function(p) model$summaries[[p]][[2]][[6]])
  necessity <- ifelse(effect_size >= d_threshold & p_value <= p_threshold, "yes", "no")
  
  # Extract the bottlenecks data frame from the model
  b <- as.data.frame(model$bottlenecks[[1]])
  row_index <- which(b[[1]] == target_outcome)
  row_values <- b[row_index, -1]
  Bottlenecks1 <- as.numeric(row_values) # percentage bottleneck cases 
  Bottlenecks2 <- round(Bottlenecks1 * nrow(data) / 100) # number bottleneck cases
  
  # Dataframe results
  results_all <- data.frame(
    Condition = conditions,
    Effect_size = effect_size,
    p_value = p_value,
    Necessity = necessity,
    Bottlenecks_percent = Bottlenecks1,
    Bottlenecks_count = Bottlenecks2
  )
  
  return(results_all)
}

7.4.3 get_outliers.R

# Function to get outliers

get_outliers <- function (data, conditions, outcome, ceiling, k, row.numbers = TRUE) {
  if (row.numbers) {
    rownames(data) <- NULL
  }

  # Apply nca_outliers for each condition in the conditions vector
  outliers_list <- lapply(conditions, function (condition) {
    nca_outliers(data, x = condition, y = outcome, ceiling = ceiling, k = k)
  })

  # Extract the k data points for the first outlier, add NA's if needed
  k_outliers <- lapply(outliers_list, function (o) {
    parts <- unlist(strsplit(o[[1, 1]], ' - ', fixed = T))
    if (row.numbers) {
      parts <- as.numeric(parts)
    }
    c(parts, rep(NA, k - length(parts)))
  })

  outliers_df <- do.call(rbind, k_outliers)
  colnames(outliers_df) <- paste("Outlier", 1:k)
  rownames(outliers_df) <- conditions

  return(list(outliers_list = outliers_list, outliers_df = outliers_df))
}

7.4.4 get_indicator_data.R

# Function to get the data for the TAM example https://data.mendeley.com/datasets/pd5dp3phx2/4

df0 <- read.csv("Extended TAM.csv")
# Preview the data
head(df0, 3)

# Make a new dataset without last four variables
df1 <- df0[, 1:(ncol(df0) - 4)]

# Rename the indicators for use in SEMinR  
colnames(df1) <- c("PU1","PU2","PU3",
                   "CO1","CO2","CO3",
                   "EOU1", "EOU2","EOU3",
                   "EMV1", "EMV2","EMV3",
                   "AD1",  "AD2","AD3",
                   "USE")

# Preview the new dataset
head(df1,3)

7.4.5 estimate_sem_model.R

# Function to estimate SEM model for the TAM example

# Load library for conducting SEM
library(seminr)

# Specify the measurement model
TAM_mm <- constructs(
  composite("Perceived usefulness", multi_items("PU", 1:3)),
  composite("Compatibility", multi_items("CO", 1:3)),
  composite("Perceived ease of use", multi_items("EOU", 1:3)),
  composite("Emotional value", multi_items("EMV", 1:3)),
  composite("Adoption intention", multi_items("AD", 1:3)),
  composite("Technology use", single_item("USE"))
)

# Specify the structural model
TAM_sm <- relationships(
  paths(from = "Perceived usefulness", to = c("Adoption intention", "Technology use")),
  paths(from = "Compatibility",         to = c("Adoption intention", "Technology use")),
  paths(from = "Perceived ease of use", to = c("Adoption intention", "Technology use")),
  paths(from = "Emotional value",       to = c("Adoption intention", "Technology use")),
  paths(from = "Adoption intention",    to = c("Technology use"))
)

# Estimate the pls model (using indicator data)

# Conduct pls estimation with raw indicator scores
TAM_pls <- estimate_pls(data = df1,
                        measurement_model = TAM_mm,
                        structural_model = TAM_sm)

# Print path coefficients and model fit measures
summary(TAM_pls)

7.4.6 unstandardize.R

# Function to obtain a dataset with the unstandardized latent variables using the output of the SEMinR pls model estimation as function argument.

unstandardize<- function (sem) {
  
  # Extract the original indicators
  original_indicators_df <- as.data.frame(sem$rawdata)
  
  # Extract the standardized weights and store results in a dataframe 
  standardized_weights_matrix <- sem$outer_weights
  standardized_weights_df <- as.data.frame(as.table(standardized_weights_matrix))
  colnames(standardized_weights_df) <- c("Indicator", "Latent variable", "Standardized weight")
  standardized_weights_df <- standardized_weights_df[standardized_weights_df$'Standardized weight' != 0, ]
  rownames(standardized_weights_df) <- NULL
  indicator_df <- standardized_weights_df
  
  # Extract the standard deviation of the original indicators and compute the unstandardized weights
  indicator_df$'Unstandardized weight' <- indicator_df$'Standardized weight'/ sem$sdData
  
  # Compute the 0-1 normalized unstandardized weights
  sum_weights <- tapply(indicator_df$"Unstandardized weight", indicator_df$`Latent variable`, sum)
  indicator_df$"Sum unstandardized weight" <- sum_weights[match(indicator_df$`Latent variable`, names(sum_weights))]
  indicator_df$"Normalized unstandardized weight" <- indicator_df$"Unstandardized weight" /indicator_df$"Sum unstandardized weight"
  
  # Compute the unstandardized latent variables by matrix multiplication
  unstandardized_latent_variable_df <- data.frame(matrix(nrow = nrow(original_indicators_df), ncol = 0))
  for (latent_variable in unique(indicator_df$`Latent variable`)) {
    latent_data <- indicator_df[indicator_df$`Latent variable` == latent_variable, ]
    indicators <- latent_data$Indicator
    normalized_weights <- latent_data$`Normalized unstandardized weight`
    original_scores <- original_indicators_df[, indicators, drop = FALSE] 
    latent_score <- rowSums(sweep(original_scores, 2, normalized_weights, `*`))
    unstandardized_latent_variable_df <- cbind(unstandardized_latent_variable_df, latent_score)
    colnames(unstandardized_latent_variable_df)[ncol(unstandardized_latent_variable_df)] <- latent_variable
  }
  return (unstandardized_latent_variable_df)
}

7.4.7 normalize.R

# Function to min-max normalize data (scale 0-1 or 0-100)

min_max_normalize <- function(data, theoretical_min, theoretical_max, scale) {
  
  # New scale minimum and maximum
  new_min <- scale[1]
  new_max <- scale[2]
  
  # Apply normalization column by column
  normalized_data <- data
  for (i in seq_along (data)) {
    if (is.numeric(data[[i]])) {
      min_val <- theoretical_min[i]
      max_val <- theoretical_max[i]
      normalized_data[[i]] <- ((data[[i]] - min_val) / (max_val - min_val)) * (new_max - new_min) + new_min
    }
  }
  
  # Return the normalized dataframe
  return(normalized_data)
  
}

7.4.8 get_bottleneck_cases.R

# Function to extract the percentage of bottleneck cases for a given target_outcome from the condition percentile bottleneck table.

get_bottleneck_cases <- function(model, target_outcome) {
  
  # Extract the bottlenecks data frame from the model
  b <- as.data.frame(model$bottlenecks[[1]])
  row_index <- which(b[[1]] == target_outcome)
  row_values <- b[row_index, -1]
  row_values[] <- lapply(row_values, function(x) as.numeric(as.character(x)))
  
  # Create bottleneck dataframe
  df <- data.frame(
    `Condition` = colnames(row_values),
    `Bottlenecks` = as.numeric(row_values),
    check.names = FALSE # Prevent conversion of spaces to dots
  )
  colnames(df)[2] <- paste0("Bottlenecks (Y = ", target_outcome, ")")
  
  return(df)
}

7.4.9 get_ipma_df.R

# Function to get the IPMA dataframe from SEMinR

get_ipma_df <- function (data, sem, predictors, predicted) {
  
  # Importance = total effects 
  Importance_df <- as.data.frame(summary(sem)$total_effects)
  Importance <- Importance_df[predictors, predicted]

  # Performance = means of all latent variable scores
  Performance_df <- as.data.frame(sapply(data, mean))
  Performance <- Performance_df[predictors,1]

  # Combine into IPMA
  ipma_df <- data.frame(predictor=predictors, Importance, Performance)
  
  return(ipma_df)
}

7.4.10 get_ipma_plot.R

# Function to get IPMA plot

library(ggplot2)
library(ggrepel)

get_ipma_plot <- function(ipma_df, x_range = NULL) {
  if (is.null(x_range)) {
    x_range <- range(ipma_df$Importance, na.rm = TRUE)
  }  
  p <- ggplot(ipma_df, aes(x = Importance, y = Performance)) +
    geom_point(color = "black", size = 5) +  # Single point layer
    geom_text_repel(
      aes(label = predictor),
      size = 3.5,
      box.padding = 2 
    ) +
    coord_cartesian(ylim = c(0, 100)) +  # Ensure this matches your data range
    scale_x_continuous(limits = c(x_range[1], x_range[2])) +  # Adapt x-axis limits dynamically
    #scale_x_continuous(limits = c(0, 0.5)) +  # Explicitly set x-axis limits between 0 and 0.5
    scale_y_continuous(limits = c(0, 100)) +  # Explicitly set y-axis limits between 0 and 100
    labs(
      title = "classic IPMA",
      x = "Importance",
      y = "Performance"
    ) +
    theme_minimal()  # Use a clean theme
  print(p)
  
}

7.4.11 get_cipma_df.R

# Function to create a cIPMA dataset with Importance, Performance, and Bottlenecks (percentage of bottleneck cases)

get_cipma_df <- function(ipma_df, bottlenecks) { 
  colnames(ipma_df)[1] <- "predictor"
  ipma_df$predictor <- as.character(ipma_df$`predictor`)
  cipma_df <- cbind(ipma_df, bottlenecks[, -1, drop = FALSE])
  # Add a column that combines the predictor name with the number of single cases
  cipma_df$predictor_with_single_cases <- paste(cipma_df$predictor, 
                                                " (", 
                                                round(cipma_df[[ncol(cipma_df)]], 0), 
                                                ")", sep = "")
  
  
  return(cipma_df)
}

7.4.12 get_cipma_plot.R

# Function for producing the cIPMA plot

library(ggplot2)
library(ggrepel)

get_cipma_plot <- function(cipma_df, x_range = NULL, size_limits, size_range) {
  if (is.null(x_range)) {
    x_range <- range(cipma_df$Importance, na.rm = TRUE)
  }
  colname <- colnames(cipma_df)[4]
  target.Y <- as.numeric(sub(".*\\(Y = ([0-9.]+)\\).*", "\\1", colname))
  pc <- ggplot(cipma_df, aes(x = Importance, y = Performance, size = .data[[colname]])) +
    geom_point(color = "white") +
    geom_point(shape = 1, color = "black") +  # Add a black outline
    geom_text_repel(
      aes(label = predictor_with_single_cases),  
      size = 3.5,
      box.padding = 2  # Adjust the padding around labels
    ) +
    coord_cartesian(ylim = c(0, 100)) +  # Ensure this matches your data range
    scale_x_continuous(limits = c(x_range[1], x_range[2])) +  # Adapt x-axis limits dynamically
    scale_y_continuous(limits = c(0, 100)) +  # Explicitly set y-axis limits between 0 and 100
    scale_size_continuous(limits = size_limits, range = size_range) +
    labs(
      title = paste("cIPMA for target outcome Y =" , target.Y), 
      x = "Importance",
      y = "Performance"
    ) +
    theme_minimal() +  # Use a clean theme
    guides(size = "none")  # Explicitly remove size legend
  print(pc)
  
}

7.4.13 get_single_bottleneck_cases.R

# Function to find the number of single bottleneck cases per condition  

get_single_bottleneck_cases <- function(data, conditions, outcome, target_outcome, ceiling) {
  # Check if target_outcome is within the range of the outcome variable
  outcome_min <- min(data[[outcome]], na.rm = TRUE)
  outcome_max <- max(data[[outcome]], na.rm = TRUE)
  
  if (target_outcome < outcome_min || target_outcome > outcome_max) {
    stop(
      sprintf(
        "Error: target_outcome (%f) is out of range for the outcome variable '%s'. Minimum: %f, Maximum: %f",
        target_outcome, outcome, outcome_min, outcome_max
      )
    )
  }
  
  # Initialize matrices to store bottlenecks
  bottleneck_matrix <- matrix(0, nrow = nrow(data), ncol = length(conditions))
  colnames(bottleneck_matrix) <- conditions
  
  for (condition in conditions) {
    # Perform NCA for the given condition
    model <- nca_analysis(
      data, 
      condition, 
      outcome, 
      ceilings = ceiling, 
      bottleneck.y = "actual", 
      bottleneck.x = "actual", 
      steps = c(target_outcome, NA)
    )
    # Extract the x-critical (xc) threshold
    df_model <- model$bottlenecks[[1]]
    # Check if bottleneck x value is NN 
    if ((df_model[, 2][1,1]) == "NN" ) {xc = 0
    } else {
      xc <- as.numeric(df_model[, 2])
    } # Assuming the second column contains xc values
    # Compute bottlenecks for the current condition
    bottleneck_matrix[, condition] <- as.numeric(data[[condition]] < xc)
  }
  
  # Calculate the total bottlenecks for each row
  total_bottlenecks_per_row <- rowSums(bottleneck_matrix)
  # Count the number of rows where each case is the sole bottleneck
  single_bottlenecks_per_condition <- colSums(bottleneck_matrix & (total_bottlenecks_per_row == 1))
  
  # # Return the results
  # single_bottleneck_cases <- list(
  #   bottleneck_matrix = bottleneck_matrix,
  #   total_bottlenecks_per_row = total_bottlenecks_per_row,
  #   single_bottlenecks_per_condition = single_bottlenecks_per_condition  # number
  # )
  # 
  single_bottleneck_cases <- as.data.frame (single_bottlenecks_per_condition) # number
  colnames(single_bottleneck_cases)[1] <- paste0("Bottlenecks (Y = ", target_outcome, ")")
  single_bottleneck_cases <- 100*(single_bottleneck_cases/nrow(data)) # percentage
  
  return (single_bottleneck_cases)
  
}

7.4.14 get_bipma_df.R

# Function to create a BIPMA dataset with Importance, Performance, and Bottlenecks (percentage of SINGLE bottleneck cases)

get_bipma_df <- function(ipma_df, single_bottlenecks) { 
  colnames(ipma_df)[1] <- "predictor"
  ipma_df$predictor <- as.character(ipma_df$`predictor`)
  bipma_df <- cbind(ipma_df, single_bottlenecks) 
  row.names(bipma_df) <- NULL
  
  # Add a column that combines the predictor name with the number of single cases
  bipma_df$predictor_with_single_cases <- paste(bipma_df$predictor, 
                                                " (", 
                                                round(bipma_df[[ncol(bipma_df)]], 0), 
                                                ")", sep = "")
  
  
  return(bipma_df)
}

7.4.15 get_bipma_plot.R

# Function for producing the BIPMA plot

library(ggplot2)
library(ggrepel)

get_bipma_plot <- function(bipma_df, x_range = NULL, size_limits, size_range) {
  if (is.null(x_range)) {
    x_range <- range(bipma_df$Importance, na.rm = TRUE)
  }
  
  colname <- colnames(bipma_df)[4]
  target.Y <- as.numeric(sub(".*\\(Y = ([0-9.]+)\\).*", "\\1", colname))
  
  pc <- ggplot(bipma_df, aes(x = Importance, y = Performance, size = .data[[colname]])) +
    geom_point(color = "white") +
    geom_point(shape = 1, color = "black") +  # Add a black outline
    geom_text_repel(
      aes(label = predictor_with_single_cases),  
      size = 3.5,
      box.padding = 2  # Adjust the padding around labels
    ) +
    coord_cartesian(ylim = c(0, 100)) +  # Ensure this matches your data range
    scale_x_continuous(limits = c(x_range[1], x_range[2])) +  # Adapt x-axis limits dynamically
    scale_y_continuous(limits = c(0, 100)) +  # Explicitly set y-axis limits between 0 and 100
    scale_size_continuous(limits = size_limits, range = size_range) +
    labs(
      title = paste("BIPMA for target outcome Y =" , target.Y), 
      x = "Importance",
      y = "Performance"
    ) +
    theme_minimal() +  # Use a clean theme
    guides(size = "none")  # Explicitly remove size legend
  
  print(pc)
  
}

7.4.16 ncasem_example.R

# Function to conduct NCA-SEM 

#To be sourced:
# source("unstandardize.R")
# source("normalize.R")
# source("get_outliers.R")
# source("get_ipma_df.R")
# source("get_ipma_plot.R")
# source("get_bottleneck_cases.R") #extracted from bottleneck table
# source("get_cipma_df.R")
# source("get_cipma_plot.R")
# source("get_single_bottleneck_cases.R")
# source("get_bipma_df.R")
# source("get_bipma_plot.R")
# source("get_indicator_data.R")
# source("estimate_sem_model.R")

# Assuming Step 1 (hypotheses) and Step 2 (SEM) have been conducted
# Step 3-5
ncasem <- function(sem,
                   outliers = 0,
                   standardize = 0, # unstandardized
                   normalize = 1, # normalized
                   scale = c(0,100), # normalized 0-100 (percentage)
                   predictors,
                   predicted,
                   theoretical_min,
                   theoretical_max,
                   ceiling = "ce_fdh",
                   d_threshold = 0.10,
                   p_threshold = 0.05, 
                   scope = NULL, # empirical scope
                   target_outcome,
                   plots = FALSE) {
 
#Step 3 select latent variable scores
  data <- sem$construct_scores
  # Standardization step
  if (standardize == 0) {
    source("unstandardize.R")
    data <- unstandardize(sem)
  }
  
  # Normalization step
  if (normalize == 0) {
    data <- data  # No normalization
  } else {
    source("normalize.R")
    data <- min_max_normalize(data, theoretical_min, theoretical_max, scale)
  }
  
# Step 4: Conduct NCA
  library(NCA)
  data <- data
  ceilings = ceiling
  test.rep = 10000 #set the number of permutations for estimating the p value
  bottleneck.y = "actual"
  bottleneck.x = "percentile"
  steps=seq(0, 100, 5)
  scope=scope 
  
  # Process outliers if needed
  if (outliers != 0) {
    source("get_outliers.R")
    conditions = predictors
    outcome = predicted
    out <- get_outliers(data, conditions, outcome, ceiling, outliers)
    all_outliers <- unname(unlist(out$outliers_df)[!is.na(out$outliers_df)])
    # Remove outliers
    data <- data[-all_outliers, ]
  }
  
  # Run the NCA analysis
  model <- nca_analysis(
    data, 
    predictors, 
    predicted, 
    ceilings = ceiling, 
    bottleneck.x = bottleneck.x,
    bottleneck.y = bottleneck.y, 
    steps = steps,
    test.rep = test.rep,
    scope = scope
  )
  
  nca_output(model, summaries = FALSE, pdf = plots)
  
# Step 5: Produce BIPMA
  # IPMA
  source("get_ipma_df.R")
  IPMA_df <- get_ipma_df(data = data, sem = sem, predictors = predictors, predicted = predicted)
  ipma_df <- IPMA_df
  source("get_ipma_plot.R")
  IPMA_plot <- get_ipma_plot(ipma_df = ipma_df, x_range = x_range)
  if (plots) { 
  current_time <- format(Sys.time(), "%Y-%m-%d_%H-%M-%S")
  filename <- paste0("IPMA plot_", current_time, ".pdf")
  ggsave(filename, plot = IPMA_plot, width = 8, height = 6, dpi = 300)
  }
  
  # cIPMA
  source("get_bottleneck_cases.R") #extracted from bottleneck table
  bottlenecks <- get_bottleneck_cases(model, target_outcome)
  source("get_cipma_df.R")
  CIPMA_df <- get_cipma_df(ipma_df = ipma_df, bottlenecks = bottlenecks)
  source("get_cipma_plot.R")
  cipma_df <- CIPMA_df
  CIPMA_plot <- get_cipma_plot(cipma_df = cipma_df, x_range = x_range, size_limits = size_limits, size_range = size_range)
  if (plots) { 
    current_time <- format(Sys.time(), "%Y-%m-%d_%H-%M-%S")
    filename <- paste0("CIPMA plot_", current_time, ".pdf")
    ggsave(filename, plot = CIPMA_plot, width = 8, height = 6, dpi = 300)
  }
  
  # BIPMA
  source("get_single_bottleneck_cases.R")
  source("get_bipma_df.R")
  single_bottlenecks <- get_single_bottleneck_cases (data, conditions = predictors, outcome = predicted, target_outcome = target_outcome, ceiling = ceiling)
  BIPMA_df <- get_bipma_df(ipma_df = ipma_df, single_bottlenecks = single_bottlenecks)
  source("get_bipma_plot.R")
  bipma_df <- BIPMA_df
  BIPMA_plot <- get_bipma_plot(bipma_df = bipma_df, x_range = x_range, size_limits = size_limits, size_range = size_range)
  if (plots) { 
    current_time <- format(Sys.time(), "%Y-%m-%d_%H-%M-%S")
    filename <- paste0("BIPMA plot_", current_time, ".pdf")
    ggsave(filename, plot = BIPMA_plot, width = 8, height = 6, dpi = 300)
  }

  # Extract results
  effect_size <- sapply(predictors, function(p) model$summaries[[p]][[2]][[2]])
  p_value <- sapply(predictors, function(p) model$summaries[[p]][[2]][[6]])
  necessity <- ifelse(effect_size >= d_threshold & p_value <= p_threshold, "yes", "no")
  results_df <- data.frame(
    predictor = predictors,
    effect_size = effect_size,
    p_value = p_value,
    necessity = necessity,
    Bottlenecks1 = (single_bottlenecks), 
    Bottlenecks2 = (single_bottlenecks)*nrow(data)/100, 
    row.names = NULL
  )
  colnames(results_df) <- c("Predictor", "Effect size", "p value", "Necessity", "Single bottlenecks(%)", "Single bottlenecks(#)")
  results_df$Priority <- rank(-results_df[,6])
  
  
  result <- list(data = data, results =results_df)

  return(result)
}


### Example

# Step 2: Conduct PLS-SEM with SEMinR on TAM data
source("get_indicator_data.R")
source("estimate_sem_model.R")

# Step 3-5 using ncasem function

#Step 2
# SEM model
sem <- TAM_pls

# Step 3-5
standardize <- 0
normalize <- 1
predictors <- c("Perceived usefulness", "Compatibility", "Perceived ease of use", "Emotional value", "Adoption intention")
predicted <- "Technology use"
theoretical_min <- c(1,1,1,1,1,1)   # Min values original scale
theoretical_max <- c(5,5,5,5,5,7)   # Max values original scale

# NCA
ceiling = "ce_fdh"
target_outcome <- 85
d_threshold <- 0.10
p_threshold <- 0.05
outliers = 0
scope = NULL

# (x)IPMA plots
plots= FALSE
scale <- c(0,100)  # Define new scale
x_range <- c(0,0.6) # range of values of the Importance x-axis of the (x)IPMA plot
size_limits = c(0,100) 
size_range <- c(0.5,50) # the of the bubble of the (x)IPMA plot

original <- ncasem(sem,
                   outliers,
                   standardize,
                   normalize,
                   scale,
                   predictors,
                   predicted, 
                   theoretical_min,
                   theoretical_max,
                   ceiling,
                   d_threshold,
                   p_threshold,
                   scope, 
                   target_outcome,
                   plots
                   )

original$results <- cbind(`Robustness check` = "Original", original$results)
print(dim(original$results)) # Check dimensions
original$results

# Step 6
### Ceiling change
ceiling = "cr_fdh"
other_ceiling <- ncasem(sem,
                        outliers,
                        standardize,
                        normalize,
                        scale,
                        predictors,
                        predicted, 
                        theoretical_min,
                        theoretical_max,
                        ceiling,
                        d_threshold,
                        p_threshold,
                        scope, 
                        target_outcome,
                        plots
)
other_ceiling$results <- cbind(`Robustness check` = "Ceiling change", other_ceiling$results)
other_ceiling$results

### d threshold change
ceiling = "ce_fdh"
d_threshold <- 0.20
other_d <- ncasem(sem,
                  outliers,
                  standardize,
                  normalize,
                  scale,
                  predictors,
                  predicted, 
                  theoretical_min,
                  theoretical_max,
                  ceiling,
                  d_threshold,
                  p_threshold,
                  scope, 
                  target_outcome,
                  plots
)
other_d$results <- cbind(`Robustness check` = "d value change", other_d$results)
other_d$results

### p threshold change
d_threshold <- 0.10
p_threshold <- 0.01
other_p <- ncasem(sem,
                  outliers,
                  standardize,
                  normalize,
                  scale,
                  predictors,
                  predicted, 
                  theoretical_min,
                  theoretical_max,
                  ceiling,
                  d_threshold,
                  p_threshold,
                  scope, 
                  target_outcome,
                  plots
)
other_p$results <- cbind(`Robustness check` = "p value change", other_p$results)
other_p$results

### Scope change
p_threshold <- 0.05
scope <- list( c(1,5,1,5), c(1,5,1,5), c(1,5,1,5), c(1,5,1,5), c(1,5,1,5)) #theoretical scope
other_scope <- ncasem(sem,
                      outliers,
                      standardize,
                      normalize,
                      scale,
                      predictors,
                      predicted, 
                      theoretical_min,
                      theoretical_max,
                      ceiling,
                      d_threshold,
                      p_threshold,
                      scope, 
                      target_outcome,
                      plots
)
other_scope$results <- cbind(`Robustness check` = "Scope change", other_scope$results)
other_scope$results

### Single outlier removal
scope <- NULL
outliers = 1
other_outliers_single <- ncasem(sem,
                                outliers,
                                standardize,
                                normalize,
                                scale,
                                predictors,
                                predicted, 
                                theoretical_min,
                                theoretical_max,
                                ceiling,
                                d_threshold,
                                p_threshold,
                                scope, 
                                target_outcome,
                                plots
)
other_outliers_single$results <- cbind(`Robustness check` = "Single outlier removal", other_outliers_single$results)
other_outliers_single$results

### Multiple outlier removal
outliers = 2
other_outliers_multiple <- ncasem(sem,
                                  outliers,
                                  standardize,
                                  normalize,
                                  scale,
                                  predictors,
                                  predicted, 
                                  theoretical_min,
                                  theoretical_max,
                                  ceiling,
                                  d_threshold,
                                  p_threshold,
                                  scope, 
                                  target_outcome,
                                  plots
)
other_outliers_multiple$results <- cbind(`Robustness check` = "Multiple outlier removal",other_outliers_multiple$results)
other_outliers_multiple$results

### Target lower (80)
outliers <- 0
target_outcome <- 80
other_target_lower <- ncasem(sem,
                             outliers,
                             standardize,
                             normalize,
                             scale,
                             predictors,
                             predicted, 
                             theoretical_min,
                             theoretical_max,
                             ceiling,
                             d_threshold,
                             p_threshold,
                             scope, 
                             target_outcome,
                             plots
)
other_target_lower$results <- cbind(`Robustness check` = "Target lower", other_target_lower$results)
other_target_lower$results

### Target higher (90)
target_outcome <- 90
other_target_higher <- ncasem(sem,
                              outliers,
                              standardize,
                              normalize,
                              scale,
                              predictors,
                              predicted, 
                              theoretical_min,
                              theoretical_max,
                              ceiling,
                              d_threshold,
                              p_threshold,
                              scope, 
                              target_outcome,
                              plots
)
other_target_higher$results <- cbind(`Robustness check` = "Target higher", other_target_higher$results)
other_target_higher$results

results_all <- rbind(original$results,
                     other_ceiling$results,
                     other_d$results,
                     other_p$results,
                     other_scope$results, 
                     other_outliers_single$results,
                     other_outliers_multiple$results,
                     other_target_lower$results,
                     other_target_higher$results)

results_all <- results_all[c("Predictor",
                             "Robustness check",
                             "Effect size",
                             "p value",
                             "Necessity",
                             "Single bottlenecks(%)",
                             "Single bottlenecks(#)",
                             "Priority"
                             )]

# Reorganize the dataframe into new blocks
n <- length(predictors) # Block size
rows_per_block <- seq(1, n) # Sequence of row indices in one block

# Create a new column to assign new block numbers
results_all$NewBlock <- rep(rows_per_block, length.out = nrow(results_all))

# Split the dataframe into blocks and combine the blocks sequentially
reorganized_df <- do.call(rbind, lapply(rows_per_block, function(i) {
  results_all[results_all$NewBlock == i, ] # Subset rows for the current block
}))

# Remove the helper column and reset row names
reorganized_df <- reorganized_df[, -ncol(reorganized_df)]
row.names(reorganized_df) <- NULL

# View the reorganized dataframe
print(reorganized_df)
library(xlsx)

write.xlsx(reorganized_df, "robustness_table_sem.xlsx", row.names = FALSE)

References

Dul, J. (2016b). Necessary Condition Analysis (NCA) logic and methodology of “necessary but not sufficient” causality. Organizational Research Methods, 19(1), 10–52. https://doi.org/10.1177/1094428115584005
Dul, J., & Buijs, G. (2023). NCA (Version 3.3.3). The Comprehensive R Archive Network. https://cran.r-project.org/web/packages/NCA/index.html
SmartPLS Development Team. (2023). SmartPLS (Version 4). SmartPLS GmbH, Germany. https://www.smartpls.com/
Spinelli, D., Dul, J., & Buijs, G. (2023). nca: Stata module to perform necessary condition analysis (NCA) (Version 0.0.1). Statistical Software Components S459269, Boston College Department of Economics. https://ideas.repec.org/c/boc/bocode/s459269.html