Chapter 7 Miscellaneous
7.1 NCA software
7.1.1 Main NCA software: R package
The main NCA software is a free package in R called NCA
(Dul & Buijs, 2023).The first version of this software was released in 2015 just before the online publication of NCA’s core paper (Dul, 2016b) on 15 July, 2015. Since then the software was updated several times to correct bugs, integrate new NCA developments, and make improvements based on feedback from users.
The following versions have been released so far:
NCA_1.0 (2015-07-02)
NCA_1.1 (2015-10-10)
NCA_2.0 (2016-05-18)
NCA_3.0 (2018-08-01)
NCA_3.0.1 (2018-08-21)
NCA_3.0.2 (2019-11-22)
NCA_3.0.3 (2020-06-11)
NCA_3.1.0 (2021-03-02)
NCA_3.1.1 (2021-05-03)
NCA_3.2.0 (2022-04-05)
NCA_3.2.1 (2022-09-15)
NCA_3.3.0 (2023-02-06)
NCA_3.3.1 (2023-02-10)
NCA_3.3.2 (2023-06-27)
NCA_3.3.3 (2023-09-05)
NCA_4.0.0 (2024-02-16)
NCA_4.0.1 (2024-02-23)
NCA_4.0.2 (2024-11-09)
When the first digit of the version number increases, major changes were implemented. Version 1 was the first limited version, version 2 was the first comprehensive version, version 3 includes the statistical test for NCA, and version 4 introduced two new functions nca_random
and nca_power
. The second digit refers to larger changes and the third digit minor changes.
The currently installed version of the NCA software and the availability of a new version can be checked as follows:
The cumulative number of downloads of the NCA software from the CRAN website are shown in Figure 7.1.

Figure 7.1: Cumulative number of downloads from the CRAN website of the NCA software since its launch).
Until 31 December 2023 the software was downloaded more than 47,000 times.
The current cumulative number of downloads can be obtained by:
library("cranlogs")
x <- cran_downloads(packages = "NCA", from = "2015-07-02", to = Sys.Date())
cat("total downloads on")
## total downloads on
## [1] "2025-02-04"
## [1] 58089
The Quick Start Guide
(also here)
helps novice users of R and NCA to get started, but the guide also includes more advanced features. The NCA
package has a build-in help function with examples. Within R, the help function can be called as follows:
When help is needed for a specific function in NCA
, for example nca_analysis
the help for this function can be called as follows:
7.1.2 Other NCA software packages
Based on the R package a Stata module is available for conducting NCA (Spinelli et al., 2023). NCA is also part of the SmartPLS software for conducting NCA in combination with PLS-SEM (SmartPLS Development Team, 2023). These versions enable conducting a basic NCA in other software environments than R, but lack more advanced functions and recent developments of NCA. Currently, no NCA software is available for other platforms like SPSS or SAS.
7.2 Learning about NCA
Several possibilities exist to learn about NCA. Apart from reading books and publications about the NCA methodology (Table 0.2) and reading publications with examples of NCA applications (Table 0.1) several on site and online courses are available at the introduction and advanced levels. Major international conferences often host a professional development workshop on site to learn about NCA. An example is annual conference of the the Academy of Management. Also online short introduction courses exist for NCA. An example is a free on-demand introduction seminar hosted by Instats. A longer free introduction MOOC course is available from Coursera. At the advanced level, an on-demand course is hosted by Instats. Also an NCA paper development workshop is regularly organized. More information about NCA training and events can be found in the NCA website.
7.3 Producing high-quality NCA scatter plots
One of the primary results of NCA is the ‘NCA plot’, which is a scatter plot with ceiling line(s). Users often wish to save a high quality NCA plot for further use, for example for a publication. There are several ways to make and save a high quality NCA plot. In this section six ways are presented that have different levels of user flexibility:
Standard plot.
Standard plot with basic adaptations.
Standard plot with advanced adaptations.
Advanced plot with
plotly
.Advanced plot with
ggplot
.From within R script.
7.3.1 Standard plot
When NCA for R is used with RStudio, the standard NCA plot is displayed in the Plots window of RStudio. The plot can be produced with the argument plots = TRUE
of the nca_output
function. For the example dataset in NCA package, the standard plot can be obtained as follows:
library (NCA)
data(nca.example)
model <- nca_analysis(nca.example, 1, 3)
nca_output(model, plots = TRUE, summaries = FALSE)

Figure 7.2: Standard scatter plot of nca.example.
7.3.2 Standard plot with basic adaptations
It may be desirable to change the appearance of the plot, such as the color of the points and the lines, the type of lines and the type of points. These characteristics can be changed with the graphical functions that are included in the NCA software. Before running nca_output, the NCA plot can be customized by changing the point type (for all points), the line types and line colors (for each ceiling line separately) and the line width (for all ceiling lines). For instance, a publisher may require a black and white or grayscale plot, with thicker lines and with solid points. Then the standard plot can be adapted as follows (changing of plot dimensions and the saving the plot is the same as for the standard plot):
line.colors['ce_fdh'] <- 'black'
line.colors['cr_fdh'] <- 'black'
line.colors['ols'] <- 'grey'
point.color <- "black"
point.type <- 20
line.width <- 2
nca_output(model, plots = TRUE, summaries = FALSE)

Figure 7.3: Standard scatter plot of nca.example in black/grey/white.
7.3.3 Standard plot with advanced adaptations
If further changes in the plot are needed, for example changes of the individual line types or the plot title, a special R script must be downloaded from:
https://raw.githubusercontent.com/eur-rsm/nca_scripts/master/display_plot.R
This script is used as an alternative for making the NCA plot with plots = TRUE
in the nca_output
function. The script can be copy/paste or otherwise loaded in your R session.
# Default values copied from p_constants
line_colors <- list(ols="green", lh="red3", cols="darkgreen",
qr="blue", ce_vrs="orchid4", cr_vrs="violet",
ce_fdh="red", cr_fdh="orange", sfa="darkgoldenrod",
c_lp="blue")
line_types <- list(ols=3, lh=2, cols=3,
qr=4, ce_vrs=5, cr_vrs=1,
ce_fdh=6, cr_fdh=1, sfa=7,
c_lp=2)
line_width <- 1.5
point_type <- 23
point_color <- 'red'
display_plot <-
function (plot) {
flip.x <- plot$flip.x
flip.y <- plot$flip.y
# Determine the bounds of the plot based on the scope
xlim <- c(plot$scope.theo[1 + flip.x], plot$scope.theo[2 - flip.x])
ylim <- c(plot$scope.theo[3 + flip.y], plot$scope.theo[4 - flip.y])
# Reset/append colors etc. if needed
for (method in names(line_types)) {
line.types[[method]] <- line_types[[method]]
}
if (is.numeric(line_width)) {
line.width <- line_width
}
if (is.numeric(point_type)) {
point.type <- point_type
}
if (point_color %in% colors()) {
point.color <- point_color
}
# Only needed until the next release (3.0.2)
if (!exists("point.color")) {
point.color <- "blue"
}
# Plot the data points
plot (plot$x, plot$y, pch=point.type, col=point.color, bg=point.color,
xlim=xlim, ylim=ylim, xlab=colnames(plot$x), ylab=tail(plot$names, n=1))
# Plot the scope outline
abline(v=plot$scope.theo[1], lty=2, col="grey")
abline(v=plot$scope.theo[2], lty=2, col="grey")
abline(h=plot$scope.theo[3], lty=2, col="grey")
abline(h=plot$scope.theo[4], lty=2, col="grey")
# Plot the legend before adding the clipping area
legendParams = list()
for (method in plot$methods) {
line.color <- line.colors[[method]]
line.type <- line.types[[method]]
name <- gsub("_", "-", toupper(method))
legendParams$names = append(legendParams$names, name)
legendParams$types = append(legendParams$types, line.type)
legendParams$colors = append(legendParams$colors, line.color)
}
if (length(legendParams) > 0) {
legend("topleft", cex=0.7, legendParams$names,
lty=legendParams$types, col=legendParams$colors, bg=NA)
}
# Apply clipping to the lines
clip(xlim[1], xlim[2], ylim[1], ylim[2])
# Plot the lines
for (method in plot$methods) {
line <- plot$lines[[method]]
line.color <- line.colors[[method]]
line.type <- line.types[[method]]
if (method %in% c("lh", "ce_vrs", "ce_fdh")) {
lines(line[[1]], line[[2]], type="l",
lty=line.type, col=line.color, lwd=line.width)
} else {
abline(line, lty=line.type, col=line.color, lwd=line.width)
}
}
# Plot the title
title(paste0("Advanced Standard NCA Plot : ", plot$title), cex.main=1)
}
The different default values of the plot can be changed as desired. For example, a researcher may wish to change the ols
line into a dotted line (line.type
to value 3), and to have a point type that is squared (point.type = 23
) with red color of the points (point.color = 'red'
). Also, the title of the plot can be changed at the end of the script into for example “Advanced NCA plot …”.
For producing the plot the final call is :

Figure 7.4: Standard scatter plot of nca.example with advance adaptations.
The plot can be reset to default values by:
7.3.4 Plot with plotly
The NCA software can also produce a plot made by plotly
. The main goal is to provide the user a possibility for graphical exploration of NCA results. However, it is also possible to save the plot. The plot with plotly
can be produced with the nca_output
function as follows:.
It is possible to label subgroups of points. For example, the continents of the countries in the nca.example
dataset can be identified as follows (in the order of the rows in the dataset):
labels <- c("Australia", "Europe", "Europe", "North America", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Asia", "North America", "Europe", "Australia", "Europe", "Europe", "Europe", "Europe", "Asia", "Europe", "Europe", "Europe", "Europe", "Europe", "North America")
nca_output(model, plotly=labels, summaries = FALSE)
The plot opens in the Viewer tab of the plots window (Figure 7.5). The interactive version can be approached here. The plot identifies the ‘peers’ in red. Peers are points near the ceiling line that are used for drawing the default and other ceiling lines. When moving in the computer screen the pointer over the top of the plot, a toolbar pops up. One of its functions is “Download plot as png”, which saves the plot in the download folder of the computer (not in the working directory). Saving is also possible with Export tab -> Save as Image.
.](Plotly.png)
Figure 7.5: Plotly scatter plot of nca.example. The interactive version can be approached here.
7.3.5 Plot with ggplot
The function ggplot
is an advanced graphical function in R. Output of NCA can be used as input for ggplot
. For example, when the ceiling line is a straight line (such as the default ceiling line cr_fdh
) the intercept and slope of the ceiling line can be used in ggplot
to plot the ceiling line.
First, these NCA parameters must be extracted from the NCA results:
intercept <- model$plots$Individualism$lines$cr_fdh$coefficients[1]
slope <- model$plots$Individualism$lines$cr_fdh$coefficients[2]
Next the ggplot
can be produced:
library (ggplot2)
ggplot(nca.example, aes(Individualism,`Innovation performance`))+
geom_point() +
geom_abline(intercept=intercept, slope=slope, col = "red")

Figure 7.6: ggplot scatter plot of nca.example.
The ggplot
function gives many graphical opportunities (see its documentation) to make the plot as desired.
Changing of plot dimensions and the saving the plot is the same as for the standard plot.
7.3.6 Saving the plot from within the R script
All above options for saving an NCA plot are based on first displaying the plot in R and then saving it. It is also possible to save the plot from within the R script. The following script saves a standard NCA plot as a png file with the name “nca.example.png”) in the working directory with a given size and resolution:
png("nca_example.png",units="cm", 15,15, res=300)
nca_output(model, plots = TRUE, summaries = FALSE)
dev.off()
It is also possible to save a pdf file:
7.4 Additional functions for use with the NCA softare in R (beta)
7.4.1 nca_robustness_table_general.R
# Function to produce the general robustness table
nca_robustness_table_general <- function(data,
conditions,
outcome,
ceiling = "ce_fdh",
scope = NULL,
d_threshold = 0.1,
p_threshold = 0.05,
outliers = 0,
bottleneck.y = "percentile",
target_outcome = 75,
plots = FALSE,
checks = checks
) {
# Define the configurations for each robustness check
# Initialize an empty list to store results
results_list <- list()
# Perform each robustness check
for (check in checks) {
result <- nca_robustness_checks_general(
data = data,
conditions = conditions,
outcome = outcome,
ceiling = check$ceiling,
scope = check$scope,
d_threshold = check$d_threshold,
p_threshold = check$p_threshold,
outliers = check$outliers,
bottleneck.y = bottleneck.y,
target_outcome = check$target_outcome,
plots = plots
)
result$`Robustness check` <- check$name
results_list[[check$name]] <- result
}
# Combine all results into one data frame
results_all <- do.call(rbind, results_list)
# Reorder columns
results_all <- results_all[c("Condition", "Robustness check", "Effect_size", "p_value", "Necessity", "Bottlenecks_percent", "Bottlenecks_count")]
row.names(results_all) = NULL
# Reorganize the dataframe into new blocks (one block per condition)
n <- length(conditions) # Block size
rows_per_block <- seq(1, n) # Sequence of row indices in one block
# Create a new column to assign new block numbers
results_all$NewBlock <- rep(rows_per_block, length.out = nrow(results_all))
# Split the dataframe into blocks and combine the blocks sequentially
reorganized_df <- do.call(rbind, lapply(rows_per_block, function(i) {
results_all[results_all$NewBlock == i, ] # Subset rows for the current block
}))
# Remove the helper column and reset row names
reorganized_df <- reorganized_df[, -ncol(reorganized_df)]
row.names(reorganized_df) <- NULL
robustness_table_general <- reorganized_df
return(robustness_table_general)
}
7.4.2 nca_robustness_checks_general.R
# Function to conduct general robustness checks
nca_robustness_checks_general <- function(data,
conditions,
outcome,
ceiling = "ce_fdh",
d_threshold = 0.1,
p_threshold = 0.05,
scope = NULL,
outliers = 0,
bottleneck.y = "percentile",
target_outcome = 75,
plots = FALSE) {
# Remove outliers if needed
if (outliers != 0) {
source("get_outliers.R")
out <- get_outliers(data, conditions, outcome, ceiling, outliers)
all_outliers <- unname(unlist(out$outliers_df)[!is.na(out$outliers_df)])
data <- data[-all_outliers, ]
}
# Run the NCA analysis
model <- nca_analysis(
data,
conditions,
outcome,
ceilings = ceiling,
bottleneck.x = 'percentile',
bottleneck.y = bottleneck.y,
steps = c(target_outcome, NA),
test.rep = 1000,
scope = scope
)
# View scatter plots
if (plots) {
nca_output(model, summaries = FALSE) #add pdf=plots to save pdf scatter plots
}
# Extract effect size and p value
effect_size <- sapply(conditions, function(e) model$summaries[[e]][[2]][[2]])
p_value <- sapply(conditions, function(p) model$summaries[[p]][[2]][[6]])
necessity <- ifelse(effect_size >= d_threshold & p_value <= p_threshold, "yes", "no")
# Extract the bottlenecks data frame from the model
b <- as.data.frame(model$bottlenecks[[1]])
row_index <- which(b[[1]] == target_outcome)
row_values <- b[row_index, -1]
Bottlenecks1 <- as.numeric(row_values) # percentage bottleneck cases
Bottlenecks2 <- round(Bottlenecks1 * nrow(data) / 100) # number bottleneck cases
# Dataframe results
results_all <- data.frame(
Condition = conditions,
Effect_size = effect_size,
p_value = p_value,
Necessity = necessity,
Bottlenecks_percent = Bottlenecks1,
Bottlenecks_count = Bottlenecks2
)
return(results_all)
}
7.4.3 get_outliers.R
# Function to get outliers
get_outliers <- function (data, conditions, outcome, ceiling, k, row.numbers = TRUE) {
if (row.numbers) {
rownames(data) <- NULL
}
# Apply nca_outliers for each condition in the conditions vector
outliers_list <- lapply(conditions, function (condition) {
nca_outliers(data, x = condition, y = outcome, ceiling = ceiling, k = k)
})
# Extract the k data points for the first outlier, add NA's if needed
k_outliers <- lapply(outliers_list, function (o) {
parts <- unlist(strsplit(o[[1, 1]], ' - ', fixed = T))
if (row.numbers) {
parts <- as.numeric(parts)
}
c(parts, rep(NA, k - length(parts)))
})
outliers_df <- do.call(rbind, k_outliers)
colnames(outliers_df) <- paste("Outlier", 1:k)
rownames(outliers_df) <- conditions
return(list(outliers_list = outliers_list, outliers_df = outliers_df))
}
7.4.4 get_indicator_data.R
# Function to get the data for the TAM example https://data.mendeley.com/datasets/pd5dp3phx2/4
df0 <- read.csv("Extended TAM.csv")
# Preview the data
head(df0, 3)
# Make a new dataset without last four variables
df1 <- df0[, 1:(ncol(df0) - 4)]
# Rename the indicators for use in SEMinR
colnames(df1) <- c("PU1","PU2","PU3",
"CO1","CO2","CO3",
"EOU1", "EOU2","EOU3",
"EMV1", "EMV2","EMV3",
"AD1", "AD2","AD3",
"USE")
# Preview the new dataset
head(df1,3)
7.4.5 estimate_sem_model.R
# Function to estimate SEM model for the TAM example
# Load library for conducting SEM
library(seminr)
# Specify the measurement model
TAM_mm <- constructs(
composite("Perceived usefulness", multi_items("PU", 1:3)),
composite("Compatibility", multi_items("CO", 1:3)),
composite("Perceived ease of use", multi_items("EOU", 1:3)),
composite("Emotional value", multi_items("EMV", 1:3)),
composite("Adoption intention", multi_items("AD", 1:3)),
composite("Technology use", single_item("USE"))
)
# Specify the structural model
TAM_sm <- relationships(
paths(from = "Perceived usefulness", to = c("Adoption intention", "Technology use")),
paths(from = "Compatibility", to = c("Adoption intention", "Technology use")),
paths(from = "Perceived ease of use", to = c("Adoption intention", "Technology use")),
paths(from = "Emotional value", to = c("Adoption intention", "Technology use")),
paths(from = "Adoption intention", to = c("Technology use"))
)
# Estimate the pls model (using indicator data)
# Conduct pls estimation with raw indicator scores
TAM_pls <- estimate_pls(data = df1,
measurement_model = TAM_mm,
structural_model = TAM_sm)
# Print path coefficients and model fit measures
summary(TAM_pls)
7.4.6 unstandardize.R
# Function to obtain a dataset with the unstandardized latent variables using the output of the SEMinR pls model estimation as function argument.
unstandardize<- function (sem) {
# Extract the original indicators
original_indicators_df <- as.data.frame(sem$rawdata)
# Extract the standardized weights and store results in a dataframe
standardized_weights_matrix <- sem$outer_weights
standardized_weights_df <- as.data.frame(as.table(standardized_weights_matrix))
colnames(standardized_weights_df) <- c("Indicator", "Latent variable", "Standardized weight")
standardized_weights_df <- standardized_weights_df[standardized_weights_df$'Standardized weight' != 0, ]
rownames(standardized_weights_df) <- NULL
indicator_df <- standardized_weights_df
# Extract the standard deviation of the original indicators and compute the unstandardized weights
indicator_df$'Unstandardized weight' <- indicator_df$'Standardized weight'/ sem$sdData
# Compute the 0-1 normalized unstandardized weights
sum_weights <- tapply(indicator_df$"Unstandardized weight", indicator_df$`Latent variable`, sum)
indicator_df$"Sum unstandardized weight" <- sum_weights[match(indicator_df$`Latent variable`, names(sum_weights))]
indicator_df$"Normalized unstandardized weight" <- indicator_df$"Unstandardized weight" /indicator_df$"Sum unstandardized weight"
# Compute the unstandardized latent variables by matrix multiplication
unstandardized_latent_variable_df <- data.frame(matrix(nrow = nrow(original_indicators_df), ncol = 0))
for (latent_variable in unique(indicator_df$`Latent variable`)) {
latent_data <- indicator_df[indicator_df$`Latent variable` == latent_variable, ]
indicators <- latent_data$Indicator
normalized_weights <- latent_data$`Normalized unstandardized weight`
original_scores <- original_indicators_df[, indicators, drop = FALSE]
latent_score <- rowSums(sweep(original_scores, 2, normalized_weights, `*`))
unstandardized_latent_variable_df <- cbind(unstandardized_latent_variable_df, latent_score)
colnames(unstandardized_latent_variable_df)[ncol(unstandardized_latent_variable_df)] <- latent_variable
}
return (unstandardized_latent_variable_df)
}
7.4.7 normalize.R
# Function to min-max normalize data (scale 0-1 or 0-100)
min_max_normalize <- function(data, theoretical_min, theoretical_max, scale) {
# New scale minimum and maximum
new_min <- scale[1]
new_max <- scale[2]
# Apply normalization column by column
normalized_data <- data
for (i in seq_along (data)) {
if (is.numeric(data[[i]])) {
min_val <- theoretical_min[i]
max_val <- theoretical_max[i]
normalized_data[[i]] <- ((data[[i]] - min_val) / (max_val - min_val)) * (new_max - new_min) + new_min
}
}
# Return the normalized dataframe
return(normalized_data)
}
7.4.8 get_bottleneck_cases.R
# Function to extract the percentage of bottleneck cases for a given target_outcome from the condition percentile bottleneck table.
get_bottleneck_cases <- function(model, target_outcome) {
# Extract the bottlenecks data frame from the model
b <- as.data.frame(model$bottlenecks[[1]])
row_index <- which(b[[1]] == target_outcome)
row_values <- b[row_index, -1]
row_values[] <- lapply(row_values, function(x) as.numeric(as.character(x)))
# Create bottleneck dataframe
df <- data.frame(
`Condition` = colnames(row_values),
`Bottlenecks` = as.numeric(row_values),
check.names = FALSE # Prevent conversion of spaces to dots
)
colnames(df)[2] <- paste0("Bottlenecks (Y = ", target_outcome, ")")
return(df)
}
7.4.9 get_ipma_df.R
# Function to get the IPMA dataframe from SEMinR
get_ipma_df <- function (data, sem, predictors, predicted) {
# Importance = total effects
Importance_df <- as.data.frame(summary(sem)$total_effects)
Importance <- Importance_df[predictors, predicted]
# Performance = means of all latent variable scores
Performance_df <- as.data.frame(sapply(data, mean))
Performance <- Performance_df[predictors,1]
# Combine into IPMA
ipma_df <- data.frame(predictor=predictors, Importance, Performance)
return(ipma_df)
}
7.4.10 get_ipma_plot.R
# Function to get IPMA plot
library(ggplot2)
library(ggrepel)
get_ipma_plot <- function(ipma_df, x_range = NULL) {
if (is.null(x_range)) {
x_range <- range(ipma_df$Importance, na.rm = TRUE)
}
p <- ggplot(ipma_df, aes(x = Importance, y = Performance)) +
geom_point(color = "black", size = 5) + # Single point layer
geom_text_repel(
aes(label = predictor),
size = 3.5,
box.padding = 2
) +
coord_cartesian(ylim = c(0, 100)) + # Ensure this matches your data range
scale_x_continuous(limits = c(x_range[1], x_range[2])) + # Adapt x-axis limits dynamically
#scale_x_continuous(limits = c(0, 0.5)) + # Explicitly set x-axis limits between 0 and 0.5
scale_y_continuous(limits = c(0, 100)) + # Explicitly set y-axis limits between 0 and 100
labs(
title = "classic IPMA",
x = "Importance",
y = "Performance"
) +
theme_minimal() # Use a clean theme
print(p)
}
7.4.11 get_cipma_df.R
# Function to create a cIPMA dataset with Importance, Performance, and Bottlenecks (percentage of bottleneck cases)
get_cipma_df <- function(ipma_df, bottlenecks) {
colnames(ipma_df)[1] <- "predictor"
ipma_df$predictor <- as.character(ipma_df$`predictor`)
cipma_df <- cbind(ipma_df, bottlenecks[, -1, drop = FALSE])
# Add a column that combines the predictor name with the number of single cases
cipma_df$predictor_with_single_cases <- paste(cipma_df$predictor,
" (",
round(cipma_df[[ncol(cipma_df)]], 0),
")", sep = "")
return(cipma_df)
}
7.4.12 get_cipma_plot.R
# Function for producing the cIPMA plot
library(ggplot2)
library(ggrepel)
get_cipma_plot <- function(cipma_df, x_range = NULL, size_limits, size_range) {
if (is.null(x_range)) {
x_range <- range(cipma_df$Importance, na.rm = TRUE)
}
colname <- colnames(cipma_df)[4]
target.Y <- as.numeric(sub(".*\\(Y = ([0-9.]+)\\).*", "\\1", colname))
pc <- ggplot(cipma_df, aes(x = Importance, y = Performance, size = .data[[colname]])) +
geom_point(color = "white") +
geom_point(shape = 1, color = "black") + # Add a black outline
geom_text_repel(
aes(label = predictor_with_single_cases),
size = 3.5,
box.padding = 2 # Adjust the padding around labels
) +
coord_cartesian(ylim = c(0, 100)) + # Ensure this matches your data range
scale_x_continuous(limits = c(x_range[1], x_range[2])) + # Adapt x-axis limits dynamically
scale_y_continuous(limits = c(0, 100)) + # Explicitly set y-axis limits between 0 and 100
scale_size_continuous(limits = size_limits, range = size_range) +
labs(
title = paste("cIPMA for target outcome Y =" , target.Y),
x = "Importance",
y = "Performance"
) +
theme_minimal() + # Use a clean theme
guides(size = "none") # Explicitly remove size legend
print(pc)
}
7.4.13 get_single_bottleneck_cases.R
# Function to find the number of single bottleneck cases per condition
get_single_bottleneck_cases <- function(data, conditions, outcome, target_outcome, ceiling) {
# Check if target_outcome is within the range of the outcome variable
outcome_min <- min(data[[outcome]], na.rm = TRUE)
outcome_max <- max(data[[outcome]], na.rm = TRUE)
if (target_outcome < outcome_min || target_outcome > outcome_max) {
stop(
sprintf(
"Error: target_outcome (%f) is out of range for the outcome variable '%s'. Minimum: %f, Maximum: %f",
target_outcome, outcome, outcome_min, outcome_max
)
)
}
# Initialize matrices to store bottlenecks
bottleneck_matrix <- matrix(0, nrow = nrow(data), ncol = length(conditions))
colnames(bottleneck_matrix) <- conditions
for (condition in conditions) {
# Perform NCA for the given condition
model <- nca_analysis(
data,
condition,
outcome,
ceilings = ceiling,
bottleneck.y = "actual",
bottleneck.x = "actual",
steps = c(target_outcome, NA)
)
# Extract the x-critical (xc) threshold
df_model <- model$bottlenecks[[1]]
# Check if bottleneck x value is NN
if ((df_model[, 2][1,1]) == "NN" ) {xc = 0
} else {
xc <- as.numeric(df_model[, 2])
} # Assuming the second column contains xc values
# Compute bottlenecks for the current condition
bottleneck_matrix[, condition] <- as.numeric(data[[condition]] < xc)
}
# Calculate the total bottlenecks for each row
total_bottlenecks_per_row <- rowSums(bottleneck_matrix)
# Count the number of rows where each case is the sole bottleneck
single_bottlenecks_per_condition <- colSums(bottleneck_matrix & (total_bottlenecks_per_row == 1))
# # Return the results
# single_bottleneck_cases <- list(
# bottleneck_matrix = bottleneck_matrix,
# total_bottlenecks_per_row = total_bottlenecks_per_row,
# single_bottlenecks_per_condition = single_bottlenecks_per_condition # number
# )
#
single_bottleneck_cases <- as.data.frame (single_bottlenecks_per_condition) # number
colnames(single_bottleneck_cases)[1] <- paste0("Bottlenecks (Y = ", target_outcome, ")")
single_bottleneck_cases <- 100*(single_bottleneck_cases/nrow(data)) # percentage
return (single_bottleneck_cases)
}
7.4.14 get_bipma_df.R
# Function to create a BIPMA dataset with Importance, Performance, and Bottlenecks (percentage of SINGLE bottleneck cases)
get_bipma_df <- function(ipma_df, single_bottlenecks) {
colnames(ipma_df)[1] <- "predictor"
ipma_df$predictor <- as.character(ipma_df$`predictor`)
bipma_df <- cbind(ipma_df, single_bottlenecks)
row.names(bipma_df) <- NULL
# Add a column that combines the predictor name with the number of single cases
bipma_df$predictor_with_single_cases <- paste(bipma_df$predictor,
" (",
round(bipma_df[[ncol(bipma_df)]], 0),
")", sep = "")
return(bipma_df)
}
7.4.15 get_bipma_plot.R
# Function for producing the BIPMA plot
library(ggplot2)
library(ggrepel)
get_bipma_plot <- function(bipma_df, x_range = NULL, size_limits, size_range) {
if (is.null(x_range)) {
x_range <- range(bipma_df$Importance, na.rm = TRUE)
}
colname <- colnames(bipma_df)[4]
target.Y <- as.numeric(sub(".*\\(Y = ([0-9.]+)\\).*", "\\1", colname))
pc <- ggplot(bipma_df, aes(x = Importance, y = Performance, size = .data[[colname]])) +
geom_point(color = "white") +
geom_point(shape = 1, color = "black") + # Add a black outline
geom_text_repel(
aes(label = predictor_with_single_cases),
size = 3.5,
box.padding = 2 # Adjust the padding around labels
) +
coord_cartesian(ylim = c(0, 100)) + # Ensure this matches your data range
scale_x_continuous(limits = c(x_range[1], x_range[2])) + # Adapt x-axis limits dynamically
scale_y_continuous(limits = c(0, 100)) + # Explicitly set y-axis limits between 0 and 100
scale_size_continuous(limits = size_limits, range = size_range) +
labs(
title = paste("BIPMA for target outcome Y =" , target.Y),
x = "Importance",
y = "Performance"
) +
theme_minimal() + # Use a clean theme
guides(size = "none") # Explicitly remove size legend
print(pc)
}
7.4.16 ncasem_example.R
# Function to conduct NCA-SEM
#To be sourced:
# source("unstandardize.R")
# source("normalize.R")
# source("get_outliers.R")
# source("get_ipma_df.R")
# source("get_ipma_plot.R")
# source("get_bottleneck_cases.R") #extracted from bottleneck table
# source("get_cipma_df.R")
# source("get_cipma_plot.R")
# source("get_single_bottleneck_cases.R")
# source("get_bipma_df.R")
# source("get_bipma_plot.R")
# source("get_indicator_data.R")
# source("estimate_sem_model.R")
# Assuming Step 1 (hypotheses) and Step 2 (SEM) have been conducted
# Step 3-5
ncasem <- function(sem,
outliers = 0,
standardize = 0, # unstandardized
normalize = 1, # normalized
scale = c(0,100), # normalized 0-100 (percentage)
predictors,
predicted,
theoretical_min,
theoretical_max,
ceiling = "ce_fdh",
d_threshold = 0.10,
p_threshold = 0.05,
scope = NULL, # empirical scope
target_outcome,
plots = FALSE) {
#Step 3 select latent variable scores
data <- sem$construct_scores
# Standardization step
if (standardize == 0) {
source("unstandardize.R")
data <- unstandardize(sem)
}
# Normalization step
if (normalize == 0) {
data <- data # No normalization
} else {
source("normalize.R")
data <- min_max_normalize(data, theoretical_min, theoretical_max, scale)
}
# Step 4: Conduct NCA
library(NCA)
data <- data
ceilings = ceiling
test.rep = 10000 #set the number of permutations for estimating the p value
bottleneck.y = "actual"
bottleneck.x = "percentile"
steps=seq(0, 100, 5)
scope=scope
# Process outliers if needed
if (outliers != 0) {
source("get_outliers.R")
conditions = predictors
outcome = predicted
out <- get_outliers(data, conditions, outcome, ceiling, outliers)
all_outliers <- unname(unlist(out$outliers_df)[!is.na(out$outliers_df)])
# Remove outliers
data <- data[-all_outliers, ]
}
# Run the NCA analysis
model <- nca_analysis(
data,
predictors,
predicted,
ceilings = ceiling,
bottleneck.x = bottleneck.x,
bottleneck.y = bottleneck.y,
steps = steps,
test.rep = test.rep,
scope = scope
)
nca_output(model, summaries = FALSE, pdf = plots)
# Step 5: Produce BIPMA
# IPMA
source("get_ipma_df.R")
IPMA_df <- get_ipma_df(data = data, sem = sem, predictors = predictors, predicted = predicted)
ipma_df <- IPMA_df
source("get_ipma_plot.R")
IPMA_plot <- get_ipma_plot(ipma_df = ipma_df, x_range = x_range)
if (plots) {
current_time <- format(Sys.time(), "%Y-%m-%d_%H-%M-%S")
filename <- paste0("IPMA plot_", current_time, ".pdf")
ggsave(filename, plot = IPMA_plot, width = 8, height = 6, dpi = 300)
}
# cIPMA
source("get_bottleneck_cases.R") #extracted from bottleneck table
bottlenecks <- get_bottleneck_cases(model, target_outcome)
source("get_cipma_df.R")
CIPMA_df <- get_cipma_df(ipma_df = ipma_df, bottlenecks = bottlenecks)
source("get_cipma_plot.R")
cipma_df <- CIPMA_df
CIPMA_plot <- get_cipma_plot(cipma_df = cipma_df, x_range = x_range, size_limits = size_limits, size_range = size_range)
if (plots) {
current_time <- format(Sys.time(), "%Y-%m-%d_%H-%M-%S")
filename <- paste0("CIPMA plot_", current_time, ".pdf")
ggsave(filename, plot = CIPMA_plot, width = 8, height = 6, dpi = 300)
}
# BIPMA
source("get_single_bottleneck_cases.R")
source("get_bipma_df.R")
single_bottlenecks <- get_single_bottleneck_cases (data, conditions = predictors, outcome = predicted, target_outcome = target_outcome, ceiling = ceiling)
BIPMA_df <- get_bipma_df(ipma_df = ipma_df, single_bottlenecks = single_bottlenecks)
source("get_bipma_plot.R")
bipma_df <- BIPMA_df
BIPMA_plot <- get_bipma_plot(bipma_df = bipma_df, x_range = x_range, size_limits = size_limits, size_range = size_range)
if (plots) {
current_time <- format(Sys.time(), "%Y-%m-%d_%H-%M-%S")
filename <- paste0("BIPMA plot_", current_time, ".pdf")
ggsave(filename, plot = BIPMA_plot, width = 8, height = 6, dpi = 300)
}
# Extract results
effect_size <- sapply(predictors, function(p) model$summaries[[p]][[2]][[2]])
p_value <- sapply(predictors, function(p) model$summaries[[p]][[2]][[6]])
necessity <- ifelse(effect_size >= d_threshold & p_value <= p_threshold, "yes", "no")
results_df <- data.frame(
predictor = predictors,
effect_size = effect_size,
p_value = p_value,
necessity = necessity,
Bottlenecks1 = (single_bottlenecks),
Bottlenecks2 = (single_bottlenecks)*nrow(data)/100,
row.names = NULL
)
colnames(results_df) <- c("Predictor", "Effect size", "p value", "Necessity", "Single bottlenecks(%)", "Single bottlenecks(#)")
results_df$Priority <- rank(-results_df[,6])
result <- list(data = data, results =results_df)
return(result)
}
### Example
# Step 2: Conduct PLS-SEM with SEMinR on TAM data
source("get_indicator_data.R")
source("estimate_sem_model.R")
# Step 3-5 using ncasem function
#Step 2
# SEM model
sem <- TAM_pls
# Step 3-5
standardize <- 0
normalize <- 1
predictors <- c("Perceived usefulness", "Compatibility", "Perceived ease of use", "Emotional value", "Adoption intention")
predicted <- "Technology use"
theoretical_min <- c(1,1,1,1,1,1) # Min values original scale
theoretical_max <- c(5,5,5,5,5,7) # Max values original scale
# NCA
ceiling = "ce_fdh"
target_outcome <- 85
d_threshold <- 0.10
p_threshold <- 0.05
outliers = 0
scope = NULL
# (x)IPMA plots
plots= FALSE
scale <- c(0,100) # Define new scale
x_range <- c(0,0.6) # range of values of the Importance x-axis of the (x)IPMA plot
size_limits = c(0,100)
size_range <- c(0.5,50) # the of the bubble of the (x)IPMA plot
original <- ncasem(sem,
outliers,
standardize,
normalize,
scale,
predictors,
predicted,
theoretical_min,
theoretical_max,
ceiling,
d_threshold,
p_threshold,
scope,
target_outcome,
plots
)
original$results <- cbind(`Robustness check` = "Original", original$results)
print(dim(original$results)) # Check dimensions
original$results
# Step 6
### Ceiling change
ceiling = "cr_fdh"
other_ceiling <- ncasem(sem,
outliers,
standardize,
normalize,
scale,
predictors,
predicted,
theoretical_min,
theoretical_max,
ceiling,
d_threshold,
p_threshold,
scope,
target_outcome,
plots
)
other_ceiling$results <- cbind(`Robustness check` = "Ceiling change", other_ceiling$results)
other_ceiling$results
### d threshold change
ceiling = "ce_fdh"
d_threshold <- 0.20
other_d <- ncasem(sem,
outliers,
standardize,
normalize,
scale,
predictors,
predicted,
theoretical_min,
theoretical_max,
ceiling,
d_threshold,
p_threshold,
scope,
target_outcome,
plots
)
other_d$results <- cbind(`Robustness check` = "d value change", other_d$results)
other_d$results
### p threshold change
d_threshold <- 0.10
p_threshold <- 0.01
other_p <- ncasem(sem,
outliers,
standardize,
normalize,
scale,
predictors,
predicted,
theoretical_min,
theoretical_max,
ceiling,
d_threshold,
p_threshold,
scope,
target_outcome,
plots
)
other_p$results <- cbind(`Robustness check` = "p value change", other_p$results)
other_p$results
### Scope change
p_threshold <- 0.05
scope <- list( c(1,5,1,5), c(1,5,1,5), c(1,5,1,5), c(1,5,1,5), c(1,5,1,5)) #theoretical scope
other_scope <- ncasem(sem,
outliers,
standardize,
normalize,
scale,
predictors,
predicted,
theoretical_min,
theoretical_max,
ceiling,
d_threshold,
p_threshold,
scope,
target_outcome,
plots
)
other_scope$results <- cbind(`Robustness check` = "Scope change", other_scope$results)
other_scope$results
### Single outlier removal
scope <- NULL
outliers = 1
other_outliers_single <- ncasem(sem,
outliers,
standardize,
normalize,
scale,
predictors,
predicted,
theoretical_min,
theoretical_max,
ceiling,
d_threshold,
p_threshold,
scope,
target_outcome,
plots
)
other_outliers_single$results <- cbind(`Robustness check` = "Single outlier removal", other_outliers_single$results)
other_outliers_single$results
### Multiple outlier removal
outliers = 2
other_outliers_multiple <- ncasem(sem,
outliers,
standardize,
normalize,
scale,
predictors,
predicted,
theoretical_min,
theoretical_max,
ceiling,
d_threshold,
p_threshold,
scope,
target_outcome,
plots
)
other_outliers_multiple$results <- cbind(`Robustness check` = "Multiple outlier removal",other_outliers_multiple$results)
other_outliers_multiple$results
### Target lower (80)
outliers <- 0
target_outcome <- 80
other_target_lower <- ncasem(sem,
outliers,
standardize,
normalize,
scale,
predictors,
predicted,
theoretical_min,
theoretical_max,
ceiling,
d_threshold,
p_threshold,
scope,
target_outcome,
plots
)
other_target_lower$results <- cbind(`Robustness check` = "Target lower", other_target_lower$results)
other_target_lower$results
### Target higher (90)
target_outcome <- 90
other_target_higher <- ncasem(sem,
outliers,
standardize,
normalize,
scale,
predictors,
predicted,
theoretical_min,
theoretical_max,
ceiling,
d_threshold,
p_threshold,
scope,
target_outcome,
plots
)
other_target_higher$results <- cbind(`Robustness check` = "Target higher", other_target_higher$results)
other_target_higher$results
results_all <- rbind(original$results,
other_ceiling$results,
other_d$results,
other_p$results,
other_scope$results,
other_outliers_single$results,
other_outliers_multiple$results,
other_target_lower$results,
other_target_higher$results)
results_all <- results_all[c("Predictor",
"Robustness check",
"Effect size",
"p value",
"Necessity",
"Single bottlenecks(%)",
"Single bottlenecks(#)",
"Priority"
)]
# Reorganize the dataframe into new blocks
n <- length(predictors) # Block size
rows_per_block <- seq(1, n) # Sequence of row indices in one block
# Create a new column to assign new block numbers
results_all$NewBlock <- rep(rows_per_block, length.out = nrow(results_all))
# Split the dataframe into blocks and combine the blocks sequentially
reorganized_df <- do.call(rbind, lapply(rows_per_block, function(i) {
results_all[results_all$NewBlock == i, ] # Subset rows for the current block
}))
# Remove the helper column and reset row names
reorganized_df <- reorganized_df[, -ncol(reorganized_df)]
row.names(reorganized_df) <- NULL
# View the reorganized dataframe
print(reorganized_df)
library(xlsx)
write.xlsx(reorganized_df, "robustness_table_sem.xlsx", row.names = FALSE)