Study 2.2: Semantic decision

The semantic decision task probes the role of concreteness in conceptual processing. Specifically, this task requires participants to classify words as abstract or concrete, which elicits deeper semantic processing than the identification of word forms (i.e., lexical decision). Researchers then analyse whether the responses can be explained by the sensory experientiality of the referents—that is, the degree to which they can be experienced through our senses—and by other variables, such as word frequency. The core data set in this study was that of the Calgary Semantic Decision Project (Pexman et al., 2017; Pexman & Yap, 2018), in which participants judged whether words were primarily abstract (e.g., thought) or concrete (e.g., building).

Research has found that the processing of relatively concrete words relies considerably on sensorimotor information (Hultén et al., 2021; Kousta et al., 2011; Vigliocco et al., 2014). In contrast, the processing of relatively abstract words seems to draw more heavily on information from language (Barca et al., 2020; Duñabeitia et al., 2009; Snefjella & Blank, 2020), emotion (Kousta et al., 2011; Ponari, Norbury, Rotaru, et al., 2018; Ponari, Norbury, & Vigliocco, 2018; Ponari et al., 2020; Vigliocco et al., 2014), interoception (Connell et al., 2018) and social information (Borghi et al., 2019, 2022; Diveica et al., 2022).

Methods

Data set

Code

# Calculate some of the sample sizes to be reported in the paragraph below

# Number of words per participant.
# Save mean as integer and SD rounded while keeping trailing zeros
semanticdecision_mean_words_per_participant = 
  semanticdecision %>% group_by(Participant) %>% 
  summarise(length(unique(Word))) %>% 
  select(2) %>% unlist %>% mean %>% round(0)

semanticdecision_SD_words_per_participant = 
  semanticdecision %>% group_by(Participant) %>% 
  summarise(length(unique(Word))) %>% 
  select(2) %>% unlist %>% sd %>% sprintf('%.2f', .)

# Number of participants per word.
# Save mean as integer and SD rounded while keeping trailing zeros
semanticdecision_mean_participants_per_word = 
  semanticdecision %>% group_by(Word) %>% 
  summarise(length(unique(Participant))) %>% 
  select(2) %>% unlist %>% mean %>% round(0)

semanticdecision_SD_participants_per_word = 
  semanticdecision %>% group_by(Word) %>% 
  summarise(length(unique(Participant))) %>% 
  select(2) %>% unlist %>% sd %>% sprintf('%.2f', .)

The data set was trimmed by removing rows that lacked values on any variable, and by also removing RTs that were more than 3 standard deviations away from the mean.14 The standard deviation trimming was performed within participants and within trial blocks, as done in the Calgary Semantic Decision Project (Pexman et al., 2017). The resulting data set contained 306 participants, 8,927 words and 246,432 RTs. On average, there were 755 words per participant (SD = 42.05), and conversely, 26 participants per word (SD = 4.80).
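
An illustrative sketch of this trimming follows. The object and column names (‘semanticdecision_raw’, ‘RT’, ‘Block’) are placeholders, as the actual preprocessing is implemented in the project’s own scripts.

Code

# Minimal sketch of the trimming described above. The object and column 
# names ('semanticdecision_raw', 'RT', 'Block') are hypothetical placeholders.
library(dplyr)

semanticdecision_trimmed = semanticdecision_raw %>%
  
  # Remove rows that lack values on any variable
  na.omit() %>%
  
  # Remove RTs more than 3 standard deviations from the mean, 
  # within participants and within trial blocks
  group_by(Participant, Block) %>%
  filter(abs(RT - mean(RT)) <= 3 * sd(RT)) %>%
  ungroup()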

Variables

While the variables are outlined in the general introduction, a few further details are provided below regarding some of them.

Vocabulary size

In the vocabulary test used by Pexman et al. (2017), participants were presented with 35 rare words with irregular pronunciations (e.g., gaoled, ennui), and they were asked to read the words aloud (also see Pexman & Yap, 2018). When they pronounced a word correctly, it was inferred that they knew the word. This test was based on NAART35, a short version of the North American Adult Reading Test (Uttl, 2002).
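
As a rough illustration, a vocabulary score of this kind can be obtained by counting the correctly pronounced words per participant. The object and column names below (‘naart_responses’, ‘correct_pronunciation’) are hypothetical placeholders, used only for this sketch.

Code

# Minimal sketch of the vocabulary scoring described above. The object and 
# column names are hypothetical placeholders.
library(dplyr)

vocabulary_scores = naart_responses %>%
  group_by(Participant) %>%
  summarise(vocabulary_size = sum(correct_pronunciation))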

Word co-occurrence

Wingfield and Connell (2022b) reanalysed the data from Pexman et al. (2017) using language-based variables that are more related to the language system than to the visual system. They found that the variables that best explained RTs were word co-occurrence measures. Specifically, one of these variables was the corpus-based distance between each stimulus word and the word ‘abstract’, and the other was the corresponding distance to the word ‘concrete’. Wingfield and Connell studied these distance measures in various forms, and found that cosine and correlation distances yielded the best results. We used the correlation distances, following the advice of Kiela and Bottou (2014).

The zero-order correlation between Wingfield and Connell’s (2022b) distance to ‘abstract’ and distance to ‘concrete’ was \(r\) = .98. To avoid collinearity between these variables in the model (Dormann et al., 2013; Harrison et al., 2018), and to facilitate the analysis of interactions with other variables, we created a difference score by subtracting the distance to ‘abstract’ from the distance to ‘concrete’. This new variable was named ‘word co-occurrence’. As shown in Figure 11, the correlation between word co-occurrence and word concreteness was twice as large as the correlation between either distance and word concreteness, suggesting that the difference score successfully encapsulated the information of both distances.
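
For illustration, the sketch below shows how such a difference score can be computed from the two distance measures, using the column names that appear in the correlation-matrix code further below. This is only an illustrative sketch, not the project’s own preprocessing code.

Code

# Minimal sketch of the difference score described above 
# (distance to 'concrete' minus distance to 'abstract')
library(dplyr)

semanticdecision = semanticdecision %>%
  mutate(word_cooccurrence =
           Conditional_probability_BNC_r5_correlation_concrete_distance -
           Conditional_probability_BNC_r5_correlation_abstract_distance)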

Code

# Using the following variables...
semanticdecision[, c('word_concreteness', 'word_cooccurrence',
                     'Conditional_probability_BNC_r5_correlation_concrete_distance',
                     'Conditional_probability_BNC_r5_correlation_abstract_distance')] %>%
  
  # renamed for the sake of clarity
  rename('Word concreteness' = word_concreteness, 
         'Word co-occurrence' = word_cooccurrence,
         "Distance to 'concrete'" = Conditional_probability_BNC_r5_correlation_concrete_distance,
         "Distance to 'abstract'" = Conditional_probability_BNC_r5_correlation_abstract_distance) %>%
  
  # make correlation matrix (custom function from the 'R_functions' folder)
  correlation_matrix() + 
  theme(plot.margin = unit(c(0, -0.5, 0.05, -3.78), 'in'))

Figure 11: Zero-order correlations among Wingfield and Connell’s (2022b) distances, the difference score (word co-occurrence) and word concreteness (Brysbaert et al., 2014).

A few details regarding the covariates follow.

  • Information uptake was included as a measure akin to general cognition, and specifically as a covariate of vocabulary size (Ratcliff et al., 2010; also see James et al., 2018; Pexman & Yap, 2018). Information uptake corresponded to each participant’s drift rate in Pexman and Yap (2018). Drift rate indexes an individual’s ability (Lerche et al., 2020; Pexman & Yap, 2018), here reflected in how correctly and quickly participants classified words as abstract or concrete (for graphical illustrations, see Lerche et al., 2020; van Ravenzwaaij et al., 2012).

  • Lexical covariates (see Appendix A): word frequency and orthographic Levenshtein distance (Balota et al., 2007).

  • Word concreteness (Brysbaert et al., 2014): a fundamental variable in the semantic decision task, in which participants judge whether words are abstract or concrete (for further considerations, see Bottini et al., 2021). Indeed, owing to the task instructions, word concreteness is likely to be more relevant to participants than our effects of interest.

Figure 12 shows the correlations among the predictors and the dependent variable.

Code

# Using the following variables...
semanticdecision[, c('z_RTclean', 'z_vocabulary_size', 
                     'z_information_uptake', 'z_word_cooccurrence', 
                     'z_visual_rating', 'z_word_concreteness', 
                     'z_word_frequency', 
                     'z_orthographic_Levenshtein_distance')] %>%
  
  # renamed for the sake of clarity
  rename('RT' = z_RTclean, 
         'Vocabulary size' = z_vocabulary_size,
         'Information uptake' = z_information_uptake,
         "Word co-occurrence" = z_word_cooccurrence,
         'Visual strength' = z_visual_rating,
         'Word concreteness' = z_word_concreteness,
         'Word frequency' = z_word_frequency,
         'Orthographic Levenshtein distance' = z_orthographic_Levenshtein_distance) %>%
  
  # make correlation matrix (custom function from the 'R_functions' folder)
  correlation_matrix() + 
  theme(plot.margin = unit(c(0, 0, 0.1, -3.1), 'in'))

Figure 12: Zero-order correlations in the semantic decision study.

Diagnostics for the frequentist analysis

The model presented convergence warnings. Rather than removing important random slopes, which could increase the Type I error rate (i.e., false positives; Brauer & Curtin, 2018; Singmann & Kellen, 2019), we examined the model after refitting it with seven optimisation algorithms, using the ‘allFit’ function of the ‘lme4’ package (Bates et al., 2021). All optimisers produced virtually identical means for all effects, suggesting that the convergence warnings were not consequential (Bates et al., 2021; see Appendix B).
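
A minimal sketch of this check is shown below, assuming the fitted model object ‘semanticdecision_lmerTest’ used elsewhere in this section; the full comparison is reported in Appendix B.

Code

# Minimal sketch of the optimiser check described above: refit the model with 
# all available optimisers and compare the fixed-effect estimates.
library(lme4)

semanticdecision_allFit = allFit(semanticdecision_lmerTest)

# Fixed-effect estimates per optimiser (near-identical rows suggest that the 
# convergence warnings are not consequential)
summary(semanticdecision_allFit)$fixef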

Code

# Calculate VIF for every predictor and return only the maximum VIF rounded up
maxVIF_semanticdecision = car::vif(semanticdecision_lmerTest) %>% max %>% ceiling

The residual errors were not normally distributed, and attempts to mitigate this deviation proved unsuccessful (see Appendix B). However, this is not likely to have posed a major problem, as mixed-effects models are fairly robust to deviations from normality (Knief & Forstmeier, 2021; Schielzeth et al., 2020). Last, the model did not present multicollinearity problems, with all VIFs below 2 (see Dormann et al., 2013; Harrison et al., 2018).
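
As a rough illustration, the normality of the residuals can be inspected with a quantile-quantile plot, as sketched below; the full diagnostics are presented in Appendix B.

Code

# Minimal sketch of a residual normality check for the frequentist model
qqnorm(resid(semanticdecision_lmerTest), main = 'Model residuals')
qqline(resid(semanticdecision_lmerTest))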

Diagnostics for the Bayesian analysis

Code

# Calculate number of post-warmup draws (as in 'brms' version 2.17.0).
# Informative prior model used but numbers are identical in the three models.
semanticdecision_post_warmup_draws = 
  (semanticdecision_summary_informativepriors_exgaussian$iter -
     semanticdecision_summary_informativepriors_exgaussian$warmup) *
  semanticdecision_summary_informativepriors_exgaussian$chains

# As a convergence diagnostic, find maximum R-hat value for the 
# fixed effects across the three models.
semanticdecision_fixedeffects_max_Rhat = 
  max(semanticdecision_summary_informativepriors_exgaussian$fixed$Rhat,
      semanticdecision_summary_weaklyinformativepriors_exgaussian$fixed$Rhat,
      semanticdecision_summary_diffusepriors_exgaussian$fixed$Rhat) %>% 
  # Round
  sprintf('%.2f', .)

# Next, find the maximum R-hat value for the random effects across the three models
semanticdecision_randomeffects_max_Rhat = 
  max(semanticdecision_summary_informativepriors_exgaussian$random[['Participant']]$Rhat,
      semanticdecision_summary_weaklyinformativepriors_exgaussian$random[['Participant']]$Rhat,
      semanticdecision_summary_diffusepriors_exgaussian$random[['Participant']]$Rhat,
      semanticdecision_summary_informativepriors_exgaussian$random[['Word']]$Rhat,
      semanticdecision_summary_weaklyinformativepriors_exgaussian$random[['Word']]$Rhat,
      semanticdecision_summary_diffusepriors_exgaussian$random[['Word']]$Rhat) %>% 
  # Round
  sprintf('%.2f', .)

Three Bayesian models were run, characterised by informative, weakly informative and diffuse priors, respectively. In each model, 16 chains were used. In each chain, 2,000 warmup iterations were run, followed by 6,000 post-warmup iterations. Thus, a total of 96,000 post-warmup draws were produced across all chains.
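
The sketch below illustrates these sampler settings with a deliberately simplified formula. The full formulas and priors are specified in the project’s own scripts; the ex-Gaussian family is taken from the names of the summary objects used above, and the remaining settings shown are assumptions made only for this sketch.

Code

# Minimal sketch of the sampler settings described above, using a simplified 
# formula. Note that, in 'brms', 'iter' includes the warmup iterations.
library(brms)

fit_sketch = brm(
  z_RTclean ~ z_word_cooccurrence + z_visual_rating +
    (1 | Participant) + (1 | Word),
  data = semanticdecision,
  family = exgaussian(),
  chains = 16, warmup = 2000, iter = 8000,
  cores = 16, seed = 123
)

# Post-warmup draws across all chains: (8000 - 2000) * 16 = 96,000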

The maximum \(\widehat R\) value for the fixed effects across the three models was 1.42, far exceeding the 1.01 threshold (Vehtari et al., 2021; also see Schoot et al., 2021). Similarly, the maximum \(\widehat R\) value for the random effects was 1.31. Furthermore, the posterior predictive checks revealed major divergences between the observed data and the posterior distributions (see Appendix C). In conclusion, since the Bayesian results were not valid, they are not shown in the main text, but are available in Appendix E.

Results of Study 2.2

Code

# Calculate R^2. This coefficient must be interpreted with caution 
# (Nakagawa et al., 2017; https://doi.org/10.1098/rsif.2017.0213). 
# Also, transform coefficient to rounded percentage.

Nakagawa2017_fixedeffects_R2_semanticdecision_lmerTest = 
  paste0(
    (MuMIn::r.squaredGLMM(semanticdecision_lmerTest)[1, 'R2m'][[1]] * 100) %>% 
      sprintf('%.2f', .), '%'
  )

Nakagawa2017_randomeffects_R2_semanticdecision_lmerTest = 
  paste0(
    (MuMIn::r.squaredGLMM(semanticdecision_lmerTest)[1, 'R2c'][[1]] * 100) %>% 
      sprintf('%.2f', .), '%'
  )

Table 4 presents the results. The fixed effects explained 4.11% of the variance, and the fixed and random effects together explained 17.48% (Nakagawa et al., 2017; for an explanation of this difference, see Results of Study 2.1). Both word co-occurrence and visual strength produced significant main effects. Higher values of these variables facilitated participants’ performance, as reflected in shorter RTs. Furthermore, visual strength interacted with vocabulary size. There were no significant effects involving gender among the effects of interest (see interaction figures below).

The effect sizes of word co-occurrence and its interactions were larger than those of visual strength. Figure 13 displays these estimates.15

Code

# Rename effects in plain language and specify the random slopes
# (if any) for each effect, in the footnote. For this purpose, 
# superscripts are added to the names of the appropriate effects.
# 
# In the interactions below, word-level variables are presented 
# first for the sake of consistency (the order does not affect 
# the results in any way). Also in the interactions, the colons 
# inform the 'frequentist_model_table' function that the two 
# terms in the interaction must be split into two lines.

effect_labels = c(
  'z_information_uptake' = 'Information uptake',
  'z_vocabulary_size' = 'Vocabulary size <sup>a</sup>',
  'z_recoded_participant_gender' = 'Gender <sup>a</sup>',
  'z_word_frequency' = 'Word frequency',
  'z_orthographic_Levenshtein_distance' = 'Orthographic Levenshtein distance',
  'z_word_concreteness' = 'Word concreteness',
  'z_word_cooccurrence' = "Word co-occurrence <sup>b</sup>",
  'z_visual_rating' = 'Visual strength <sup>b</sup>',
  'z_word_concreteness:z_vocabulary_size' = 'Word concreteness : Vocabulary size',
  'z_word_concreteness:z_recoded_participant_gender' = 'Word concreteness : Gender',
  'z_information_uptake:z_word_cooccurrence' = "Word co-occurrence : Information uptake",
  'z_information_uptake:z_visual_rating' = 'Visual strength : Information uptake',
  'z_vocabulary_size:z_word_cooccurrence' = "Word co-occurrence : Vocabulary size",
  'z_vocabulary_size:z_visual_rating' = 'Visual strength : Vocabulary size',
  'z_recoded_participant_gender:z_word_cooccurrence' = "Word co-occurrence : Gender",
  'z_recoded_participant_gender:z_visual_rating' = 'Visual strength : Gender'
)

# Helper that replaces matching row names with the labels above
rename_effects = function(x) {
  matched = rownames(x) %in% names(effect_labels)
  rownames(x)[matched] = effect_labels[rownames(x)[matched]]
  x
}

# Apply the renaming to the model summary and to the confidence intervals object
KR_summary_semanticdecision_lmerTest$coefficients = 
  rename_effects(KR_summary_semanticdecision_lmerTest$coefficients)

confint_semanticdecision_lmerTest = 
  rename_effects(confint_semanticdecision_lmerTest)


# Create table (using custom function from the 'R_functions' folder)
frequentist_model_table(
  KR_summary_semanticdecision_lmerTest, 
  confint_semanticdecision_lmerTest,
  order_effects = c('(Intercept)',
                    'Information uptake',
                    'Vocabulary size <sup>a</sup>',
                    'Gender <sup>a</sup>',
                    'Word frequency',
                    'Orthographic Levenshtein distance',
                    'Word concreteness',
                    "Word co-occurrence <sup>b</sup>",
                    'Visual strength <sup>b</sup>',
                    'Word concreteness : Vocabulary size',
                    'Word concreteness : Gender',
                    "Word co-occurrence : Information uptake",
                    'Visual strength : Information uptake',
                    "Word co-occurrence : Vocabulary size",
                    'Visual strength : Vocabulary size',
                    "Word co-occurrence : Gender",
                    'Visual strength : Gender'),
  interaction_symbol_x = TRUE,
  caption = 'Frequentist model for the semantic decision study.') %>%
  
  # Group predictors under headings
  pack_rows('Individual differences', 2, 4) %>% 
  pack_rows('Lexicosemantic covariates', 5, 7) %>% 
  pack_rows('Semantic variables', 8, 9) %>% 
  pack_rows('Interactions', 10, 17) %>% 
  
  # Apply white background to override default shading in HTML output
  row_spec(1:17, background = 'white') %>%
  
  # Highlight covariates
  row_spec(c(2, 5:7, 10:13), background = '#FFFFF1') %>%
  
  # Format
  kable_classic(full_width = FALSE, html_font = 'Cambria') %>%
  
  # Footnote describing abbreviations, random slopes, etc. 
  footnote(escape = FALSE, threeparttable = TRUE, 
           # The <p> below is used to enter a margin above the footnote 
           general_title = '<p style="margin-top: 10px;"></p>', 
           general = paste('*Note*. &beta; = Estimate based on $z$-scored predictors; *SE* = standard error;',
                           'CI = confidence interval. Yellow rows contain covariates. <br>', 
                           '<sup>a</sup> By-word random slopes were included for this effect.',
                           '<sup>b</sup> By-participant random slopes were included for this effect.'))
Table 4: Frequentist model for the semantic decision study.
β SE 95% CI t p
(Intercept) 0.05 0.00 [0.04, 0.06] 11.87 <.001
Individual differences
Information uptake 0.00 0.00 [0.00, 0.00] 0.20 .844
Vocabulary size a 0.00 0.00 [-0.01, 0.00] -1.42 .155
Gender a 0.00 0.00 [0.00, 0.00] -0.47 .636
Lexicosemantic covariates
Word frequency -0.12 0.00 [-0.13, -0.12] -28.63 <.001
Orthographic Levenshtein distance -0.01 0.00 [-0.02, 0.00] -3.05 .002
Word concreteness -0.13 0.01 [-0.14, -0.11] -21.39 <.001
Semantic variables
Word co-occurrence b -0.03 0.01 [-0.04, -0.02] -4.48 <.001
Visual strength b -0.02 0.01 [-0.03, -0.01] -2.91 .004
Interactions
Word concreteness × Vocabulary size -0.02 0.00 [-0.03, -0.02] -7.66 <.001
Word concreteness × Gender -0.01 0.00 [-0.02, 0.00] -3.50 <.001
Word co-occurrence × Information uptake 0.01 0.01 [0.00, 0.02] 1.48 .141
Visual strength × Information uptake 0.02 0.01 [0.01, 0.03] 3.05 .003
Word co-occurrence × Vocabulary size 0.01 0.01 [0.00, 0.02] 1.66 .098
Visual strength × Vocabulary size 0.01 0.01 [0.00, 0.02] 2.03 .043
Word co-occurrence × Gender 0.00 0.00 [-0.01, 0.01] 0.86 .393
Visual strength × Gender 0.00 0.00 [-0.01, 0.01] -0.08 .940

Note. β = Estimate based on \(z\)-scored predictors; SE = standard error; CI = confidence interval. Yellow rows contain covariates.
a By-word random slopes were included for this effect. b By-participant random slopes were included for this effect.
Code

# Run plot through source() rather than directly in this R Markdown document
# to preserve the format.

source('semanticdecision/frequentist_analysis/semanticdecision_confidence_intervals_plot.R', 
       local = TRUE)

include_graphics(
  paste0(
    getwd(),  # Circumvent illegal characters in file path
    '/semanticdecision/frequentist_analysis/plots/semanticdecision_confidence_intervals_plot.pdf'
  ))

Figure 13: Means and 95% confidence intervals for the effects of interest in the semantic decision study.

Figure 14-a shows the non-significant interaction between word co-occurrence and vocabulary size, whereby lower-vocabulary participants were more sensitive to word co-occurrence than higher-vocabulary participants. Next, Figure 14-b shows the significant interaction between visual strength and vocabulary size, demonstrating that lower-vocabulary participants were also more sensitive to visual strength. Last, Figure 14-c shows the significant interaction between word concreteness and vocabulary size, whereby higher-vocabulary participants were more sensitive to word concreteness than lower-vocabulary participants. Word concreteness is likely the most relevant variable for the semantic decision task, in which participants classify words as abstract or concrete. In conclusion, these interactions suggest that higher-vocabulary participants were better able to focus on the most relevant information, whereas lower-vocabulary participants were sensitive to a greater breadth of information (see Lim et al., 2020; Pexman & Yap, 2018; Yap et al., 2009, 2012, 2017).

Code

# Run plot through source() rather than directly in this R Markdown document 
# to preserve the italicised text.

source('semanticdecision/frequentist_analysis/semanticdecision-interactions-with-vocabulary-size.R', 
       local = TRUE)

include_graphics(
  paste0(
    getwd(),  # Circumvent illegal characters in file path
    '/semanticdecision/frequentist_analysis/plots/semanticdecision-interactions-with-vocabulary-size.pdf'
  ))

Figure 14: Interactions of vocabulary size with language-based information (panel a), with visual strength (panel b) and with word concreteness (panel c) in the semantic decision study. Vocabulary size is constrained to deciles in this plot, whereas in the statistical analysis it contained more values within the current range. n = number of participants contained between deciles.

A continuous measure of word concreteness was used in the present study. In contrast, Pexman and Yap (2018) split the data set into a subset of abstract words and a subset of concrete words, and analysed these subsets separately. They found that high-vocabulary participants were more sensitive to the relative abstractness of words. Specifically, these participants were faster to classify very abstract words than mid-abstract ones, thus presenting a reverse concreteness effect (also see Bonner et al., 2009). Such a reverse effect might stem from the bimodal distributions that have appeared in concreteness ratings (Brysbaert et al., 2014) and in semantic decisions (Pexman & Yap, 2018), or it might be due to confounding variables (Hoffman & Lambon Ralph, 2011). Notwithstanding the bimodal distributions, Troche et al. (2017) suggested that a continuous analysis remained necessary to study word concreteness (also see Cohen, 1983). Consistent with this, our findings showed that a continuous word concreteness measure was sensitive to patterns such as the greater role of task-relevant variables in higher-vocabulary participants. In conclusion, the literature and our findings suggest that the split-data approach and the continuous approach to word concreteness are both useful. Where feasible, applying both approaches would be most informative.

Figure 15 shows the interactions with gender. The interactions of interest, in panels a and b, were non-significant.16

Code

# Run plot through source() rather than directly in this R Markdown document 
# to preserve the italicised text.

source('semanticdecision/frequentist_analysis/semanticdecision-interactions-with-gender.R', 
       local = TRUE)

include_graphics(
  paste0(
    getwd(),  # Circumvent illegal characters in file path
    '/semanticdecision/frequentist_analysis/plots/semanticdecision-interactions-with-gender.pdf'
  ))

Figure 15: Interactions of gender with word co-occurrence (panel a), with visual strength (panel b) and with word concreteness (panel c) in the semantic decision study. Gender was analysed using z-scores, but for clarity, the basic labels are used in the legend.

Statistical power analysis

Figures 16 and 17 show the estimated power for some main effects and interactions of interest as a function of the number of participants. To plan the sample size of a future study, these results must be considered under two assumptions: that the future study would apply a statistical method similar to ours, namely, a mixed-effects model with random intercepts and slopes, and that the analysis would encompass at least as many words as the current study, namely, 8,927 (distributed in various blocks across participants, not all being presented to every participant). Although each figure merits detailed consideration, a summary follows. First, detecting the main effect of word co-occurrence would require 300 participants. Second, detecting the main effect of visual strength would require 1,200 participants. Third, detecting the interactions of word co-occurrence and visual strength with vocabulary size would require 1,500 participants. Last, detecting the other effects would require more than 2,000 participants, or, in the case of gender differences, many more than that.
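
As a rough illustration of how such power curves can be estimated through simulation, a sketch using the ‘simr’ package follows. The project’s own power curves were produced by the script sourced below; the effect tested, the sample sizes and the number of simulations in this sketch are placeholder assumptions.

Code

# Rough sketch of a simulation-based power curve, under placeholder settings. 
# The project's own procedure is implemented in the script sourced below.
library(simr)

# Extend the fitted model to a larger, hypothetical number of participants
extended_model = extend(semanticdecision_lmerTest, along = 'Participant', n = 2000)

# Estimate power for the main effect of word co-occurrence at several sample sizes
powercurve_cooccurrence = powerCurve(extended_model,
                                     test = fixed('z_word_cooccurrence'),
                                     along = 'Participant',
                                     breaks = c(300, 600, 1200, 2000),
                                     nsim = 200)

plot(powercurve_cooccurrence)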

Code

# Run plot through source() rather than directly in this R Markdown document 
# to preserve the italicised text.
source('semanticdecision/power_analysis/semanticdecision_all_powercurves.R', 
       local = TRUE)

include_graphics(
  paste0(
    getwd(),  # Circumvent illegal characters in file path
    '/semanticdecision/power_analysis/plots/semanticdecision_powercurve_plots_1_2_3.pdf'
  ))

Figure 16: Power curves for some main effects in the semantic decision study.

Code

include_graphics(
  paste0(
    getwd(),  # Circumvent illegal characters in file path
    '/semanticdecision/power_analysis/plots/semanticdecision_powercurve_plots_4_5_6_7.pdf'
  ))

Figure 17: Power curves for some interactions in the semantic decision study.

Discussion of Study 2.2

The results revealed a significant, facilitatory effect of word co-occurrence and a smaller but significant, facilitatory effect of visual strength. That is, higher values of these variables resulted in shorter RTs. Furthermore, vocabulary size modulated these effects: lower-vocabulary participants were more sensitive to visual strength, and also (albeit non-significantly) to word co-occurrence, whereas higher-vocabulary participants were more sensitive to word concreteness, the variable most relevant to the task. As in Study 2.1, vision-based information had a significant effect. This was to be expected, as semantic decision is likely to engage deeper semantic processing. Last, gender did not modulate the effects of interest. Below, we delve into some other aspects of these results.

Statistical power analysis

We analysed the statistical power associated with several effects of interest, across various sample sizes. The results of this power analysis can help determine the number of participants required to reliably examine each of these effects in a future study. Importantly, the results assume two conditions. First, the future study would apply a statistical method similar to ours—namely, a mixed-effects model with random intercepts and slopes. Second, the analysis of the future study would encompass at least 8,927 stimulus words (distributed in various blocks across participants, not all being presented to every participant).

First, the results revealed that detecting the main effect of word co-occurrence would require 300 participants, whereas detecting the main effect of visual strength would require 1,200. Next, detecting the interactions of these variables with vocabulary size would require 1,500 participants. Last, detecting the other effects would require more than 2,000 participants, or, in the case of gender differences, many more than that.

References

Balota, D. A., Yap, M. J., Hutchison, K. A., Cortese, M. J., Kessler, B., Loftis, B., Neely, J. H., Nelson, D. L., Simpson, G. B., & Treiman, R. (2007). The English Lexicon Project. Behavior Research Methods, 39, 445–459. https://doi.org/10.3758/BF03193014
Barca, L., Mazzuca, C., & Borghi, A. (2020). Overusing the pacifier during infancy sets a footprint on abstract words processing. Journal of Child Language, 47(5), 1084–1099. https://doi.org/10.1017/S0305000920000070
Bates, D., Maechler, M., Bolker, B., Walker, S., Christensen, R. H. B., Singmann, H., Dai, B., Scheipl, F., Grothendieck, G., Green, P., Fox, J., Brauer, A., & Krivitsky, P. N. (2021). Package ‘lme4’. CRAN. https://cran.r-project.org/web/packages/lme4/lme4.pdf
Bonner, M. F., Vesely, L., Price, C., Anderson, C., Richmond, L., Farag, C., Avants, B., & Grossman, M. (2009). Reversal of the concreteness effect in semantic dementia. Cognitive Neuropsychology, 26(6), 568–579. https://doi.org/10.1080/02643290903512305
Borghi, A. M., Barca, L., Binkofski, F., Castelfranchi, C., Pezzulo, G., & Tummolini, L. (2019). Words as social tools: Language, sociality and inner grounding in abstract concepts. Physics of Life Reviews, 29, 120–153. https://doi.org/10.1016/j.plrev.2018.12.001
Borghi, A. M., Shaki, S., & Fischer, M. H. (2022). Abstract concepts: External influences, internal constraints, and methodological issues. Psychological Research. https://doi.org/10.1007/s00426-022-01698-4
Bottini, R., Morucci, P., D’Urso, A., Collignon, O., & Crepaldi, D. (2021). The concreteness advantage in lexical decision does not depend on perceptual simulations. Journal of Experimental Psychology: General. https://doi.org/10.1037/xge0001090
Brauer, M., & Curtin, J. J. (2018). Linear mixed-effects models and the analysis of nonindependent data: A unified framework to analyze categorical and continuous independent variables that vary within-subjects and/or within-items. Psychological Methods, 23(3), 389–411. https://doi.org/10.1037/met0000159
Brysbaert, M., Warriner, A. B., & Kuperman, V. (2014). Concreteness ratings for 40 thousand generally known English word lemmas. Behavior Research Methods, 46, 904–911. https://doi.org/10.3758/s13428-013-0403-5
Cohen, J. (1983). The cost of dichotomization. Applied Psychological Measurement, 7(3), 249–253. https://doi.org/10.1177/014662168300700301
Connell, L., Lynott, D., & Banks, B. (2018). Interoception: The forgotten modality in perceptual grounding of abstract and concrete concepts. Philosophical Transactions of the Royal Society B: Biological Sciences, 373(1752), 20170143. https://doi.org/10.1098/rstb.2017.0143
Diveica, V., Pexman, P. M., & Binney, R. J. (2022). Quantifying social semantics: An inclusive definition of socialness and ratings for 8388 English words. Behavior Research Methods. https://doi.org/10.3758/s13428-022-01810-x
Dormann, C. F., Elith, J., Bacher, S., Buchmann, C., Carl, G., Carré, G., Marquéz, J. R. G., Gruber, B., Lafourcade, B., Leitão, P. J., Münkemüller, T., McClean, C., Osborne, P. E., Reineking, B., Schröder, B., Skidmore, A. K., Zurell, D., & Lautenbach, S. (2013). Collinearity: A review of methods to deal with it and a simulation study evaluating their performance. Ecography, 36(1), 27–46. https://doi.org/10.1111/j.1600-0587.2012.07348.x
Duñabeitia, J. A., Avilés, A., Afonso, O., Scheepers, C., & Carreiras, M. (2009). Qualitative differences in the representation of abstract versus concrete words: Evidence from the visual-world paradigm. Cognition, 110(2), 284–292. https://doi.org/10.1016/j.cognition.2008.11.012
Harrison, X. A., Donaldson, L., Correa-Cano, M. E., Evans, J., Fisher, D. N., Goodwin, C., Robinson, B. S., Hodgson, D. J., & Inger, R. (2018). A brief introduction to mixed effects modelling and multi-model inference in ecology. PeerJ, 6, 4794. https://doi.org/10.7717/peerj.4794
Hoffman, P., & Lambon Ralph, M. A. (2011). Reverse concreteness effects are not a typical feature of semantic dementia: Evidence for the hub-and-spoke model of conceptual representation. Cerebral Cortex, 21(9), 2103–2112. https://doi.org/10.1093/cercor/bhq288
Hultén, A., Vliet, M. van, Kivisaari, S., Lammi, L., Lindh-Knuutila, T., Faisal, A., & Salmelin, R. (2021). The neural representation of abstract words may arise through grounding word meaning in language itself. Human Brain Mapping, 42(15), 4973–4984. https://onlinelibrary.wiley.com/doi/abs/10.1002/hbm.25593
James, A. N., Fraundorf, S. H., Lee, E. K., & Watson, D. G. (2018). Individual differences in syntactic processing: Is there evidence for reader-text interactions? Journal of Memory and Language, 102, 155–181. https://doi.org/10.1016/j.jml.2018.05.006
Kiela, D., & Bottou, L. (2014). Learning image embeddings using convolutional neural networks for improved multi-modal semantics. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 36–45. https://doi.org/10.3115/v1/D14-1005
Knief, U., & Forstmeier, W. (2021). Violating the normality assumption may be the lesser of two evils. Behavior Research Methods. https://doi.org/10.3758/s13428-021-01587-5
Kousta, S.-T., Vigliocco, G., Vinson, D. P., Andrews, M., & Del Campo, E. (2011). The representation of abstract words: Why emotion matters. Journal of Experimental Psychology: General, 140, 14–34. https://doi.org/10.1037/a0021446
Lerche, V., von Krause, M., Voss, A., Frischkorn, G. T., Schubert, A.-L., & Hagemann, D. (2020). Diffusion modeling and intelligence: Drift rates show both domain-general and domain-specific relations with intelligence. Journal of Experimental Psychology: General, 149(12), 2207–2249. https://doi.org/10.1037/xge0000774
Lim, R. Y., Yap, M. J., & Tse, C.-S. (2020). Individual differences in Cantonese Chinese word recognition: Insights from the Chinese Lexicon Project. Quarterly Journal of Experimental Psychology, 73(4), 504–518. https://doi.org/10.1177/1747021820906566
Petilli, M. A., Günther, F., Vergallito, A., Ciapparelli, M., & Marelli, M. (2021). Data-driven computational models reveal perceptual simulation in word processing. Journal of Memory and Language, 117, 104194. https://doi.org/10.1016/j.jml.2020.104194
Pexman, P. M., Heard, A., Lloyd, E., & Yap, M. J. (2017). The Calgary semantic decision project: Concrete/abstract decision data for 10,000 English words. Behavior Research Methods, 49(2), 407–417. https://doi.org/10.3758/s13428-016-0720-6
Pexman, P. M., & Yap, M. J. (2018). Individual differences in semantic processing: Insights from the Calgary semantic decision project. Journal of Experimental Psychology: Learning, Memory, and Cognition, 44(7), 1091–1112. https://doi.org/10.1037/xlm0000499
Ponari, M., Norbury, C. F., Rotaru, A., Lenci, A., & Vigliocco, G. (2018). Learning abstract words and concepts: Insights from developmental language disorder. Philosophical Transactions of the Royal Society B: Biological Sciences, 373, 20170140. https://doi.org/10.1098/rstb.2017.0140
Ponari, M., Norbury, C. F., & Vigliocco, G. (2018). Acquisition of abstract concepts is influenced by emotional valence. Developmental Science, 21(2), 12549. https://doi.org/10.1111/desc.12549
Ponari, M., Norbury, C. F., & Vigliocco, G. (2020). The role of emotional valence in learning novel abstract concepts. Developmental Psychology, 56(10), 1855–1865. https://doi.org/10.1037/dev0001091
Ratcliff, R., Thapar, A., & McKoon, G. (2010). Individual differences, aging, and IQ in two-choice tasks. Cognitive Psychology, 60, 127–157. https://doi.org/10.1016/j.cogpsych.2009.09.001
Schielzeth, H., Dingemanse, N. J., Nakagawa, S., Westneat, D. F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N. A., Garamszegi, L. Z., & Araya‐Ajoy, Y. G. (2020). Robustness of linear mixed‐effects models to violations of distributional assumptions. Methods in Ecology and Evolution, 11(9), 1141–1152. https://doi.org/10.1111/2041-210X.13434
Schoot, R. van de, Depaoli, S., Gelman, A., King, R., Kramer, B., Märtens, K., Tadesse, M. G., Vannucci, M., Willemsen, J., & Yau, C. (2021). Bayesian statistics and modelling. Nature Reviews Methods Primers, 1, 3. https://doi.org/10.1038/s43586-020-00003-0
Singmann, H., & Kellen, D. (2019). An introduction to mixed models for experimental psychology. In D. H. Spieler & E. Schumacher (Eds.), New methods in cognitive psychology (pp. 4–31). Psychology Press.
Snefjella, B., & Blank, I. (2020). Semantic norm extrapolation is a missing data problem. PsyArXiv. https://doi.org/10.31234/osf.io/y2gav
Troche, J., Crutch, S. J., & Reilly, J. (2017). Defining a conceptual topography of word concreteness: Clustering properties of emotion, sensation, and magnitude among 750 English words. Frontiers in Psychology, 8, 1787. https://doi.org/10.3389/fpsyg.2017.01787
Uttl, B. (2002). North American Adult Reading Test: Age norms, reliability, and validity. Journal of Clinical and Experimental Neuropsychology, 24(8), 1123–1137. https://doi.org/10.1076/jcen.24.8.1123.8375
van Ravenzwaaij, D., van der Maas, H. L. J., & Wagenmakers, E.-J. (2012). Optimal decision making in neural inhibition models. Psychological Review, 119(1), 201–215. https://doi.org/10.1037/a0026275
Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P.-C. (2021). Rank-normalization, folding, and localization: An improved R-hat for assessing convergence of MCMC. Bayesian Analysis, 16(2), 667–718. https://doi.org/10.1214/20-BA1221
Vigliocco, G., Kousta, S.-T., Della Rosa, P. A., Vinson, D. P., Tettamanti, M., Devlin, J. T., & Cappa, S. F. (2014). The neural representation of abstract words: The role of emotion. Cerebral Cortex, 7(24), 1767–1777. https://doi.org/10.1093/cercor/bht025
Wingfield, C., & Connell, L. (2022b). Understanding the role of linguistic distributional knowledge in cognition. Language, Cognition and Neuroscience, 1–51. https://doi.org/10.1080/23273798.2022.2069278
Yap, M. J., Balota, D. A., Sibley, D. E., & Ratcliff, R. (2012). Individual differences in visual word recognition: Insights from the English Lexicon Project. Journal of Experimental Psychology: Human Perception and Performance, 38(1), 53–79. https://doi.org/10.1037/a0024177
Yap, M. J., Hutchison, K. A., & Tan, L. C. (2017). Individual differences in semantic priming performance: Insights from the semantic priming project. In M. N. Jones (Ed.), Frontiers of cognitive psychology. Big data in cognitive science (pp. 203–226). Routledge/Taylor & Francis Group.
Yap, M. J., Tse, C.-S., & Balota, D. A. (2009). Individual differences in the joint effects of semantic priming and word frequency revealed by RT distributional analyses: The role of lexical integrity. Journal of Memory and Language, 61(3), 303–325. https://doi.org/10.1016/j.jml.2009.07.001

  14. In the removal of missing values, six participants whose gender appeared as ‘NA’ were inadvertently removed from the data set.

  15. Only frequentist estimates shown, as Bayesian ones were not valid (see Appendix E).

  16. Further interaction plots available in Appendix D.



