Week 6 Model extension

Model extension can be used to quantify the effect of specific condition or manipulation on information processing. The extension can be super simple: splitting parameter(s) into N conditions. The extension can be complicated: additional cognitive mechanism is required and therefore, you should rearrange the current structure of the model. In this exercise, we will extend the drift-diffusion model (with three free-parameter: v, a, t0) based on the experimental manipulation.
To do so, you need to know the experimental design (at least manipulations), data structure and the model structure. The model has been explained extensively, so here we will quickly go through the experiment.
The data we use today is from the first experiment of Ratcliff and Rounder (1998). This study required subject to decide whether the overall brightness of pixel arrays displayed on a computer monitor was “high” or “low.”In addition, two main manipulations were implemented:
Strength: Proportion of white pixels (i.e., white pixel↑, the strength of brightness ↑)
Instruction: Speed decision (i.e., make decision as fast as possible) or Accurate decision (i.e., make decision as accurate as possible).

# Week 6: Model extension 
# First and Last name
# date
  • To start the new project, clear working environment with rm(list=ls()).
# clear working environment
rm(list=ls())
  • Load the package
# Open "rtdists" package
library(rtdists)
  • Load data
# Load data from the "rtdists" package
data(rr98)
# Double-click rr98 in your Environment panel (right-top window)
# In the dataset, there are 3 participants, who completed ~ 4000 trials in each condition (speed/accuracy).
  • Preprocess data to make sure the data and data structure are suitable for follow-up model fitting.
# Preprocess data
rr98 <- rr98[!rr98$outlier,]  # remove outliers, as in original paper
rr98[,'RT'] <- rr98$rt
rr98[,'accuracy'] = 0 # add a new column labeled as accuracy, where incorrect responses are coded as 0
rr98[rr98$correct, 'accuracy'] = 1 # and correct responses are coded as 1
rr98$choice = rr98$response
rr98$response = as.character(rr98$response)
rr98[rr98$correct == 1, "response"] = "upper"
rr98[rr98$correct == 0, "response"] = "lower"
#rr98[rr98$choice == "light", "response"] = "upper"
#rr98[rr98$choice == "dark", "response"] = "lower"
head(rr98)
  • Check the number of conditions and divide parameter(s) accordingly
#Before we extend the model, we check the name and number of conditions for each manipulations.
# unique() function is useful to remove duplicate elements. 
unique(rr98['instruction']) # two levels
unique(rr98['strength']) # 33 levels

# Should we split parameter(s) into two or 33? (see DDM1-3)
# What should we do if we are interested in strength manipulation? (see DDM4 and DDM5)

6.1 Original DDM

6.1.1 Model

Before model extension, let’s review how the original model looks like and how it is used in model fitting.

# the model we tried in week 4 and week 5
diffusion_deviance2 <- function(pars, dat)
{
  log_lik <- sum(log(ddiffusion(rt=dat[dat["accuracy"] == 1, "rt"],
                                response="upper",  v=pars[1], a=pars[2], t0=pars[3], z=pars[2]/2, s= 1)))
  log_lik <- log_lik +
    sum(log(ddiffusion(rt=dat[dat["accuracy"] == 0, "rt"],
                       response="lower",  v=pars[1], a=pars[2], t0=pars[3], z=pars[2]/2, s= 1)))
  if (any((-2*log_lik == 0))) return(1e6)
  return(-2*log_lik)
}

6.1.2 Fitting

# 1. set up lower and upper boundary for each parameter
lb <-c(-5,0.5,0.1) # Lower Boundary for each parameter. Order is important. Here, drift rate, threshold and non-decision time
ub <-c(5,5,0.5) # Upper Boundary for each parameter. Order is important. Here, drift rate, threshold and non-decision time

# 2. Create optimization function with an additional input: number of repetition.
# (The code is adapted from Mikhail Spektor's).
results<-setNames(data.frame(matrix(ncol = 5, nrow = 0)), c("nRepetition","v", "a","t0 (ms)", "nll"))
run <- 1
optimization_runs <-10

# Run the parameter estimation for several times (this will take few seconds/minutes to complete)
while (run <= optimization_runs){ 
  starting_values <- c(runif(n=1, min=-5,  max=5),
                       runif(n=1, min=0.5,  max=5),
                       runif(n=1, min=0,  max=0.1))
  
  # parameter estimation procedure using nlminb
  fit <- nlminb(start=starting_values, objective=diffusion_deviance2,
                lower = lb, 
                upper= ub, 
                dat=rr98) 
  deviance <- fit[["objective"]]
  params <- fit[["par"]]
  
  # save results for each iterate
  results[run,] <- c(run,
                     round(params[1],2),
                     round(params[2],2),
                     round(params[3]*1000), 
                     deviance)
  run <- run + 1} 

# display results in the Console window
results

6.2 Extension 1 (target drift rate)

6.2.1 Model

Suppose you are familiar with the model, data and experimental design, we can extend the model based on a manipulation of interest. Here we will extend the model by splitting drift rate or v based on instruction manipulations.

# Extend the model by splitting drift rate based on instruction manipulations.
# parameters: v_speed, v_accuracy, a, t0
DDM1test <- function(v_speed,v_accuracy,theshold,ndt, dat)
{
  log_lik <- sum(log(ddiffusion(rt=dat[dat["accuracy"] == 1 & dat["instruction"] == "speed", "rt"],
                                response="upper",  v=v_speed, a=theshold, t0=ndt)))
  log_lik <- log_lik +
    sum(log(ddiffusion(rt=dat[dat["accuracy"] == 1 & dat["instruction"] == "accuracy", "rt"],
                       response="upper",  v=v_accuracy, a=theshold, t0=ndt)))
  log_lik <- log_lik +
    sum(log(ddiffusion(rt=dat[dat["accuracy"] == 0 & dat["instruction"] == "speed", "rt"],
                       response="lower",  v=v_speed, a=theshold, t0=ndt)))
  log_lik <- log_lik +
    sum(log(ddiffusion(rt=dat[dat["accuracy"] == 0 & dat["instruction"] == "accuracy", "rt"],
                       response="lower",  v=v_accuracy, a=theshold, t0=ndt)))
  if (any((-2*log_lik == 0))) return(1e6)
  return(-2*log_lik)
}

A good practice is, grouping the parameters into one variable (e.g., pars). Because now pars summarizes a set of parameters, you should carefully remember and indicate the location of parameters when you are assigning parameters to ddiffusion.

# Extend the model by splitting drift rate (v) based on instruction manipulations.
# parameters: v_speed, v_accuracy, a, t0
# Important: order of parameter matters!! always keep tracking the order of parameter by writing them down as a note.
DDM1 <- function(pars, dat)
{
  log_lik <- sum(log(ddiffusion(rt=dat[dat["accuracy"] == 1 & dat["instruction"] == "speed", "rt"],
                                response="upper",  v=pars[1], a=pars[3], t0=pars[4])))
  log_lik <- log_lik +
    sum(log(ddiffusion(rt=dat[dat["accuracy"] == 1 & dat["instruction"] == "accuracy", "rt"],
                       response="upper",  v=pars[2], a=pars[3], t0=pars[4])))
  log_lik <- log_lik +
    sum(log(ddiffusion(rt=dat[dat["accuracy"] == 0 & dat["instruction"] == "speed", "rt"],
                       response="lower",  v=pars[1], a=pars[3], t0=pars[4])))
  log_lik <- log_lik +
    sum(log(ddiffusion(rt=dat[dat["accuracy"] == 0 & dat["instruction"] == "accuracy", "rt"],
                       response="lower",  v=pars[2], a=pars[3], t0=pars[4])))
  if (any((-2*log_lik == 0))) return(1e6)
  return(-2*log_lik)
}

6.2.2 Fitting

# 1. set up lower and upper boundary for each parameter
# Also, check if the same range is used for the same parameter
lb <-c(-5,-5,0.5,0.1) # Lower Boundary for each parameter. Order is important. Here, drift rate, threshold and non-decision time
ub <-c(5,5,5,0.5) # Upper Boundary for each parameter. Order is important. Here, drift rate, threshold and non-decision time

# 2. Create optimization function with an additional input: number of repetition.
# (The code is adapted from Mikhail Spektor's).
results.DDM1<-setNames(data.frame(matrix(ncol = 6, nrow = 0)), c("nRepetition","vspeed","vaccuracy", "a","t0 (ms)", "nll"))
run <- 1
optimization_runs <-10

# Run the parameter estimation for several times (this will take few seconds/minutes to complete)
while (run <= optimization_runs){ 
  starting_values <- c(runif(n=1, min=-5,  max=5), # selected from the same range for the same parameter
                       runif(n=1, min=-5,  max=5), # selected from the same range for the same parameter
                       runif(n=1, min=0.5,  max=5),
                       runif(n=1, min=0,  max=0.1))
  
  # parameter estimation procedure using nlminb
  fitDDM1 <- nlminb(start=starting_values, objective=DDM1,
                    lower = lb, 
                    upper= ub, 
                    dat=rr98) 
  deviance <- fitDDM1[["objective"]]
  params <- fitDDM1[["par"]]
  
  # save results for each iterate
  results.DDM1[run,] <- c(run,
                          round(params[1],2),
                          round(params[2],2),
                          round(params[3],2),
                          round(params[4]*1000), 
                          deviance)
  run <- run + 1} 

# display results in the Console window
results.DDM1

6.3 Extension 2 (target threshold)

6.3.1 Model

Extend the model by splitting threshold or a based on instruction manipulations.

# Extend the model by splitting threshold based on instruction manipulations
# parameters: v, a_speed, a_accuracy, t0
# Note: order of parameter matters!! always keep tracking the order of parameter.
DDM2 <- function(pars, dat)
{
  log_lik <- sum(log(ddiffusion(rt=dat[dat["accuracy"] == 1 & dat["instruction"] == "speed", "rt"],
                                response="upper",  v=pars[1], a=pars[2], t0=pars[4])))
  log_lik <- log_lik +
    sum(log(ddiffusion(rt=dat[dat["accuracy"] == 1 & dat["instruction"] == "accuracy", "rt"],
                       response="upper",  v=pars[1], a=pars[3], t0=pars[4])))
  log_lik <- log_lik +
    sum(log(ddiffusion(rt=dat[dat["accuracy"] == 0 & dat["instruction"] == "speed", "rt"],
                       response="lower",  v=pars[1], a=pars[2], t0=pars[4])))
  log_lik <- log_lik +
    sum(log(ddiffusion(rt=dat[dat["accuracy"] == 0 & dat["instruction"] == "accuracy", "rt"],
                       response="lower",  v=pars[1], a=pars[3], t0=pars[4])))
  if (any((-2*log_lik == 0))) return(1e6)
  return(-2*log_lik)
}

6.3.2 Fitting

# 1. set up lower and upper boundary for each parameter
# Also, check if the same range is used for the same parameter
lb <-c(-5,0.5,0.5,0.1) # Lower Boundary for each parameter. Order is important. Here, drift rate, threshold and non-decision time
ub <-c(5,5,5,0.5) # Upper Boundary for each parameter. Order is important. Here, drift rate, threshold and non-decision time

# 2. Create optimization function with an additional input: number of repetition.
# (The code is adapted from Mikhail Spektor's).
results.DDM2<-setNames(data.frame(matrix(ncol = 6, nrow = 0)), c("nRepetition","v","a_speed","a_accuracy", "t0 (ms)", "nll"))
run <- 1
optimization_runs <-10

# Run the parameter estimation for several times (this will take few seconds/minutes to complete)
while (run <= optimization_runs){ 
  starting_values <- c(runif(n=1, min=-5,  max=5),
                       runif(n=1, min=0.5,  max=5), # selected from the same range for the same parameter
                       runif(n=1, min=0.5,  max=5), # selected from the same range for the same parameter
                       runif(n=1, min=0,  max=0.1))
  
  # parameter estimation procedure using nlminb
  fitDDM2 <- nlminb(start=starting_values, objective=DDM2,
                    lower = lb, 
                    upper= ub, 
                    dat=rr98) 
  deviance <- fitDDM2[["objective"]]
  params <- fitDDM2[["par"]]
  
  # save results for each iterate
  results.DDM2[run,] <- c(run,
                          round(params[1],2),
                          round(params[2],2),
                          round(params[3],2),
                          round(params[4]*1000), 
                          deviance)
  run <- run + 1} 

# display results in the Console window
results.DDM2

6.4 Exercise: Extension 3 (target t0)

  1. Please modify the original model (DDM, see diffusion_deviance2) and split the non-decision time (t0) based on instruction manipulations.
  2. Run the extended model and see if split t0 can better explain the data. (Hint: compare the nll amount Extension1-3)

6.5 Extension 4 (varing v based on strength)

6.5.1 Model

Another manipulation, strength, might be involved in information process. To test this idea, what can we do? Method1: we assume that drift rate is varied as function of strength:
DriftRate = v_intercept + v_scale(strength).
Method2: Categorize the strength into high and low strength and split parameter accordingly (similar to DDM1-3)
Considering the similarity between Method2 and DDM1-3, we will focus on Method1 here to better understand how strength manipulation is involved in evidence accumulation.
Specifically, we will extend the model by targeting drift rate and implement the hypothetical mechanism: DriftRate = v_intercept + v_scale(strength).

# Extend the model by splitting drift rate based on strength manipulations.
# parameters: v_intercept, v_scale, a, t0
# Note: order of parameter matters!! always keep tracking the order of parameter.
DDM4 <- function(pars, dat)
{
  drift = pars[1] + pars[2]*as.numeric(dat$strength)
  log_lik <- sum(log(ddiffusion(rt=dat$rt,
                                response=dat$response,  
                                v=drift, 
                                a=rep(pars[3],dim(dat)[1]), 
                                t0=rep(pars[4],dim(dat)[1]))))
  if (any((-2*log_lik == 0))) return(1e6)
  return(-2*log_lik)
}

6.5.2 Fitting

# 1. set up lower and upper boundary for each parameter
lb <-c(-5,0.5,0.5,0.05) # Lower Boundary for each parameter. Order is important. Here, drift rate, threshold and non-decision time
ub <-c(5,5,5,0.5) # Upper Boundary for each parameter. Order is important. Here, drift rate, threshold and non-decision time


# 2. Create optimization function with an additional input: number of repetition.
results.DDM4<-setNames(data.frame(matrix(ncol = 6, nrow = 0)), c("nRepetition","v_intercept","v_scale", "a","t0(ms)", "nll"))

# 3. Initialization
run <- 1
optimization_runs <- 5

# 4. run the main script 
while (run <= optimization_runs){ 
  starting_values <- c(runif(n=1, min=-5,  max=5), 
                       runif(n=1, min=-5,  max=5), 
                       runif(n=1, min=0.5,  max=5),
                       runif(n=1, min=0.05,  max=0.1))
  
  # check if starting value makes sense or not. If not, resample the starting values
  # See Section: Be sure that your initial conditions give finite log-likelihoods (Whilson et al., 2019.)
  while (DDM4(starting_values,rr98) == Inf) {starting_values <- c(runif(n=1, min=-10,  max=0), 
                                                                  runif(n=1, min= 0,  max=10), 
                                                                  runif(n=1, min=0.5,  max=4),
                                                                  runif(n=1, min=0,  max=0.1))}
  
  # parameter estimation procedure using nlminb
  fitDDM4 <- nlminb(start=starting_values, objective=DDM4,
                    lower = lb, 
                    upper= ub, 
                    dat=rr98) 
  deviance <- fitDDM4[["objective"]]
  params <- fitDDM4[["par"]]
  
  # save results for each iterate
  results.DDM4[run,] <- c(run,
                          round(params[1],2),
                          round(params[2],2),
                          round(params[3],2),
                          round(params[4]*1000), 
                          deviance)
  run <- run + 1}

# display results in the Console window
results.DDM4

6.6 Exercise: Extension 5

  1. Please extend the DDM4 by splitting the non-decision time (t0) based on the instruction manipulation.
  2. Run the extended model and see if split t0 can better explain the data. (Hint: compare the nll amount Extension1-5)

6.7 Parameter estimation (subject-based)

Now, let’s run the same models but estimate the parameters for each subject, separately. ### original model (subject-based)

# Initialization
lb <-c(-5,0.5,0.05) # Lower Boundary for each parameter. Order is important. Here, drift rate, threshold and non-decision time
ub <-c(5,5,0.5) # Upper Boundary for each parameter. Order is important. Here, drift rate, threshold and non-decision time
n<-0

# Pre-allocation (i.e., create an empty data frame and name each column)
results.ppDDM0<-setNames(data.frame(matrix(ncol = 5, nrow = 0)), c("name", "v", "a","t0 (ms)", "nll"))

# Main for loop to estimate parameters for each subject
for (pp in unique(rr98$id)) {
  print(pp) # display the current subject in the Console window, so that you know the current progress
  
  data_pp = rr98[rr98$id == pp,]
  
  starting_values <- c(runif(n=1, min=-5,  max=5), 
                       runif(n=1, min=0.5,  max=5),
                       runif(n=1, min=0.05,  max=0.1))
  
  # parameter estimation procedure using nlminb
  fit <- nlminb(start=starting_values, 
                objective=diffusion_deviance2,
                lower=lb,# the lower boundary for each parameter. 
                upper=ub, # the upper boundary for each parameter.
                dat=data_pp)
  deviance <- fit[["objective"]]
  params <- fit[["par"]]
  
  # store result for the first to the last subject
  n <- n+1 #update the current number of subject
  results.ppDDM0[n,] <- c(pp,                      #subject's name
                          round(params[1],2),      #drift rate
                          round(params[2],2),      #threshold 
                          round(params[3]*1000),   #non-decision time (ms)
                          deviance)                #negative log likelihood
}
results.ppDDM0

6.7.1 DDM1 (subject-based)

# Initialization
lb <-c(-5,-5,0.5,0.05) # Lower Boundary for each parameter. Order is important. Here, drift rate, threshold and non-decision time
ub <-c(5,5,5,0.5) # Upper Boundary for each parameter. Order is important. Here, drift rate, threshold and non-decision time
n<-0

# Pre-allocation (i.e., create an empty data frame and name each column)
results.ppDDM1<-setNames(data.frame(matrix(ncol = 6, nrow = 0)), c("name", "v_speed","v_accuracy", "a","t0 (ms)", "nll"))

# Main for loop to estimate parameters for each subject
for (pp in unique(rr98$id)) {
  print(pp) # display the current subject in the Console window, so that you know the current progress
  
  data_pp = rr98[rr98$id == pp,]
  
  starting_values <- c(runif(n=1, min=-5,  max=5), 
                       runif(n=1, min=-5,  max=5), 
                       runif(n=1, min=0.5,  max=5),
                       runif(n=1, min=0.05,  max=0.1))
  
  # parameter estimation procedure using nlminb
  fit <- nlminb(start=starting_values, 
                objective=DDM1,
                lower=lb,# the lower boundary for each parameter. 
                upper=ub, # the upper boundary for each parameter.
                dat=data_pp)
  deviance <- fit[["objective"]]
  params <- fit[["par"]]
  
  # store result for the first to the last subject
  n <- n+1 #update the current number of subject
  results.ppDDM1[n,] <- c(pp,                      #subject's name
                          round(params[1],2),      #drift rate
                          round(params[2],2),      #drift rate
                          round(params[3],2),      #threshold 
                          round(params[4]*1000),   #non-decision time (ms)
                          deviance)                #negative log likelihood
}
results.ppDDM1

6.7.2 DDM2 (subject-based)

# Initialization
lb <-c(-5,0.5,0.5,0.05) # Lower Boundary for each parameter. Order is important. Here, drift rate, threshold and non-decision time
ub <-c(5,5,5,0.5) # Upper Boundary for each parameter. Order is important. Here, drift rate, threshold and non-decision time
n<-0

# Pre-allocation (i.e., create an empty data frame and name each column)
results.ppDDM2<-setNames(data.frame(matrix(ncol = 6, nrow = 0)), c("name", "v", "a_speed","a_accuracy","t0 (ms)", "nll"))

# Main for loop to estimate parameters for each subject
for (pp in unique(rr98$id)) {
  print(pp) # display the current subject in the Console window, so that you know the current progress
  
  data_pp = rr98[rr98$id == pp,]
  
  starting_values <- c(runif(n=1, min=-5,  max=5), 
                       runif(n=1, min=0.5,  max=5),
                       runif(n=1, min=0.5,  max=5),
                       runif(n=1, min=0.05,  max=0.1))
  
  # parameter estimation procedure using nlminb
  fit <- nlminb(start=starting_values, 
                objective=DDM2,
                lower=lb,# the lower boundary for each parameter. 
                upper=ub, # the upper boundary for each parameter.
                dat=data_pp)
  deviance <- fit[["objective"]]
  params <- fit[["par"]]
  
  # store result for the first to the last subject
  n <- n+1 #update the current number of subject
  results.ppDDM2[n,] <- c(pp,                      #subject's name
                          round(params[1],2),      #drift rate
                          round(params[2],2),      #threshold 
                          round(params[3],2),      #threshold 
                          round(params[4]*1000),   #non-decision time (ms)
                          deviance)                #negative log likelihood
}
results.ppDDM2

6.7.3 DDM4 (subject-based)

# Initialization
lb <-c(-10,-10,0.5,0.05) # Lower Boundary for each parameter. Order is important. Here, drift rate, threshold and non-decision time
ub <-c(1,10,10,3) # Upper Boundary for each parameter. Order is important. Here, drift rate, threshold and non-decision time
n<-0

# Pre-allocation (i.e., create an empty data frame and name each column)
results.ppDDM4<-setNames(data.frame(matrix(ncol = 6, nrow = 0)), c("name", "v_intercept","v_scale", "a","t0(ms)", "nll"))

# Main for loop to estimate parameters for each subject
for (pp in unique(rr98$id)) {
  data_pp = rr98[rr98$id == pp,]
  
  starting_values <- c(runif(n=1, min=-10,  max=0), 
                       runif(n=1, min= 0,  max=10), 
                       runif(n=1, min=2,  max=4),
                       runif(n=1, min=0,  max=0.1))
  
  # check if starting value makes sense or not. If not, resample the starting values
  while (DDM4(starting_values,data_pp) == Inf) {starting_values <- c(runif(n=1, min=-10,  max=0), 
                                                                  runif(n=1, min= 0,  max=10), 
                                                                  runif(n=1, min=0.5,  max=4),
                                                                  runif(n=1, min=0,  max=0.1))}
  print(pp) # display the current subject in the Console window, so that you know the current progress
  
  # parameter estimation procedure using nlminb
  fit <- nlminb(start=starting_values, 
                objective=DDM4,
                lower=lb,# the lower boundary for each parameter. 
                upper=ub, # the upper boundary for each parameter.
                dat=data_pp)

  deviance <- fit[["objective"]]
  params <- fit[["par"]]
  
  # store result for the first to the last subject
  n <- n+1 #update the current number of subject
  results.ppDDM4[n,] <- c(pp,               #subject's name
                          round(params[1],2),      #drift rate: intercept
                          params[2],      #drift rate: scale
                          round(params[3],2),      #threshold 
                          round(params[4]*1000),   #non-decision time (ms)
                          deviance)                #negative log likelihood
}
results.ppDDM4

6.8 Answers

Will be released before next meeting. ### Extension3

6.8.1 Fitting

6.8.2 Extension5

6.8.3 Fitting