# Chapter 2 EMC^{2} Design Overview

## 2.1 Overview of experimental design

This chapter explains how to specify models in EMC^{2}. In later chapters, we’ll show you how to use EMC^{2} to model data and answer psychological questions. You can see a glossary of EMC^{2} functions later in the book, which you should refer back to as needed.

EMC^{2} uses a formula-based method to specify models, similar to how regression models are specified in R packages such as `lme4`

. We will introduce the concept of a design matrix to illustrate how EMC^{2} model formulas get converted into design matrices, which map a model’s parameters to its likelihood function.

Model formulas (and design matrices) specify how a model’s parameters relate to conditions in an experiment. For example, if our experiment contains both easy- and hard-to-discriminate stimuli, we may wish to allow the drift rate parameter to vary by stimulus type. Similarly, if our experiment involves speed vs. accuracy instructions, we can specify a formula that allows thresholds to vary between the different instruction conditions.

### 2.1.1 Why Mapping

Of key importance in fitting a model to data is that we create a function that takes proposed parameter values as input, and returns the likelihood of those parameters for a given model and set of data. This is called a likelihood function.

In relating a likelihood function to an experimental design, we need to be able to tell the model which parameters to map to which conditions or cells of our design. This can get quite complex in designs with many conditions and models with many parameters.

A simple way to do this would be to allow (some) parameters to vary freely over every cell of the design. However, this probably will not provide the most concise or parsimonious account of our data. It also means that our parameter space may grow quite large very quickly.

Instead, we can use contrasts to directly estimate parameters for the effect(s) we are interested in (e.g., testing the difference between two conditions). To achieve this, we estimate an “intercept” parameter and then estimate parameters that represent deviation from the intercept in the direction of the effect(s) we wish to measure. EMC^{2} includes several functions that greatly simplify mapping model parameters to data, and which allow users to see the results of mappings and to make direct inferences on target effects and interactions.

### 2.1.2 Typical analyses

In experimental psychology, we typically want to investigate the effects of our experimental manipulations. Let’s imagine that we have a simple perceptual decision-making task with two conditions. In one condition, participants are instructed to respond as quickly as possible (a speed condition) and, in the other, they’re told to respond as accurately as possible (an accuracy condition).

This type of speed-accuracy trade-off (SAT) experiment is very common in cognitive psychology and neuroscience, and has become somewhat of a benchmark test of evidence accumulation models. The effect of interest relates to how participants control their decision strategies in order to respond more quickly or more accurately.

Responding more quickly typically reduces accuracy, and responding more accurately typically makes reduces speed (i.e., makes responses slower). Most SAT studies explain this effect in terms of the threshold parameter: Participants set lower thresholds (i.e., require less evidence to trigger a response) in speed conditions and higher thresholds (i.e., require more evidence to trigger a response) in accuracy conditions (Heitz 2014; Rae et al. 2014).

This design requires mapping the different sets of speed and accuracy trials to the corresponding threshold parameters in our model. We could do this using a `for`

loop, but this is slow and gets very complicated when more conditions and parameters are added. Instead, EMC^{2}’s formula method gives our likelihood function a vector of responses that maps each response to the appropriate parameter.

### 2.1.3 Coding

Before we introduce specific functions, it is important to discuss the type of parameter coding. There are two types of parameter coding that we will refer to; **Effect Coding** and **Cell Coding**.

#### 2.1.3.1 Effect Coding

Effect coding is used when we want to measure the effect of a factor. This is similar to regression/ANOVA designs where we test for main effects and interactions. For the SAT example above, we might expect this to impact the threshold parameter `b`

, and so in the model we would specify an intercept `b`

and a `b.acc`

effect parameter, which is the difference parameter between the conditions. Effect coding means that we estimate fewer parameters for each effect (i.e., we parameterise the differences between cells instead of parameterising each cell separately). Most importantly, this lets us quantitatively test for the effect of a factor. For example, if `b.acc = .5`

with credible interval of `.3-.6`

then we can conclude that an effect is present and that the direction of the effect is in the expected direction (i.e., accuracy > speed for the threshold parameter `b`

).

#### 2.1.3.2 Cell Coding

Cell coding is another common approach in which each cell of the design is assigned its own parameter. This means that main effects and interactions need to be calculated post-hoc and interactions are always inferred. For example, in an SAT design with hard and easy difficulty, we might have the parameters `hard.speed`

, `easy.speed`

, `hard.accuracy`

, and `easy.accuracy`

. This method means there are more parameters to estimate and less information per parameter. In EMC^{2}, it is possible to use the effect coding and then find the implied cell coding. We will demonstrate how to do this below.

### 2.1.4 Data

The shape of your data is important for creating the design matrix. The data should be in long format (i.e., each trial/observation corresponds to one row in the data frame). You can use `tidyr`

’s `pivot_longer`

function to change data from wide to long format).

The data should have a column called `subjects`

containing participant numbers/identifiers, and columns called `S`

, `R`

, and `rt`

containing the stimuli, observed responses, and response times (in seconds), respectively.

Any experimental conditions/factors in your design should also have separate columns (e.g., a `difficulty`

column with levels `easy`

and `hard`

; an `emphasis`

column with levels `speed`

and `accuracy`

).

This means that each row of the data will include the subject, stimulus, response, response time, and the particular condition/s for that trial. The upcoming chapters will show examples of data in this format.

In formatting your data, there are several terms that EMC^{2} requires you to either use specifically (or to avoid as they are reserved for internal functions). The column containing participant identifiers must be named `subjects`

and the column containing response times must be named `rt`

. The user can specify their own names for the other columns. However, avoid using the labels `lM`

(latent match), `lR`

(latent response), `S`

(stimulus) and `R`

(response) unless they specifically relate to that aspect of the data (e.g., do not use `S`

to describe an experimental manipulation of “sound”). These are reserved names that EMC^{2} uses ‘under the hood’.

It is highly recommended that your response times are in seconds (s). Throughout this document, we exclusively refer to RT in seconds. Milliseconds could be used, however, changes may need to be made to some of the examples shown to ensure the correct scale. You can convert millisecond RTs (e.g., 200ms) to seconds (e.g., 0.2s) by dividing the `rt`

vector by 1000.

In general, it is good practice to use consistent variable naming and scales throughout the model coding, figure plotting, and report writing processes.

### 2.1.5 Working with the design (exercises)

In this section, we introduce some exercises to familiarize you with EMC^{2}’s model design functions. The full list of functions can be seen in the EMC^{2} Functions chapter, particularly the Design functions. For this exercise, we’ll use the DDM model, which may be familiar to anyone who has heard of evidence accumulation models. Later in the textbook, we’ll go through a full example of fitting the DDM and changing parameterisations, but for now, let’s focus on setting up a model design and understanding what this means. Let’s first set up our environment by loading some data and the functions;

```
rm(list=ls())
source("emc/emc.R")
source("models/DDM/DDM/ddmTZD.R")
#### Format the data to be analyzed ----
print(load("Data/PNAS.RData"))
# Note that this data was censored at 0.25s and 1.5s
<- data[,c("s","E","S","R","RT")]
dat names(dat)[c(1,5)] <- c("subjects","rt")
levels(dat$R) <- levels(dat$S)
```

For the standard examples, we use Effect Coding. A model design will typically look like this (for now, don’t worry about the matrices we first create here);

```
<- matrix(c(0,-1,0,0,0,-1),nrow=3)
Emat dimnames(Emat) <- list(NULL,c("a-n","a-s"))
<- matrix(c(-1,1),ncol=1,dimnames=list(NULL,""))
Vmat
<- make_design(
design_a Ffactors=list(subjects=levels(dat$subjects),S=levels(dat$S),E=levels(dat$E)),
Rlevels=levels(dat$R),
Clist=list(S=Vmat,E=Emat),
Flist=list(v~S,a~E,sv~1, t0~1, st0~1, s~1, Z~1, SZ~1, DP~1),
constants=c(s=log(1),st0=log(0),DP=qnorm(0.5),SZ=qnorm(0),sv=log(0)),
model=ddmTZD)
```

For this design, let’s first focus on the `Flist`

argument. This is the main argument for building our parameters. In `Flist`

, we need to specify which parameters are free to vary and what they are free to vary across. By “free to vary”, we mean that they have a different parameter for conditions. In this example, you will notice that the parameter `a`

(which corresponds to boundary separation in the DDM) varies with the `E`

(emphasis) parameter. Here, we are saying that we want to estimate whether `a`

is affected by `E`

.

Secondly, you will notice that other parameters (`t0`

for example) are shown as `t0~1`

. This means that `t0`

does not vary across conditions, so we only estimate one of these. To make this clear, let’s look at the output of this design. When we look at the output, this will list the sampled vector of parameters first and then will show the list of contrasts used for each parameter.

`sampled_p_vector(design_a)`

```
## v v_S a a_Ea-n a_Ea-s t0 Z
## 0 0 0 0 0 0 0
## attr(,"map")
## attr(,"map")$v
## v v_S
## 1 1 -1
## 20 1 1
##
## attr(,"map")$a
## a a_Ea-n a_Ea-s
## 1 1 0 0
## 39 1 -1 0
## 77 1 0 -1
##
## attr(,"map")$sv
## sv
## 1 1
##
## attr(,"map")$t0
## t0
## 1 1
##
## attr(,"map")$st0
## st0
## 1 1
##
## attr(,"map")$s
## s
## 1 1
##
## attr(,"map")$Z
## Z
## 1 1
##
## attr(,"map")$SZ
## SZ
## 1 1
##
## attr(,"map")$DP
## DP
## 1 1
```

Here, we can see the parameters to be estimated. These include single parameters for each where we used `~1`

. In addition to this, we can check the `v`

and `a`

parameters, which were set up to vary across specific conditions.

First, for `a`

we see 3 parameters; `a`

, `a_Ea-n`

and `a_Ea-s`

. Because we used effect coding, these relate to the intercept term (here this is `a`

) and the effect terms. For the effect terms, these are given as `Ea-n`

(accuracy emphasis minus neutral emphasis) and `Ea-s`

(accuracy emphasis minus speed emphasis). Using these codings allows us to estimate parameters that directly answer our research questions. In this case, we would want to know whether boundary separation is greater for accuracy conditions compared to the other conditions.

Further, using this coding, we can essentially “reconstruct” any cell-specific parameter of interest (for example, we can construct the “accuracy `a`

parameter). This is how the model uses these parameters, and we can see this in the contrast matrices. For example, when the condition is”accuracy” then the boundary separation is given by `a`

. When the condition is “neutral”, then the condition is given by `a`

- `a_Ea-n`

. The contrasts allow us to specify many different types of factors and relationships, and we will show many different examples of these throughout.

For the contrast matrices we constructed above;

```
<- matrix(c(0,-1,0,0,0,-1),nrow=3)
Emat dimnames(Emat) <- list(NULL,c("a-n","a-s"))
<- matrix(c(-1,1),ncol=1,dimnames=list(NULL,"")) Vmat
```

You can see that Emat is our “emphasis” contrast matrix, which we specify in the `Clist`

argument. Here, EMC^{2} will automatically create contrasts if they are not given for each factor, but for us, we wanted to specify these custom contrasts. By factor, we mean the “emphasis” manipulation (E). Secondly, we can see the `Vmat`

contrast matrix. This one is quite straightforward, and we can see that in the parameter output above. For `v`

we say that `v`

varies with `S`

- the stimuli. For this in the contrast matrix, we can see that the `v_S`

effect is either added or subtracted dependent on the stimuli. Given that the parameter can be estimated as positive or negative, we are then able to see the effect of stimuli, to see which has a higher `v`

, but further, by assessing the `v_S`

parameter, we can immediately test the difference between stimuli. For example, is `v_S = 0`

then there is no difference!

We’ll go through several more design examples in following chapters, however, for now, it’s good to familiarize yourself with the arguments used in regards to factors and contrasts. The other arguments are pretty straightforward (model, data, constants, and factors to specify), but will be reviewed in later chapters. Let’s look at one more example of the design;

```
<- make_design(
design_at0E_full Ffactors=list(subjects=levels(dat$subjects),S=levels(dat$S),E=levels(dat$E)),
Rlevels=levels(dat$R),
Flist=list(v~S*E,a~E,sv~1, t0~E, st0~1, s~1, Z~1, SZ~1, DP~1),
constants=c(s=log(1),DP=qnorm(0.5)),
model=ddmTZD)
```

For this example, first, let’s look at the constants. Constants are parameters which are NOT estimated during sampling. They are kept constant. Many models (and all EMC^{2} models) include constants for scaling reasons, because if something is not set, the model essentially has no reference point.

Secondly, in this example we can see that we include an effect on `t0`

, but more interesting, we include two effects for `v`

; `v~S*E`

. This `*`

infers an interaction term. When we view the parameter output, we can see this;

`sampled_p_vector(design_at0E_full)`

```
## v v_Sright v_Eneutral v_Espeed
## 0 0 0 0
## v_Sright:Eneutral v_Sright:Espeed a a_Eneutral
## 0 0 0 0
## a_Espeed sv t0 t0_Eneutral
## 0 0 0 0
## t0_Espeed st0 Z SZ
## 0 0 0 0
## attr(,"map")
## attr(,"map")$v
## v v_Sright v_Eneutral v_Espeed v_Sright:Eneutral v_Sright:Espeed
## 1 1 0 0 0 0 0
## 39 1 0 1 0 0 0
## 77 1 0 0 1 0 0
## 20 1 1 0 0 0 0
## 58 1 1 1 0 1 0
## 96 1 1 0 1 0 1
##
## attr(,"map")$a
## a a_Eneutral a_Espeed
## 1 1 0 0
## 39 1 1 0
## 77 1 0 1
##
## attr(,"map")$sv
## sv
## 1 1
##
## attr(,"map")$t0
## t0 t0_Eneutral t0_Espeed
## 1 1 0 0
## 39 1 1 0
## 77 1 0 1
##
## attr(,"map")$st0
## st0
## 1 1
##
## attr(,"map")$s
## s
## 1 1
##
## attr(,"map")$Z
## Z
## 1 1
##
## attr(,"map")$SZ
## SZ
## 1 1
##
## attr(,"map")$DP
## DP
## 1 1
```

For the `v`

related parameters, we now have `v`

, `v_Sright`

, `v_Eneutral`

, `v_Espeed`

, `v_Sright:Eneutral`

, `v_Sright:Espeed`

. These indicate the intercept term; `v`

, the main effect terms; `v_Sright`

for `S`

(meaning left is the comparison, i.e., the difference between right and left) and `v_Eneutral`

& `v_Espeed`

for `E`

(meaning accuracy is again our comparison). We also get `v_Sright:Eneutral`

and `v_Sright:Espeed`

that represent the interaction terms between the three main effect parameters.

If we only wanted main effects, without interaction effects, we would specify `v~S+E`

. Let’s look at what the output of this would be;

```
<- make_design(
design_at0E_full Ffactors=list(subjects=levels(dat$subjects),S=levels(dat$S),E=levels(dat$E)),
Rlevels=levels(dat$R),
Flist=list(v~S+E,a~E,sv~1, t0~E, st0~1, s~1, Z~1, SZ~1, DP~1),
constants=c(s=log(1),DP=qnorm(0.5)),
model=ddmTZD)
```

```
## v v_Sright v_Eneutral v_Espeed a a_Eneutral a_Espeed
## 0 0 0 0 0 0 0
## sv t0 t0_Eneutral t0_Espeed st0 Z SZ
## 0 0 0 0 0 0 0
## attr(,"map")
## attr(,"map")$v
## v v_Sright v_Eneutral v_Espeed
## 1 1 0 0 0
## 39 1 0 1 0
## 77 1 0 0 1
## 20 1 1 0 0
## 58 1 1 1 0
## 96 1 1 0 1
##
## attr(,"map")$a
## a a_Eneutral a_Espeed
## 1 1 0 0
## 39 1 1 0
## 77 1 0 1
##
## attr(,"map")$sv
## sv
## 1 1
##
## attr(,"map")$t0
## t0 t0_Eneutral t0_Espeed
## 1 1 0 0
## 39 1 1 0
## 77 1 0 1
##
## attr(,"map")$st0
## st0
## 1 1
##
## attr(,"map")$s
## s
## 1 1
##
## attr(,"map")$Z
## Z
## 1 1
##
## attr(,"map")$SZ
## SZ
## 1 1
##
## attr(,"map")$DP
## DP
## 1 1
```

Now we see no interaction terms, only main effects. Next, let’s look at cell coding.

### 2.1.6 Cell coding example

As outlined above, cell coding gives us one parameter for each cell of the design. This means that rather than estimating an intercept and effect term, we instead estimate one parameter for the condition. Let’s see what this would look like. In the example below, we only use cell coding for the `v`

parameter. To use cell coding, we need to make use of the Ffunctions argument, which specifies a function to perform on the factors. You can see below that our `se()`

function takes in the `S`

and `E`

factors and essentially combines these into all possible combinations (6 combinations).

```
<- function(d) {factor(paste(d$S,d$E,sep="_"),levels=
se c("left_accuracy","right_accuracy","left_neutral","right_neutral","left_speed","right_speed"))}
<- make_design(
design_avt0_full Ffactors=list(subjects=levels(dat$subjects),S=levels(dat$S),E=levels(dat$E)),
Rlevels=levels(dat$R),
Clist=list(SE=diag(6)),
Flist=list(v~0+SE,a~E,sv~1, t0~E, st0~1, s~1, Z~1, SZ~1, DP~1),
Ffunctions = list(SE=se),
constants=c(s=log(1),DP=qnorm(0.5)),
model=ddmTZD)
```

```
## v_SEleft_accuracy v_SEright_accuracy v_SEleft_neutral v_SEright_neutral
## 0 0 0 0
## v_SEleft_speed v_SEright_speed a a_Eneutral
## 0 0 0 0
## a_Espeed sv t0 t0_Eneutral
## 0 0 0 0
## t0_Espeed st0 Z SZ
## 0 0 0 0
## attr(,"map")
## attr(,"map")$v
## v_SEleft_accuracy v_SEright_accuracy v_SEleft_neutral v_SEright_neutral
## 1 1 0 0 0
## 39 0 0 1 0
## 77 0 0 0 0
## 20 0 1 0 0
## 58 0 0 0 1
## 96 0 0 0 0
## v_SEleft_speed v_SEright_speed
## 1 0 0
## 39 0 0
## 77 1 0
## 20 0 0
## 58 0 0
## 96 0 1
##
## attr(,"map")$a
## a a_Eneutral a_Espeed
## 1 1 0 0
## 39 1 1 0
## 77 1 0 1
##
## attr(,"map")$sv
## sv
## 1 1
##
## attr(,"map")$t0
## t0 t0_Eneutral t0_Espeed
## 1 1 0 0
## 39 1 1 0
## 77 1 0 1
##
## attr(,"map")$st0
## st0
## 1 1
##
## attr(,"map")$s
## s
## 1 1
##
## attr(,"map")$Z
## Z
## 1 1
##
## attr(,"map")$SZ
## SZ
## 1 1
##
## attr(,"map")$DP
## DP
## 1 1
```

Secondly, you will notice that for our contrasts, we use `SE=diag(6)`

. This is because the contrast matrix is now a diagonal matrix, where only the indexed parameter is estimated for the given condition; i.e., when `stim = right`

and `emphasis = accuracy`

we only need to estimate the `right_accuracy`

parameter (as opposed to an intercept parameter as well).

Finally, importantly for cell coding, you will notice that `v=0+SE`

. For `v=0`

this means that the `v`

intercept parameter is not estimated in the model. When looking at the parameter output, this is what we need for cell coding, as estimating a `v`

intercept parameter would not make sense in the cell coding schema.

## 2.2 Group level model

EMC^{2} uses the PMwG sampler from Gunawan et al. (2020). PMwG uses a hierarchical Bayesian sampling scheme, which means that we obtain samples for both subject- or individual-level parameters, and population- or group-level parameters. Due to this hierarchical structure, we need priors for both the subject-level and the group-level parameters. Here, the prior for the subject-level model takes the form of a multivariate normal (MVN) distribution. This MVN is the group-level model. For the prior on the group-level MVN, we use the prior recommended by Gunanwan et al. (2020) (discussed below).

Specifying an MVN for the group-level model means that we are assuming that each subject-level parameters is normally distributed across subjects, and that subject-level parameters can be correlated with each other. Generally, this type of model can be tricky to estimate, however is a strong advantage of PMwG, as it allows us to capture the covariance between parameters. For models with highly correlated parameters (as EAMs often do), this is a key advantage. This also allows us to estimate the correlations between different parameters. EMC^{2} thus enables inference on group-level parameters (i.e., group means), and allows us to assess within-parameter variances (i.e., the certainty of parameter estimates) and to make inferences about the correlations between parameters.

### 2.2.1 Priors

In Bayesian modelling, priors play an important role in incorporating prior information into our parameter estimates. When sampling a model, the prior helps restrict the sampling space so that implausible or unsupported values are not sampled. For example, in an EAM, `t0`

cannot take on negative values, since it is measuring a length of time. In DMC, we would often set a truncated normal prior on the distribution of `t0`

to ensure that only positive values were sampled. PMwG however, includes more general group level priors.

### 2.2.2 PMwG Priors

The priors for PMwG are best thought of at three levels. At the base or first level, we have subjects’ individual-level parameters. The prior for these subject-level parameters is the second level, that is group-level parameters. Finally, the third level is the prior on the group-level parameters, also known as the hyper-prior. We can view this hierarchy from the top-down, whereby the hyper-prior influences the group-level model, and the group-level model constrains the subject-level estimates.

For the group level model, the *prior* on the subjects is an MVN. For the prior on this MVN, we need a prior on Mu and Sigma. For Mu we use another MVN, with mean zero and the covariance matrix as an identity matrix. The mean is the default prior given by `prior$theta_mu_mean`

, however, you are able to specify your own prior Theta Mu by including this in the sampler object. Similarly, the Theta covariance prior identity matrix can be found in `prior$theta_mu_var`

, and can be changed to increase the variance (diagonal) if needed (e.g., for parameters with different ranges of values).

For the prior on the variance-covariance matrix of the MVN, we use Huang and Wand’s (2013) prior. This prior is a mix of inverse Wishart distributions, with mixture weights according to an inverse Gaussian distribution. This prior leads to marginally non-informative (close to uniform) priors on the correlations, and half-*t* distributed priors on the standard deviations (variances). The default settings for these priors are hard-coded in the sampler.

You may have noticed that the default mean for the group level MVN is 0. This is because PMwG samples on the real line, such that any value from -Inf to Inf are possible. Going back to our `t0`

example - this is problematic as we could sample negative `t0`

values. To overcome this, we use transformations on the parameters or protections against certain values.

In doing a transformation on a parameter, we transform the values from the *sampled* scale to the *natural* scale. Here, the natural scale is the scale of the parameter used by the model, whereas the sampled scale is the actual values sampled in the algorithm. This is important for interpreting the parameters (and correlations), and EMC^{2} includes functionality to view parameters on either scale.

One example of a transformation is the log scale. Here, we would take a parameter (such as `t0`

) that is strictly positive, and we take the exponent of the sampled value before using this transformed value for the likelihood calculation, thereby ensuring that the model uses only positive values. Another transformation is the probit transform, where we obtain values between 0 and 1.

All models in EMC^{2} include their own parameter transformations within the model file, so it is taken care of for you. However, it is good to be aware of the scale used for any given model parameters for sampling. This is both for interpretation and model estimation. For interpretation, the reasoning is straightforward - we want to report the values that the model is using (the natural scale), so that we don’t report negative values of `t0`

.

For model estimation, we may encounter difficulties when a parameter gets stuck in a bad region of parameter space, especially when the transformation results in a very similar outcome. For example, on the log scale, the values of -6 and -1000 are actually pretty close, whereas on the sampled scale this is not the case. Yet both would have similar support in the model. EMC^{2} includes several protections to avoid these problems, but it is always worthwhile checking the sampling process by evaluating parameter chains.

#### 2.2.2.1 Protections

EMC^{2} also includes a number of protections against certain values being used by models. For example, with `t0`

, we often specify that values of `t0`

larger than the smallest RT are given an extremely small likelihood, meaning that these values are not accepted in the sampling process. This can be problematic if we include very small response times, so we may want to first remove some of these. For this, we also have solutions, although these will be introduced later when the functionality is stable.

### 2.2.3 Sampling

In this section, we explain how PMwG sampling works in relation to EMC^{2} arguments. PMwG uses likelihood calculation to find the posterior mode for a given model and data. On each iteration, the likelihood of a number of *particles* is estimated for each subject, and compared to the previous particle. Each particle is a vector of the parameter values that is used in the model.

The best of these particles, i.e., particle with the highest likelihood, is then selected and retained for the next iteration. This is the basic idea of the particle metropolis step of PMwG. For each particle we find the likelihood of the particle given the data, and also the likelihood of that particle given the group level distribution, such that the particle that is accepted has the highest marginal weight.

When proposing particles, there are three multivariate normal distributions that particles can be drawn from. The first is the group level distribution, which has group level mean and group level variance. The second is the individual distribution, which has the mean of the previous particle for the individual and group level variance scaled by a parameter epsilon. The final distribution is the efficient proposal distribution. This distribution is unique for each participant, and is a conditional multivariate normal distribution, conditioned on the previous particles. This proposal distribution is essentially a “posterior” proposal distribution, meaning that particles are more likely to be accepted.

In PMwG, the acceptance rate refers to the rate at which “new” particles are selected (where the “old” particle refers to the previous particle). For reasons related to efficiency and sampling behaviour, we do not aim for a 100% acceptance rate. Instead, it is common to aim for between 20 and 80% acceptance (although this can depend on the model you are using). EMC^{2} includes functionality that automatically aims for a good acceptance rate. This is difficult, since acceptance can differ across subjects, so for each subject, we scale the epsilon parameter to make the variance larger when acceptance is too high, or smaller when the variance is too low. This is the `p_star`

argument, where we also set a maximum epsilon value to restrict the range and prevent computational errors.

For group level sampling, we use a Gibbs step to estimate a multivariate normal group level model (in the “standard” approach). This means that on each iteration, we estimate the mean, given the group variance, then sample the variance given mu and mixing weights, and then sample the mixing weights given mu and the variance. However, as noted below, we can estimate other group level models aside from the standard multivariate normal.

#### 2.2.3.1 Stages

PMwG works in three stages. The first is *burn-in*, where the sampler aims to find the posterior mode, sampling particles from the group level and individual proposal distributions. The second stage is *adaptation*, where sampling continues. However, after several iterations an attempt is made to create the conditional efficient distribution, conditioned on the previous particles sampled in adaptation (from the posterior). The final stage is *sampling*, where, along with the two previous proposal distributions, particles are sampled from the efficient distribution to achieve accurate and efficient sampling. We check all stages of sampling, but tend to make inference on the sampled stage.

#### 2.2.3.2 Chains

In EMC^{2}, we use multiple chains (default = 3) to ensure accurate posterior sampling. Essentially, in EMC^{2}, chains are separate runs of PMwG, however, chains will interact when they get stuck in bad regions after some time. To ensure we get accurate posterior sampling, we want to see convergence of chains across parameters (and likelihood estimates) for subject and group level estimates as a reliability test. There are instances where the sampler can get “stuck”, so chains help us avoid making inference on samples which haven’t reached the posterior.

#### 2.2.3.3 Convergence

Convergence relates to the chains (hopefully) reaching the same posterior mode. To check this, we compute the Gelman-diag statistic, which calculates the variance within and between chains, where a value of 1 is optimal. Our default setting for all chains is below 1.2 for our automatized functions, such as `auto_burn`

. This function will sample until a criterion GD is reached, meaning you don’t have to worry about how many iterations you do for burn-in. However, it may help to set a maximum, or minimum, or obtain extra samples in in the burn-in stage, depending on your aim. For example, you may want to first test whether your model even runs, so you could set the number of iterations to 10 to check that you don’t get an error.

### 2.2.4 Shrinkage

Shrinkage is a feature of hierarchical models, whereby individual parameter estimates are constrained by the group level. This means that extreme estimates for individual subjects are “shrunk” towards the group level distribution mean. PMwG has some elements that impose shrinkage, and others that help to ensure that shrinkage does not greatly affect subject level estimates. As particles are evaluated relative to the group level model (i.e. P(particle|group)), particles within the group distribution will have a higher likelihood of being selected. However, as individual particles can come from both the group and the individual proposal distributions, for participants with more extreme values, these can still be proposed and selected, which will have an effect on the group level estimate.

## 2.3 Extra Things

### 2.3.1 Compression

`make_samplers`

compresses the data, grouping together trials with the same parameters and RTs that differ less than the value specified in the rt_resolution argument. Here we use the default response time units, which assumes a seconds scale (using seconds is STRONGLY recommended), which is appropriate to visual stimuli presented on a monitor with 50Hz refresh (i.e., refreshing each 0.02s). Uniform random error in rt measurement is introduced in such a setup unless timing is coordinated with the sync pulse. Response devices such as the mouse or keyboard buttons introduce further error, so this setting is quite conservative. Taking measurement resolution can substantially speed up likelihood calculation, the key bottleneck in sampling, in this case by a factor of 4.

### References

*Journal of Mathematical Psychology*96: 102368.

*Frontiers in Neuroscience*8: n/a. doi: 10.3389/fpsyg.2014.00150.

*Journal of Experimental Psychology: Learning, Memory, and Cognition*40 (5): 1226.