Chapter 3 EMC Specifics
3.1 Group level model
EMC uses the PMwG sampler from Gunawan et al. (2020). PMwG implements a hierarchical Bayesian sampling scheme, which means that we obtain samples at both the subject and the group level. Because PMwG uses this hierarchical sampling approach, we need a prior for the subjects and a prior for the group. Here, the group-level model is specified as a multivariate normal (MVN). For the prior on the group, we use the prior specified by Gunawan et al. (2020).
By specifying an MVN for the group-level model, we assume that the distribution of each parameter across participants is normal, while allowing associations between parameters. These associations can be tricky to estimate, but capturing them is a strong advantage of PMwG, as it allows us to estimate the covariance between parameters; for models that often exhibit collinearity (such as EAMs), this is a key advantage. This feature also means that we can estimate the correlations between different parameters. Consequently, at the group level, users of EMC can make inference on parameter means (such as evaluating effect parameters), evaluate within-parameter variances, assess the certainty of model estimates, and make inference on the correlations between parameters.
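In symbols (our own notation, not anything EMC exposes directly), with alpha_s the parameter vector of subject s out of S subjects, the group-level model is

\[
\alpha_s \sim \mathrm{MVN}(\mu, \Sigma), \qquad s = 1, \dots, S,
\qquad
\rho_{ij} = \frac{\Sigma_{ij}}{\sqrt{\Sigma_{ii}\,\Sigma_{jj}}},
\]

where mu holds the group-level means, Sigma the group-level covariance matrix, and rho_ij the implied correlation between parameters i and j.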
3.2 Priors
In any Bayesian application, priors play an important role in incorporating known prior information. In model-based sampling applications, the prior also restricts the sampling space so that unsupported values are not sampled. For example, in an EAM, t0 must be positive, as it refers to a specific amount of time. In DMC, we would often set a truncated normal prior on the distribution of t0 to ensure that only positive values were sampled. PMwG, however, includes more general group-level priors.
3.2.1 PMwG Priors
The priors for PMwG are best thought of at three levels. At the base are the subjects; the prior on the subjects is the second level, the group, and the prior on the group is the hyper-prior. When estimating models, we can view this top down: the hyper-prior influences the group-level model, and the group-level model constrains the subject-level estimates.
For the group-level model, the prior on the subjects is the MVN we assume for them. For the MVN itself, we need priors on its mean (mu) and covariance matrix (sigma). For mu we use another MVN, with mean zero and an identity covariance matrix. The default prior mean is given by prior$theta_mu_mean; however, you can specify your own prior theta mu by including it in the sampler object. Similarly, the identity matrix for the prior covariance can be found in prior$theta_mu_var, and can be changed to increase the variance (on the diagonal) if needed (e.g., for parameters with different ranges of values).
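As a sketch, a customised group-level prior might look as follows. The element names prior$theta_mu_mean and prior$theta_mu_var come from the text above; exactly how the prior list is attached to the sampler object may differ across EMC versions, so treat the parameter names and values here as illustrative.

```r
# Illustrative sketch: customising the group-level prior (hypothetical values).
par_names <- c("v", "B", "t0")  # assumed sampled-scale parameters
n_pars <- length(par_names)

prior <- list(
  # Prior mean of the group-level MVN (default: a zero vector)
  theta_mu_mean = rep(0, n_pars),
  # Prior covariance of the group-level mean (default: identity matrix).
  # Larger diagonal entries allow wider plausible ranges per parameter.
  theta_mu_var = diag(c(2, 2, 1))
)
```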
For the prior on the variance-covariance matrix of the MVN, we use Huang and Wand's (2013) prior. This prior is a scale mixture of inverse Wishart distributions, with mixing parameters that follow inverse gamma distributions. It leads to marginally non-informative (close to uniform) priors on the correlations, and half-t distributed priors on the standard deviations. The default settings for this prior are hard coded in the sampler.
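Written out (our notation; nu and the scale hyperparameters A_j are the hard-coded defaults mentioned above), the Huang and Wand (2013) prior for a k-dimensional covariance matrix Sigma is

\[
\Sigma \mid a_1, \dots, a_k \sim \mathrm{IW}\!\big(\nu + k - 1,\; 2\nu\,\mathrm{diag}(1/a_1, \dots, 1/a_k)\big),
\qquad
a_j \sim \mathrm{IG}\!\big(\tfrac{1}{2},\; \tfrac{1}{A_j^2}\big).
\]

Marginalising over the a_j yields the close-to-uniform priors on the correlations and half-t priors on the standard deviations.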
You may have noticed that the default mean for the group-level MVN is 0. This is because PMwG samples on the real line, such that any value from -Inf to Inf is possible. Returning to our t0 example, this is problematic, as we could sample negative t0 values. To overcome this, we use transformations on the parameters, or protections against certain values.
3.3 Extra EMC functionality
3.3.1 Transformations
A transformation maps parameter values from the sampled scale to the natural scale. Here, the natural scale is the scale of the parameter used by the model, whilst the sampled scale contains the actual values sampled by the algorithm. This distinction is important for interpreting the parameters (and correlations), and we include functionality to view either scale.
One example is the log transformation. Here, we take a parameter (such as t0) that is strictly positive, and we exponentiate the sampled value before using the transformed value in the likelihood calculation, thereby ensuring that the model uses only positive values. Another is the probit transform, which yields values between 0 and 1.
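As a minimal sketch of both transformations in R, mapping sampled (real-line) values onto the natural scale:

```r
# Values as sampled by the algorithm: anywhere on the real line.
sampled <- c(t0 = -1.5, p = 0.3)  # "p" is a hypothetical probability-like parameter

# Log-scale parameter: exponentiate to guarantee a positive natural value.
t0_natural <- exp(sampled["t0"])   # exp(-1.5) ~ 0.22, always > 0

# Probit-scale parameter: the standard normal CDF maps onto (0, 1).
p_natural <- pnorm(sampled["p"])   # pnorm(0.3) ~ 0.62, always in (0, 1)
```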
All models implemented in EMC include their own parameter transformations within the model file, so this is taken care of for you. However, it is good to be aware of the scale on which any given model parameter is sampled, both for interpretation and for model estimation. For interpretation, the reasoning is straightforward: we want to report the values that the model uses (the natural scale), so that we don't report negative values of t0. For model estimation, difficulties can arise when a parameter gets stuck in a bad region and the transformation maps very different sampled values onto nearly the same outcome. For example, sampled values of -6 and -1000 are far apart on the sampled scale, but under a log transformation both map to natural values close to zero, so both would have similar support in the model. We've included several protections to avoid these problems, but it is always worthwhile checking the sampling process by evaluating parameter chains.
3.3.2 Protections
EMC also includes a number of protections against certain values being used by models. Again keeping with t0, we often specify that values of t0 larger than the smallest RT are given an extremely small likelihood, meaning that these values are not accepted in the sampling process. This can be problematic if the data include very small response times, so we may want to remove some of these first. For this too, we have solutions.
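Conceptually, such a protection amounts to something like the following inside a log-likelihood function. This is an illustration of the idea only, with a placeholder likelihood, not EMC's internal code:

```r
# Illustrative protection: reject particles whose t0 exceeds the smallest RT.
log_likelihood <- function(pars, data) {
  if (pars["t0"] > min(data$rt)) {
    return(-Inf)  # extremely small (log-)likelihood: particle is never selected
  }
  # ... otherwise compute the model's log-likelihood as usual (placeholder) ...
  sum(dnorm(data$rt - pars["t0"], mean = 0.5, sd = 0.1, log = TRUE))
}
```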
3.4 Sampling
In this section, we explain how PMwG sampling works in relation to EMC arguments. PMwG uses likelihood calculation to find the posterior mode for a given model and data. On each iteration, the likelihood of a number of particles is estimated for each subject and compared to the previous particle; each particle is a vector of the parameter values used by the model. The best of these particles, i.e., the particle with the highest likelihood, is then selected and retained for the next iteration. This is the basic idea of the particle metropolis step of PMwG. For each particle, however, we find not only the likelihood of the particle given the data, but also the likelihood of that particle given the group-level distribution, such that the particle that is accepted has the highest marginal weight.
When proposing particles, there are three multivariate normal distributions from which particles can be drawn. The first is the group-level distribution, which has the group-level mean and group-level variance. The second is the individual distribution, which has the mean of the subject's previous particle and the group-level variance scaled by a parameter epsilon. The final distribution is the efficient proposal distribution. This distribution is also unique to each person and is a conditional multivariate normal distribution, conditioned on the previous particles. It is essentially a “posterior” proposal distribution, meaning that particles drawn from it are more likely to be accepted.
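A conceptual sketch of the three proposal sources for a single subject is given below (using the mvtnorm package). The mixture weights, the epsilon scaling, and the construction of the efficient distribution are all simplified here; this is not EMC's internal code:

```r
library(mvtnorm)

propose_particle <- function(theta_mu, theta_var,     # group-level mean and variance
                             prev_particle, epsilon,  # subject's last particle, scaling
                             eff_mu = NULL, eff_var = NULL) {  # efficient distribution
  u <- runif(1)
  if (!is.null(eff_mu) && u < 1/3) {
    # 3. Efficient (conditional) proposal: only available after adaptation
    rmvnorm(1, eff_mu, eff_var)
  } else if (u < 2/3) {
    # 1. Group-level distribution: group mean, group variance
    rmvnorm(1, theta_mu, theta_var)
  } else {
    # 2. Individual distribution: previous particle, scaled group variance
    rmvnorm(1, prev_particle, epsilon^2 * theta_var)
  }
}

# e.g. propose_particle(rep(0, 2), diag(2), c(0.1, -0.4), epsilon = 0.5)
```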
In PMwG, acceptance refers to the rate of “new” particles being selected (where the “old” particle is the previous particle). For reasons related to efficiency and sampling behaviour, we do not aim for a 100% acceptance rate. Instead, it is common to aim for between 20% and 80% acceptance (although we find this is model dependent). To assist with this, we've included functionality that automatically aims for a good acceptance rate. This is difficult because acceptance can differ across subjects, so for each subject we scale the epsilon parameter, making the proposal variance larger when acceptance is too high and smaller when acceptance is too low. The target is set with the p_star argument, and we also set a maximum epsilon value to restrict the range and prevent computational errors.
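The adjustment of epsilon towards p_star can be sketched as a simple stochastic-approximation update; the sampler's actual update rule may differ, so this only illustrates the direction of the adaptation:

```r
# Simplified per-subject update of epsilon towards a target acceptance rate.
update_epsilon <- function(epsilon, acc_rate, p_star, iter, max_epsilon = 5) {
  # Acceptance above target -> widen the proposal (increase epsilon);
  # acceptance below target -> narrow it. Step size shrinks over iterations.
  log_eps <- log(epsilon) + (acc_rate - p_star) / sqrt(iter)
  min(exp(log_eps), max_epsilon)  # cap epsilon to prevent computational errors
}
```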
For group-level sampling, we use a Gibbs step to estimate a multivariate normal group-level model (in the “standard” approach). This means that on each iteration we sample the mean given the group variance, then sample the variance given mu and the mixing weights, and then sample the mixing weights given mu and the variance. However, as noted below, we can estimate other group-level models aside from the standard multivariate normal.
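Under the priors described above, the conditional distributions sampled in the Gibbs step take roughly the following form (our sketch, with S subjects, k parameters, subject vectors alpha_s, and mixing parameters a_j; see Gunawan et al. (2020) for the exact derivation):

\[
\begin{aligned}
\mu \mid \Sigma, \alpha &\sim \mathrm{MVN}\big(\Lambda\, S\, \Sigma^{-1}\bar{\alpha},\; \Lambda\big),
\qquad \Lambda = \big(I + S\,\Sigma^{-1}\big)^{-1},\\
\Sigma \mid \mu, \alpha, a &\sim \mathrm{IW}\Big(\nu + k - 1 + S,\; 2\nu\,\mathrm{diag}(1/a) + \sum_{s=1}^{S}(\alpha_s - \mu)(\alpha_s - \mu)^\top\Big),\\
a_j \mid \Sigma &\sim \mathrm{IG}\Big(\tfrac{\nu + k}{2},\; \nu\,(\Sigma^{-1})_{jj} + \tfrac{1}{A_j^2}\Big).
\end{aligned}
\]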
3.4.1 Stages
PMwG works in three stages. The first is burn-in, where the sampler aims to find the posterior mode, drawing particles from the group-level and individual proposal distributions. The second stage is adaptation, where sampling continues as before, but after several iterations an attempt is made to create the efficient conditional distribution, conditioned on the particles sampled during adaptation (i.e., from the posterior). The final stage is sampling, where, alongside the two previous proposal distributions, particles are drawn from the efficient distribution to achieve accurate and efficient sampling. We check all stages of sampling, but tend to make inference on the sampled stage.
3.4.2 Chains
In EMC, we use multiple chains (default = 3) to ensure accurate posterior sampling. Essentially, chains are separate runs of PMwG; however, chains will interact when they get stuck in bad regions for some time. To ensure accurate posterior sampling, we want to see convergence of chains across parameters (and likelihood estimates) for subject- and group-level estimates as a reliability test. There are instances where the sampler can get “stuck”, so chains help us avoid making inference on samples which haven't reached the posterior.
3.4.3 Convergence
Convergence relates to the chains (hopefully) reaching the same posterior mode. To check this, we use the Gelman-Rubin diagnostic (gelman.diag), which compares the variance within and between chains; a value of 1 is optimal. Our default criterion for our automated functions, such as auto_burn, is a value below 1.2 across all chains. This function samples until the criterion GD value is reached, meaning you don't have to worry about how many burn-in iterations to run. That said, it may help to set a maximum, a minimum, or extra burn-in samples, depending on your aim. For example, you may first want to test whether the model runs at all, so you could set iterations to 10 to check that you don't get an error.
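The same diagnostic can be computed manually with the coda package; here we build toy chains, since how you extract samples from the sampler object depends on your EMC version:

```r
library(coda)

# Toy example: three chains of 1000 iterations for two parameters.
set.seed(1)
make_chain <- function() {
  mcmc(matrix(rnorm(2000), ncol = 2, dimnames = list(NULL, c("v", "t0"))))
}
chains <- mcmc.list(make_chain(), make_chain(), make_chain())

gd <- gelman.diag(chains, autoburnin = FALSE)
gd$psrf   # per-parameter point estimates; aim for values below 1.2
gd$mpsrf  # multivariate version across all parameters
```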
3.5 Shrinkage
Shrinkage is a feature of hierarchical models whereby individual parameter estimates are constrained by the group level. This means that extreme estimates for individual subjects are “shrunk” towards the mean of the group-level distribution. PMwG has some elements that impose shrinkage, and others that help ensure shrinkage does not greatly affect subject-level estimates. Because particles are evaluated relative to the group-level model (i.e., p(particle | group)), particles within the group distribution have a higher likelihood of being selected. However, because individual particles can come from both the group and the individual proposal distributions, more extreme values can still be proposed and selected for participants who require them, which in turn affects the group-level estimate.
3.6 Extra Things
3.6.1 Compression
make_samplers compresses the data, grouping together trials with the same parameters and RTs that differ by less than the value specified in the rt_resolution argument. The default assumes response times are in seconds (using seconds is STRONGLY recommended), and is appropriate for visual stimuli presented on a monitor with a 50Hz refresh rate (i.e., refreshing every 0.02s). Uniform random error in RT measurement is introduced in such a setup unless timing is coordinated with the sync pulse. Response devices such as mouse or keyboard buttons introduce further error, so this setting is quite conservative. Taking account of measurement resolution can substantially speed up likelihood calculation, the key bottleneck in sampling; in this case, by a factor of 4.
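The idea behind the compression can be illustrated without any EMC internals: round RTs to the resolution, then count how many unique combinations remain. This is a toy illustration, not the package's implementation:

```r
set.seed(1)
rt_resolution <- 0.02  # seconds; matches a 50Hz display refresh

# Toy data: 10,000 trials in two conditions (same parameters within condition).
dat <- data.frame(
  condition = sample(c("easy", "hard"), 1e4, replace = TRUE),
  rt = round(runif(1e4, 0.3, 1.2), 3)
)

# Round RTs to the resolution; the likelihood then only needs evaluating once
# per unique (condition, binned RT) combination, weighted by its frequency.
dat$rt_binned <- round(dat$rt / rt_resolution) * rt_resolution
n_unique <- nrow(unique(dat[, c("condition", "rt_binned")]))
nrow(dat) / n_unique  # approximate speed-up factor for likelihood calculation
```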