Chapter 6 Model Identifiability

In some statistical models, different parameter values can give rise to identical probability distributions. When this happens, there will be a number of different parameter values associated with the maximum likelihood of any set of observed data. This is referred to as the model identifiability problem. (Everitt & Howell, 2005)

Model identifiability is crucial for several reasons:

Parameter estimation: If a model is not identifiable, it is not possible to obtain unique estimates of its parameters.
Model interpretation: If the parameter estimates are not unique, it is difficult to draw inferences from them.

Identifiability issues may arise for several reasons, such as over-parameterization or redundant parameterization of the model, or the modeling structure itself (e.g., hierarchical models, constraints), among others.

An example (Everitt & Howell, 2005)

Can you regress $Y$ on $x_1$, $x_2$, and $x_3=x_1+x_2$?

To understand the problem, let us first specify the regression model.

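$Y=\beta_0+\beta_1 x_1+\beta_2 x_2+\beta_3 x_3+\epsilon$

But we know that $x_3=x_1+x_2$. Substituting for $x_3$ gives

$Y=\beta_0+\beta_1 x_1+\beta_2 x_2+\beta_3\left(x_1+x_2\right)+\epsilon$

or

$Y=\beta_0+\left(\beta_1+\beta_3\right) x_1+\left(\beta_2+\beta_3\right) x_2+\epsilon$

or

$Y=\beta_0+\beta_1^{\prime} x_1+\beta_2^{\prime} x_2+\epsilon$

where $\beta_1^{\prime}=\beta_1+\beta_3$ and $\beta_2^{\prime}=\beta_2+\beta_3$. Hence, we set out to estimate four parameters, but the data can determine only three: $\beta_0$, $\beta_1^{\prime}$, and $\beta_2^{\prime}$. For any constant $c$, the coefficients $\left(\beta_1+c,\ \beta_2+c,\ \beta_3-c\right)$ give exactly the same $\beta_1^{\prime}$ and $\beta_2^{\prime}$, so $\beta_1$, $\beta_2$, and $\beta_3$ cannot be estimated uniquely. Furthermore, if we include $x_3$ in the regression model, the individual coefficients have no clear interpretation. This is a classic case of perfect multicollinearity.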
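The redundancy in this example can also be checked numerically. The following is a minimal sketch in Python with NumPy, not part of the original example: the simulated predictors, the sample size, the seed, and the coefficient values are all illustrative assumptions. It shows that the design matrix with columns for the intercept, $x_1$, $x_2$, and $x_3=x_1+x_2$ has rank 3 rather than 4, and that two different coefficient vectors produce identical fitted values.

```python
import numpy as np

# Simulated predictors (illustrative values only; any x1, x2 would do).
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2  # the redundant predictor

# Design matrix with an intercept column and all three predictors.
X = np.column_stack([np.ones(n), x1, x2, x3])

# Four columns, but only three are linearly independent.
rank = np.linalg.matrix_rank(X)

# Two different coefficient vectors (beta_0, beta_1, beta_2, beta_3).
# The second adds c to beta_1 and beta_2 and subtracts c from beta_3.
beta_a = np.array([1.0, 2.0, 3.0, 0.0])
c = 5.0
beta_b = beta_a + np.array([0.0, c, c, -c])

# Both vectors give the same linear predictor for every observation,
# so no data set can distinguish between them.
same_fit = np.allclose(X @ beta_a, X @ beta_b)

print("columns:", X.shape[1], "rank:", rank)
print("identical fitted values:", same_fit)
```

Because the rank is smaller than the number of columns, the least-squares problem has infinitely many solutions. Dropping $x_3$ (or any one of the three predictors) restores a full-rank design matrix and a unique solution.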

It is important to note that identifiability issues can arise in all types of statistical models, including CDMs, IRT models, and structural equation models.

References

Everitt, B., & Howell, D. C. (Eds.). (2005). Encyclopedia of statistics in behavioral science. John Wiley & Sons.