Chapter 6 Model Identifiability
In some statistical models, different parameter values can give rise to identical probability distributions. When this happens, there will be a number of different parameter values associated with the maximum likelihood of any set of observed data. This is referred to as the model identifiability problem. (Everitt & Howell, 2005)
Model identifiability is crucial for several reasons:
- Parameter estimation: If a model is not identifiable, it is not possible to obtain unique estimates of parameters.
- Model interpretation: If we cannot obtain unique estimates of the parameters, it is difficult to draw inferences about them.
Identifiability issues may arise for several reasons, such as over-parameterization or redundant parameterization of the model, or the modeling structure itself (e.g., hierarchical models, constraints), among others.
An example (Everitt & Howell, 2005)
Can you regress \(Y\) on \(x_1\), \(x_2\), and \(x_3=x_1+x_2\)?
To understand the problem, let us first specify the regression model.
\[Y=\beta_0+\beta_1x_1+\beta_2x_2+\beta_3x_3+\epsilon\]
But we know that \(x_3=x_1+x_2\). Substituting for \(x_3\),
\[ Y=\beta_0+\beta_1 x_1+\beta_2 x_2+\beta_3\left(x_1+x_2\right)+\epsilon \]
or,
\[ Y=\beta_0+\left(\beta_1+\beta_3\right) x_1+\left(\beta_2+\beta_3\right) x_2+\epsilon \]
or, writing \(\beta_1^{\prime}=\beta_1+\beta_3\) and \(\beta_2^{\prime}=\beta_2+\beta_3\),
\[ Y=\beta_0+\beta_1^{\prime} x_1+\beta_2^{\prime} x_2+\epsilon \]
Hence, we set out to estimate four parameters, but the data can determine only three: \(\beta_0\), \(\beta_1^{\prime}\), and \(\beta_2^{\prime}\). Only the sums \(\beta_1+\beta_3\) and \(\beta_2+\beta_3\) are identified, so infinitely many combinations of \(\beta_1\), \(\beta_2\), and \(\beta_3\) fit the data equally well. Furthermore, if we include \(x_3\) in the regression model, the individual coefficients have no unique interpretation. This is a classic case of perfect multicollinearity.
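The redundancy in this example can also be checked numerically. The following is a minimal sketch in Python with NumPy, not part of the original example: the simulated predictors, the sample size, the seed, and the coefficient values are all arbitrary assumptions chosen for illustration. It shows that the design matrix with \(x_3=x_1+x_2\) has four columns but rank three, and that two different coefficient vectors produce identical predictions.

```python
import numpy as np

# Simulated predictors; the seed and sample size are arbitrary assumptions.
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2  # exact linear combination of x1 and x2

# Design matrix: intercept column plus all three predictors.
X = np.column_stack([np.ones(n), x1, x2, x3])

# Four columns, but only three of them are linearly independent.
rank = np.linalg.matrix_rank(X)

# Two different coefficient vectors (beta0, beta1, beta2, beta3).
# Shifting by c * (0, 1, 1, -1) leaves beta1 + beta3 and beta2 + beta3 unchanged.
beta_a = np.array([1.0, 2.0, 3.0, 0.0])
c = 5.0  # any real number works here
beta_b = beta_a + c * np.array([0.0, 1.0, 1.0, -1.0])

# Both vectors give the same linear predictor for every observation.
same_predictions = np.allclose(X @ beta_a, X @ beta_b)

print("columns:", X.shape[1], "rank:", rank)
print("identical predictions:", same_predictions)
```

Because any value of `c` gives the same predictions, no amount of data can distinguish `beta_a` from `beta_b`, which is exactly what non-identifiability means in this model.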
It is important to note that identifiability issues can arise in many types of statistical models, including CDMs, IRT models, and structural equation models.