Chapter 6 Model Identifiability
In some statistical models, different parameter values can give rise to identical probability distributions. When this happens, there will be a number of different parameter values associated with the maximum likelihood of any set of observed data. This is referred to as the model identifiability problem. (Everitt & Howell, 2005)
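The definition above can be stated a little more formally. The notation here is an assumption introduced for illustration and is not taken from the source: $P_\theta$ denotes the probability distribution of the data under parameter value $\theta$, and $\Theta$ denotes the parameter space. Under that notation, a model is identifiable when distinct parameter values always give distinct distributions:

$$P_{\theta_1} = P_{\theta_2} \implies \theta_1 = \theta_2 \quad \text{for all } \theta_1, \theta_2 \in \Theta.$$

The model is non-identifiable when at least one pair $\theta_1 \neq \theta_2$ gives $P_{\theta_1} = P_{\theta_2}$.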
Model identifiability is crucial for several reasons:
- Parameter estimation: If a model is not identifiable, it is not possible to obtain unique estimates of parameters.
- Model interpretation: If we cannot obtain unique estimates of the parameters, it is difficult to draw inferences about them.
Model identifiability issues may arise for several reasons, such as overparameterization or redundant parameterization of the model, or the modeling structure itself (e.g., hierarchical models, constraints), among others.
An example (Everitt & Howell, 2005)
Can you regress Y on x1, x2, and x3=x1+x2?
Let us try to understand the problem. Let us specify the regression model first.
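$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon$$
But we know that $x_3 = x_1 + x_2$. Substituting for $x_3$,
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 (x_1 + x_2) + \epsilon$$
or,
$$Y = \beta_0 + (\beta_1 + \beta_3) x_1 + (\beta_2 + \beta_3) x_2 + \epsilon$$
or,
$$Y = \beta_0 + \beta'_1 x_1 + \beta'_2 x_2 + \epsilon$$
where $\beta'_1 = \beta_1 + \beta_3$ and $\beta'_2 = \beta_2 + \beta_3$. Hence, we set out to estimate four parameters but ended up with only three. Furthermore, if we include $x_3$ in the regression model, the individual coefficients become very hard to interpret, because the data determine only the sums $\beta_1 + \beta_3$ and $\beta_2 + \beta_3$, not $\beta_1$, $\beta_2$, and $\beta_3$ separately. This is a classic case of perfect multicollinearity.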
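The same point can be checked numerically. The following is a minimal sketch in Python with NumPy, not part of the original example: the simulated predictors, the sample size, the seed, and the two coefficient vectors are all assumptions chosen for illustration. It shows that the design matrix has only three linearly independent columns, and that two different coefficient vectors with the same values of $\beta_1 + \beta_3$ and $\beta_2 + \beta_3$ produce identical fitted means.

```python
import numpy as np

# Simulated predictors (hypothetical data; the seed is only for reproducibility)
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2  # exact linear combination, as in the example

# Design matrix: intercept column plus the three predictors
X = np.column_stack([np.ones(n), x1, x2, x3])

# Four columns, but only three of them are linearly independent
rank = np.linalg.matrix_rank(X)

# Two different coefficient vectors (beta0, beta1, beta2, beta3)
# that share beta1 + beta3 = 2 and beta2 + beta3 = 3
beta_a = np.array([1.0, 2.0, 3.0, 0.0])
beta_b = np.array([1.0, 1.0, 2.0, 1.0])

# Both vectors give the same mean response for every observation
same_mean = np.allclose(X @ beta_a, X @ beta_b)

print("columns:", X.shape[1], "rank:", rank)
print("identical fitted means:", same_mean)
```

Because both coefficient vectors imply the same mean response, no amount of data generated from this model can distinguish between them, which is exactly what non-identifiability means here.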
It is important to note that identifiability issues can arise in all types of statistical models, including CDMs, IRT models, and structural equation models.