11.5 Diagnosing the Mechanism

11.5.1 MAR vs. MNAR

The only definitive way to distinguish MNAR from MAR is to measure some of the missing data.

For example, professional surveyors commonly follow up a paper survey with phone calls to a sample of non-respondents, asking them a few key survey items. This lets you compare respondents with non-respondents on those items.

If the two groups’ responses on those key items differ substantially, that’s good evidence that the data are MNAR.
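One way to put a number on how much the two groups differ is a standardized mean difference. The text doesn’t prescribe a particular measure, so this is an assumption: the minimal Python sketch below computes Cohen’s d (difference in means divided by the pooled standard deviation) between original respondents and followed-up non-respondents on a single key item. The function name `cohens_d` and the ratings are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(respondents, followups):
    """Cohen's d with a pooled SD: original respondents vs. the
    followed-up non-respondents, on one key survey item."""
    n1, n2 = len(respondents), len(followups)
    pooled_var = ((n1 - 1) * variance(respondents)
                  + (n2 - 1) * variance(followups)) / (n1 + n2 - 2)
    return (mean(respondents) - mean(followups)) / math.sqrt(pooled_var)


# Hypothetical 1-5 ratings on one key item
respondents = [4, 5, 4, 5, 4, 5]
followups = [2, 3, 2, 3, 2, 3]
print(round(cohens_d(respondents, followups), 2))  # 3.65
```

A large |d| on the key items suggests non-respondents differ systematically from respondents; how large counts as “substantial” is a judgment call informed by your field.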

However, in most missing-data situations we can’t get hold of the missing values. So while we can’t test the mechanism directly, we can examine patterns in the data to get an idea of which mechanism is most likely.

The first step in diagnosing whether the missing data are random is to draw on your substantive scientific knowledge of the data and your field. The more sensitive the question, the less likely people are to answer it: they won’t tell you as much about their cocaine use as about their phone use.

Likewise, many fields have typical research situations in which missing data are often non-ignorable. Educate yourself in your field’s literature.

11.5.2 MCAR vs. MAR

One very useful test for MCAR is Little’s MCAR test. Its null hypothesis is that the data are MCAR, so a significant result is evidence against MCAR.
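The idea behind Little’s test can be sketched in Python with NumPy, assuming continuous, roughly multivariate normal variables with NaN marking missing values. The sketch estimates the overall mean vector and covariance matrix by EM, then, for each missing-data pattern, compares that pattern’s means on its observed variables with the EM means. The weighted squared distances sum to a statistic d², referred to a chi-square distribution with (sum of p_j) − p degrees of freedom, where p_j is the number of observed variables in pattern j and p the total number of variables. The function names `em_mvn` and `littles_mcar_test` are hypothetical; for real analyses, an established, tested implementation is a safer choice.

```python
import numpy as np


def em_mvn(X, max_iter=500, tol=1e-8):
    """EM estimates of the mean vector and covariance matrix of a
    multivariate normal sample; NaN marks a missing value."""
    n, p = X.shape
    miss = np.isnan(X)
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))
    for _ in range(max_iter):
        x_hat = X.copy()
        extra = np.zeros((p, p))  # summed conditional covariances
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            # E-step: predict the missing block from the observed block
            coef = sigma[np.ix_(m, o)] @ np.linalg.inv(sigma[np.ix_(o, o)])
            x_hat[i, m] = mu[m] + coef @ (X[i, o] - mu[o])
            extra[np.ix_(m, m)] += sigma[np.ix_(m, m)] - coef @ sigma[np.ix_(o, m)]
        # M-step: re-estimate from the completed sufficient statistics
        new_mu = x_hat.mean(axis=0)
        dev = x_hat - new_mu
        new_sigma = (dev.T @ dev + extra) / n
        done = (np.abs(new_mu - mu).max() < tol
                and np.abs(new_sigma - sigma).max() < tol)
        mu, sigma = new_mu, new_sigma
        if done:
            break
    return mu, sigma


def littles_mcar_test(X):
    """Little's MCAR chi-square statistic; returns (d2, df)."""
    X = np.asarray(X, dtype=float)
    X = X[~np.isnan(X).all(axis=1)]  # fully missing rows carry no information
    p = X.shape[1]
    mu, sigma = em_mvn(X)
    patterns, labels = np.unique(np.isnan(X), axis=0, return_inverse=True)
    labels = labels.ravel()  # inverse shape differs across NumPy versions
    d2, df = 0.0, 0
    for j, pattern in enumerate(patterns):
        o = ~pattern
        rows = X[labels == j][:, o]
        diff = rows.mean(axis=0) - mu[o]
        d2 += len(rows) * diff @ np.linalg.solve(sigma[np.ix_(o, o)], diff)
        df += int(o.sum())
    return d2, df - p
```

A p-value can then be obtained with, for example, `scipy.stats.chi2.sf(d2, df)`. The sketch assumes every column has at least some observed values.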

A second technique is to create a dummy variable for each variable that has missing data, indicating whether that variable is missing:

1 = missing, 0 = observed

You can then run t-tests (for continuous variables) and chi-square tests (for categorical variables) between this indicator and the other variables in the data set to see whether missingness on this variable is related to the values of those other variables.
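The indicator coding and a t-test on a continuous variable can be sketched in Python using only the standard library. This sketch assumes `None` marks a missing value and uses Welch’s unequal-variance t-test, one common variant chosen here for illustration; the function names and the `weight`/`age` data are hypothetical. It returns the t statistic and Welch–Satterthwaite degrees of freedom; a p-value requires a t distribution, for example from `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
from statistics import mean, variance


def missing_indicator(values):
    """Dummy-code missingness: 1 if the value is missing (None), 0 if observed."""
    return [1 if v is None else 0 for v in values]


def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite df."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def compare_by_missingness(target, other):
    """Compare `other` between cases where `target` is missing vs. observed.
    Cases where `other` itself is missing are skipped."""
    ind = missing_indicator(target)
    missing = [x for x, r in zip(other, ind) if r == 1 and x is not None]
    observed = [x for x, r in zip(other, ind) if r == 0 and x is not None]
    return welch_t(missing, observed)


# Hypothetical data: is missingness on weight related to age?
weight = [70, None, 82, None, 65, None, 90, 75]
age = [30, 50, 32, 52, 34, 54, 36, 38]
t, df = compare_by_missingness(weight, age)
print(round(t, 2), round(df, 2))  # 9.86 5.88
```

A large |t| here would mean people with missing weight differ in age from those who reported it, so missingness on weight is related to another variable in the data.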

For example, if women really are less likely than men to report their weight, a chi-square test of gender against the weight-missing indicator will show a higher percentage of missing weight data for women than for men. That would be evidence that the weight data are not MCAR.
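The gender example can be sketched as a Pearson chi-square test on the 2×2 table of gender by missingness indicator. This minimal Python sketch uses only the standard library and applies no continuity correction (note that `scipy.stats.chi2_contingency` applies Yates’ correction to 2×2 tables by default). With 1 degree of freedom the p-value has the closed form erfc(√(χ²/2)). The function name and the counts are hypothetical.

```python
import math


def chi_square_2x2(group, indicator):
    """Pearson chi-square test (no continuity correction) of a two-level
    group variable against a 0/1 missingness indicator.
    Returns (statistic, p_value, percent_missing_by_group)."""
    levels = sorted(set(group))
    if len(levels) != 2:
        raise ValueError("expected exactly two groups")
    counts = {lv: [0, 0] for lv in levels}  # level -> [n_observed, n_missing]
    for g, r in zip(group, indicator):
        counts[g][r] += 1
    n = sum(sum(c) for c in counts.values())
    col_totals = [sum(counts[lv][k] for lv in levels) for k in (0, 1)]
    stat = 0.0
    for lv in levels:
        row_total = sum(counts[lv])
        for k in (0, 1):
            expected = row_total * col_totals[k] / n
            stat += (counts[lv][k] - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    pct = {lv: 100 * counts[lv][1] / sum(counts[lv]) for lv in levels}
    return stat, p_value, pct


# Hypothetical: 10 women (6 skipped weight), 10 men (2 skipped weight)
gender = ["F"] * 10 + ["M"] * 10
weight_missing = [1] * 6 + [0] * 4 + [1] * 2 + [0] * 8
stat, p, pct = chi_square_2x2(gender, weight_missing)
print(round(stat, 2), pct)  # 3.33 {'F': 60.0, 'M': 20.0}
```

With small expected cell counts like these, the chi-square approximation is rough and Fisher’s exact test is often preferred.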