5.19 Interpreting a model with a transformed outcome

If the outcome has been transformed using a non-linear transformation, such as a Box-Cox transformation or one of its special cases, then the relationship between the outcome and each predictor should be interpreted on the transformed scale, not the original scale. For example, the final model we fit in Example 5.1 has the following regression coefficients:

round(coef(summary(fit.ex5.1.trans)), 4)
##                            Estimate Std. Error  t value Pr(>|t|)
## (Intercept)                 -0.1127     0.0038 -29.2940   0.0000
## BMXWAIST                     0.0003     0.0000   9.7426   0.0000
## smokerPast                   0.0018     0.0013   1.4401   0.1502
## smokerCurrent               -0.0001     0.0015  -0.0732   0.9416
## RIDAGEYR                     0.0003     0.0000   9.7697   0.0000
## RIAGENDRFemale              -0.0047     0.0011  -4.4085   0.0000
## race_ethNon-Hispanic White  -0.0046     0.0015  -3.0799   0.0021
## race_ethNon-Hispanic Black  -0.0027     0.0020  -1.3160   0.1885
## race_ethNon-Hispanic Other  -0.0007     0.0022  -0.3116   0.7554
## income$25,000 to < $55,000   0.0006     0.0017   0.3760   0.7070
## income$55,000+              -0.0001     0.0015  -0.0689   0.9451

We conclude that waist circumference is positively associated with transformed fasting glucose. Holding the other predictors fixed, a 1-cm difference in waist circumference is associated with a 0.0003 unit difference in mean \(-FG^{-1.5}\), not with a 0.0003 mmol/L difference in mean fasting glucose on the original scale.
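
Written out, the fitted model has roughly the following form on the transformed scale. The symbols \(\beta_0\), \(\beta_{WC}\), \(WC\), and \(w\) are notation introduced here for illustration only, and the terms for the other predictors are abbreviated as \(\cdots\):

\[
E\left[-FG^{-1.5}\right] = \beta_0 + \beta_{WC}\,WC + \cdots
\]

\[
E\left[-FG^{-1.5} \mid WC = w + 1\right] - E\left[-FG^{-1.5} \mid WC = w\right] = \beta_{WC} \approx 0.0003
\]

The difference in the second line does not involve \(w\), so the interpretation of the coefficient applies only to the transformed outcome \(-FG^{-1.5}\).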

This relationship holds no matter the value of waist circumference; that is what it means to have a linear relationship. However, on the original scale of FG (mmol/L), the relationship is non-linear, as illustrated in the right-hand panel of Figure 5.43. On the original scale, the effect of waist circumference on fasting glucose depends on the value of waist circumference: the effect is smaller at lower waist circumferences and larger at higher ones.
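
As a rough numerical illustration of this point, the sketch below back-transforms the linear predictor at two waist circumferences and compares the difference associated with 1 additional cm. It uses the rounded coefficients from the table above (so the results are approximate), sets all categorical predictors to their reference levels, and fixes age at 50 years, which is an assumed value chosen only for illustration. The names `back` and `median_fg` are hypothetical helpers, not part of the original analysis.

```r
# Rounded coefficients copied from the coefficient table (approximate)
b0   <- -0.1127   # intercept
bWC  <-  0.0003   # BMXWAIST
bAge <-  0.0003   # RIDAGEYR

# Assumed age for illustration; categorical predictors at reference levels
age <- 50

# Inverse of the outcome transformation -FG^(-1.5)
back <- function(eta) (-eta)^(-1 / 1.5)

# Back-transformed prediction (mmol/L) at waist circumference wc (cm)
median_fg <- function(wc) back(b0 + bWC * wc + bAge * age)

# Difference associated with 1 additional cm, at a low and a high WC
diff_low  <- median_fg(81)  - median_fg(80)
diff_high <- median_fg(121) - median_fg(120)
round(c(low = diff_low, high = diff_high), 4)
```

On the transformed scale both differences equal the waist circumference coefficient; after back-transformation the difference at 120 cm is larger than the difference at 80 cm.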

Note: Back-transforming using the inverse of the outcome transformation does not lead to a valid estimate of the mean outcome on the original scale, because the mean of a non-linearly transformed variable is generally not equal to the transformation of the mean. However, if the assumptions of the linear regression model are met on the transformed scale, then the mean transformed outcome is equal to the median transformed outcome and, because a monotone transformation preserves the median, the back-transformed value is the estimated median on the original scale (Harrell 2015, 391).
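
A small simulation can make this note concrete. The sketch below is not based on the NHANES data: it draws a hypothetical normally distributed transformed outcome (the values of `mu` and `sigma` are assumptions chosen for illustration), back-transforms each draw, and compares the back-transformed mean of the transformed outcome with the mean and median on the original scale.

```r
set.seed(1)

# Hypothetical transformed outcome Y* = -FG^(-1.5), normally distributed
# so that its mean and median coincide (illustrative values only)
mu     <- -0.07
sigma  <-  0.01
y_star <- rnorm(1e5, mean = mu, sd = sigma)
stopifnot(all(y_star < 0))  # the back-transformation requires Y* < 0

# Inverse of the transformation: FG = (-Y*)^(-1/1.5)
back <- function(y) (-y)^(-1 / 1.5)
fg   <- back(y_star)

# Compare the back-transformed mean of Y* with the mean and median of FG
c(back_transformed = back(mu),
  mean_FG          = mean(fg),
  median_FG        = median(fg))
```

Under these assumptions the back-transformed value is close to the median of the simulated FG values, while the mean of FG is noticeably larger.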

# Vector of WC values at which to predict
X    <- seq(min(nhanesf.complete$BMXWAIST),
            max(nhanesf.complete$BMXWAIST))

# Estimate the mean transformed outcome at these WC values,
# with the other predictors held at their mean
# or reference level
PRED <- predict(fit.ex5.1.trans,
                data.frame(BMXWAIST = X,
                           smoker   = "Never",
                           RIDAGEYR = mean(nhanesf.complete$RIDAGEYR),
                           RIAGENDR = "Male",
                           race_eth = "Hispanic",
                           income   = "< $25,000"))

# Plot on transformed scale
par(mfrow = c(1, 2))
plot(X, PRED, type = "l",
     xlab = "Waist Circumference (cm)",
     ylab = "Mean Transformed Fasting Glucose",
     main = "Transformed Outcome")

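# Back-transform to the original scale.
# The outcome was transformed as LBDGLUSI_trans = -1*LBDGLUSI^(-1.5),
# so the inverse is LBDGLUSI = (-1*LBDGLUSI_trans)^(1/(-1.5)).
# The result is the estimated median (not mean) FG.
FG <- (-1*PRED)^(1/(-1.5))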

# Plot on original scale
plot(X, FG, type = "l",
     xlab = "Waist Circumference (cm)",
     ylab = "Median Fasting Glucose (mmol/L)",
     main = "Original Scale")

Figure 5.43: Interpreting a transformed outcome

This is similar to the effect of including a polynomial transformation of a predictor (see Section 4.7) – the relationship between the outcome and the predictor, on the original scale, becomes non-linear.

References

Harrell, Frank E, Jr. 2015. Regression Modeling Strategies. 2nd ed. Switzerland: Springer International Publishing.