C Some proofs on ROC curves
C.1 The ROC curve
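Assume we have a continuous (in mathematical terminology “absolutely continuous”) result Y from a diagnostic test, and denote the true and false positive fractions for a given threshold c by
\begin{eqnarray*} \mbox{TPF}(c) & = & \Pr(Y \geq c \,\vert\,D=1) \\ \mbox{FPF}(c) & = & \Pr(Y \geq c \,\vert\,D=0). \end{eqnarray*}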
The test result Y will have different distributions in the diseased (D=1) and non-diseased (D=0) populations, and we will denote the corresponding random variables by Y_D and Y_{\bar D}, respectively.
Further define the ROC curve via
\begin{eqnarray*} \mbox{ROC}(.) & = & \{(\mbox{FPF}(c), \mbox{TPF}(c)), c \in (-\infty, \infty)\}, \end{eqnarray*}
that is, the ROC curve consists of the points (\mbox{FPF}(c), \mbox{TPF}(c)) for all possible thresholds c. Now define the survivor functions in the diseased and non-diseased populations:
\begin{eqnarray*} S_D(y) & = & \Pr(Y \geq y \,\vert\,D=1) = \mbox{TPF}(y) \\ S_{\bar D}(y) & = & \Pr(Y \geq y \,\vert\,D=0) = \mbox{FPF}(y). \end{eqnarray*}
Note that the survivor functions are strictly decreasing (assuming the densities are positive over the range of Y) and hence invertible.
For a given threshold c, suppose t=\mbox{FPF}(c)= S_{\bar D}(c), so c=S_{\bar D}^{-1}(t) and we obtain \mbox{ROC}(t) = \mbox{TPF}(c) = S_D(c) = S_D(S_{\bar D}^{-1}(t)).
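The representation \mbox{ROC}(t) = S_D(S_{\bar D}^{-1}(t)) can be evaluated directly once the two distributions are specified. The following is a minimal sketch, assuming (purely for illustration) that Y_D and Y_{\bar D} are normally distributed; the function name `roc` and the parameter values are hypothetical and not part of the text above. It uses S(y) = 1 - F(y), so that S^{-1}(t) = F^{-1}(1 - t).

```python
from statistics import NormalDist


def roc(t, diseased, non_diseased):
    """Evaluate ROC(t) = S_D(S_Dbar^{-1}(t)) for a false positive fraction 0 < t < 1."""
    # Threshold c with FPF(c) = t: since S = 1 - F, S^{-1}(t) = F^{-1}(1 - t).
    c = non_diseased.inv_cdf(1.0 - t)
    # True positive fraction at that threshold: S_D(c) = 1 - F_D(c).
    return 1.0 - diseased.cdf(c)


# Illustrative assumption: normal test results in both populations.
non_diseased = NormalDist(mu=0.0, sigma=1.0)
diseased = NormalDist(mu=1.0, sigma=1.0)

for t in (0.1, 0.5, 0.9):
    print(f"FPF = {t:.1f}  ->  TPF = {roc(t, diseased, non_diseased):.3f}")
```

If both populations have the same distribution, the sketch returns \mbox{ROC}(t) = t, the diagonal of an uninformative test.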
C.2 AUC
The area under the curve (AUC) is defined as \mbox{AUC} = \int_0^1 \mbox{ROC}(t)dt and we have
\begin{equation} \tag{C.1} \mbox{AUC} = \Pr(Y_D > Y_{\bar D}), \end{equation}
if Y_D and Y_{\bar D} are independent test results from the diseased and non-diseased populations, respectively.
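Before turning to the proof, equation (C.1) can be checked numerically in a special case. The sketch below assumes independent normal test results (an illustrative choice, with hypothetical parameter values). In that case Y_D - Y_{\bar D} is again normal, so \Pr(Y_D > Y_{\bar D}) has a closed form, which is compared with a midpoint-rule approximation of \int_0^1 \mbox{ROC}(t)dt.

```python
from math import sqrt
from statistics import NormalDist

# Illustrative assumption: independent normal test results.
non_diseased = NormalDist(mu=0.0, sigma=1.0)
diseased = NormalDist(mu=1.0, sigma=1.5)


def roc(t):
    """ROC(t) = S_D(S_Dbar^{-1}(t)) for 0 < t < 1."""
    c = non_diseased.inv_cdf(1.0 - t)
    return 1.0 - diseased.cdf(c)


def auc_midpoint(n=20_000):
    """Midpoint-rule approximation of the integral of ROC(t) over (0, 1)."""
    h = 1.0 / n
    return h * sum(roc((i + 0.5) * h) for i in range(n))


# Y_D - Y_Dbar is normal with the difference of means and the sum of variances,
# so Pr(Y_D > Y_Dbar) = Pr(Y_D - Y_Dbar > 0) is available in closed form.
difference = NormalDist(
    mu=diseased.mean - non_diseased.mean,
    sigma=sqrt(diseased.variance + non_diseased.variance),
)
auc_exact = 1.0 - difference.cdf(0.0)
auc_numeric = auc_midpoint()

print(f"Pr(Y_D > Y_Dbar)       = {auc_exact:.5f}")
print(f"integral of ROC(t) dt  = {auc_numeric:.5f}")
```

The midpoint rule is used because it never evaluates the integrand at t=0 or t=1, where the inverse survivor function is infinite.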
To prove this result, we first note that the survivor function S(y) of a continuous test result Y is closely related to the cumulative distribution function F(y) of Y: S(y) = \Pr(Y \geq y) = 1 - \Pr(Y < y) = 1 - \Pr(Y \leq y) = 1 - F(y), where the third equality holds because \Pr(Y = y) = 0 for continuous Y. Now \begin{eqnarray*} {\frac{d\,F(y)}{d\,y}} = f(y) \end{eqnarray*} where f(y) denotes the density function of Y. Therefore {\frac{d\,S(y)}{d\,y}} = -f(y).
Now consider \begin{eqnarray} \tag{C.2} \mbox{AUC} &=& \int_0^1 \mbox{ROC}(t)dt \nonumber \\ &=& \int_0^1 S_D(S_{\bar D}^{-1}(t))dt \nonumber \\ &=& \int_{\infty}^{-\infty} S_D(y) \left\{ -f_{\bar D}(y) dy \right\}\nonumber \\ &=& \int_{-\infty}^{\infty} S_D(y) f_{\bar D}(y) dy, \end{eqnarray} where we have applied the substitution t=S_{\bar D}(y), so dt = d S_{\bar D}(y) = -f_{\bar D}(y)dy, and the limits t=0 and t=1 correspond to y=\infty and y=-\infty, respectively. Let \mathsf{I}\{A\} denote the indicator function of an event A and f(y_D, y_{\bar D}) the joint density of Y_D and Y_{\bar D}. Now Y_D and Y_{\bar D} are assumed to be independent, so the joint density factorises and \begin{eqnarray*} \Pr(Y_D \geq Y_{\bar D}) &=& \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \mathsf{I}\{y_D \geq y_{\bar D}\} f(y_D, y_{\bar D}) dy_D dy_{\bar D} \\ &=& \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \mathsf{I}\{y_D \geq y_{\bar D}\} f_D(y_D) f_{\bar D}(y_{\bar D}) dy_D dy_{\bar D} \\ &=& \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} \mathsf{I}\{y_D \geq y_{\bar D}\} f_D(y_D)dy_D \right] f_{\bar D}(y_{\bar D}) dy_{\bar D} \\ &=& \int_{-\infty}^{\infty} \Pr(Y_D \geq y_{\bar D}) f_{\bar D}(y_{\bar D}) dy_{\bar D} \\ &=& \int_{-\infty}^{\infty} S_D(y_{\bar D}) f_{\bar D}(y_{\bar D}) dy_{\bar D} \end{eqnarray*}
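where the last line equals (C.2) with y_{\bar D}=y. Since the test results are continuous, \Pr(Y_D = Y_{\bar D}) = 0 and hence \Pr(Y_D \geq Y_{\bar D}) = \Pr(Y_D > Y_{\bar D}), which proves equation (C.1).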
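The two sides of the final step can also be compared by simulation. The following sketch again assumes independent normal test results with hypothetical parameters. It approximates the last line of (C.2), \int S_D(y) f_{\bar D}(y) dy, by a midpoint rule on a truncated range and estimates \Pr(Y_D > Y_{\bar D}) as the proportion of simulated independent pairs in which the diseased result is larger. The truncation limits, number of pairs, and seed are arbitrary choices.

```python
import random
from statistics import NormalDist

# Illustrative assumption: independent normal test results.
non_diseased = NormalDist(mu=0.0, sigma=1.0)
diseased = NormalDist(mu=1.0, sigma=1.5)


def integral_c2(lo=-12.0, hi=12.0, n=20_000):
    """Midpoint approximation of the integral of S_D(y) * f_Dbar(y) over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * h
        # S_D(y) = 1 - F_D(y), weighted by the non-diseased density.
        total += (1.0 - diseased.cdf(y)) * non_diseased.pdf(y)
    return h * total


def simulated_probability(n_pairs=200_000, seed=1):
    """Proportion of independent pairs with diseased result above non-diseased."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_pairs):
        y_d = rng.gauss(diseased.mean, diseased.stdev)
        y_nd = rng.gauss(non_diseased.mean, non_diseased.stdev)
        hits += y_d > y_nd
    return hits / n_pairs


p_integral = integral_c2()
p_simulated = simulated_probability()

print(f"integral of S_D(y) f_Dbar(y) dy   = {p_integral:.4f}")
print(f"simulated Pr(Y_D > Y_Dbar)        = {p_simulated:.4f}")
```

The simulated proportion carries Monte Carlo error, so the two numbers should agree only up to roughly two or three decimal places for this number of pairs.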