C Some proofs on ROC curves

C.1 The ROC curve

Assume we have a continuous (in mathematical terminology “absolutely continuous”) result \(Y\) from a diagnostic test, and denote the true and false positive fractions for a given threshold \(c\) by

\[\begin{eqnarray*} \mbox{TPF}(c) & = & \Pr(Y \geq c \,\vert\,D=1) \\ \mbox{FPF}(c) & = & \Pr(Y \geq c \,\vert\,D=0). \end{eqnarray*}\]

The test result \(Y\) will generally have different distributions in the diseased (\(D=1\)) and non-diseased (\(D=0\)) populations; we denote the corresponding random variables by \(Y_D\) and \(Y_{\bar D}\), respectively.
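For observed data, the two fractions can be estimated by the share of test results at or above the threshold in each group. The following is a minimal Python sketch of that idea; the helper name `positive_fractions` and the data values are made up for illustration and do not come from the text.

```python
def positive_fractions(diseased, non_diseased, c):
    """Empirical TPF(c) and FPF(c): share of results >= c in each group."""
    tpf = sum(y >= c for y in diseased) / len(diseased)
    fpf = sum(y >= c for y in non_diseased) / len(non_diseased)
    return tpf, fpf


# Made-up test results, for illustration only.
y_diseased = [2.1, 3.4, 1.8, 4.0, 2.9]
y_non_diseased = [0.5, 1.9, 1.2, 2.3, 0.8]

tpf, fpf = positive_fractions(y_diseased, y_non_diseased, c=2.0)
print(tpf, fpf)  # 4 of 5 diseased and 1 of 5 non-diseased results are >= 2.0
```

Moving the threshold `c` up lowers both fractions, and moving it down raises both.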

Further define the ROC curve via

\[\begin{eqnarray*} \mbox{ROC}(.) & = & \{(\mbox{FPF}(c), \mbox{TPF}(c)), c \in (-\infty, \infty)\}, \end{eqnarray*}\]

that is, the ROC curve is the set of points \((\mbox{FPF}(c), \mbox{TPF}(c))\) for all possible thresholds \(c\). Now define the survivor functions in the diseased and non-diseased populations:

\[\begin{eqnarray*} S_D(y) & = & \Pr(Y \geq y \,\vert\,D=1) = \mbox{TPF}(y) \\ S_{\bar D}(y) & = & \Pr(Y \geq y \,\vert\,D=0) = \mbox{FPF}(y). \end{eqnarray*}\]

We assume that both densities are positive on the range of interest, so that the survivor functions are strictly decreasing and hence invertible.

For a given threshold \(c\), suppose \(t=\mbox{FPF}(c)= S_{\bar D}(c)\), so \(c=S_{\bar D}^{-1}(t)\) and we obtain \[ \mbox{ROC}(t) = \mbox{TPF}(c) = S_D(c) = S_D(S_{\bar D}^{-1}(t)). \]
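To make the composition \(S_D(S_{\bar D}^{-1}(t))\) concrete, the sketch below evaluates it under an assumed binormal model, in which both populations are normally distributed. The model, its parameter values and the helper names are illustrative assumptions, not part of the derivation above. It relies only on `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

# Assumed binormal model with illustrative parameters.
dist_nondiseased = NormalDist(mu=0.0, sigma=1.0)  # distribution of Y in D=0
dist_diseased = NormalDist(mu=1.5, sigma=1.0)     # distribution of Y in D=1


def survivor(dist, y):
    """S(y) = Pr(Y >= y) = 1 - F(y) for a continuous Y."""
    return 1.0 - dist.cdf(y)


def survivor_inverse(dist, t):
    """Threshold c with Pr(Y >= c) = t, valid for 0 < t < 1."""
    return dist.inv_cdf(1.0 - t)


def roc(t):
    """ROC(t): TPF at the threshold whose FPF equals t."""
    c = survivor_inverse(dist_nondiseased, t)
    return survivor(dist_diseased, c)


for t in (0.05, 0.10, 0.50):
    print(f"FPF = {t:.2f}  ->  TPF = {roc(t):.3f}")
```

Each call first finds the threshold that yields the requested false positive fraction in the non-diseased population and then reads off the true positive fraction at that threshold.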

C.2 AUC

The area under the curve (AUC) is defined as \[ \mbox{AUC} = \int_0^1 \mbox{ROC}(t)dt \] and we have

\[\begin{equation} \tag{C.1} \mbox{AUC} = \Pr(Y_D > Y_{\bar D}), \end{equation}\]

if \(Y_D\) and \(Y_{\bar D}\) are independent test results from the diseased and non-diseased populations, respectively.

To prove this result, we first note that the survivor function \(S(y)\) of a continuous test result \(Y\) is closely related to the cumulative distribution function \(F(y)\) of \(Y\): \[ S(y) = \Pr(Y \geq y) = 1 - \Pr(Y < y) = 1 - \Pr(Y \leq y) = 1 - F(y). \] Now \[ \frac{d\,F(y)}{d\,y} = f(y), \] where \(f(y)\) denotes the density function of \(Y\). Therefore \[ \frac{d\,S(y)}{d\,y} = -f(y). \]
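The relation between the slope of the survivor function and the density can be checked numerically. The sketch below assumes a standard normal test result, chosen only for illustration, and compares a central finite-difference approximation of the slope with \(-f(y)\).

```python
from statistics import NormalDist

dist = NormalDist(mu=0.0, sigma=1.0)  # assumed example distribution


def survivor(y):
    """S(y) = 1 - F(y)."""
    return 1.0 - dist.cdf(y)


def survivor_slope(y, h=1e-5):
    """Central finite-difference approximation to dS(y)/dy."""
    return (survivor(y + h) - survivor(y - h)) / (2.0 * h)


for y in (-1.0, 0.0, 2.0):
    print(f"y = {y:+.1f}: slope = {survivor_slope(y):.6f}, -f(y) = {-dist.pdf(y):.6f}")
```

The two printed columns should agree up to the finite-difference error.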

Now consider \[\begin{eqnarray} \tag{C.2} \mbox{AUC} &=& \int_0^1 \mbox{ROC}(t)\,dt \nonumber \\ &=& \int_0^1 S_D(S_{\bar D}^{-1}(t))\,dt \nonumber \\ &=& \int_{\infty}^{-\infty} S_D(y) \left\{ -f_{\bar D}(y) \right\} dy \nonumber \\ &=& \int_{-\infty}^{\infty} S_D(y) f_{\bar D}(y)\, dy, \end{eqnarray}\] where we have applied the substitution \(t=S_{\bar D}(y)\), so \[ dt = d S_{\bar D}(y) = -f_{\bar D}(y)\,dy, \] and the limits \(t=0\) and \(t=1\) correspond to \(y=\infty\) and \(y=-\infty\), respectively.

Let \(\mathsf{I}\{A\}\) denote the indicator function of an event \(A\). Now \(Y_D\) and \(Y_{\bar D}\) are assumed to be independent, so their joint density factorises and \[\begin{eqnarray*} \Pr(Y_D \geq Y_{\bar D}) &=& \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \mathsf{I}\{y_D \geq y_{\bar D}\} f(y_D, y_{\bar D})\, dy_D\, dy_{\bar D} \\ &=& \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \mathsf{I}\{y_D \geq y_{\bar D}\} f_D(y_D) f_{\bar D}(y_{\bar D})\, dy_D\, dy_{\bar D} \\ &=& \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} \mathsf{I}\{y_D \geq y_{\bar D}\} f_D(y_D)\,dy_D \right] f_{\bar D}(y_{\bar D})\, dy_{\bar D} \\ &=& \int_{-\infty}^{\infty} \Pr(Y_D \geq y_{\bar D}) f_{\bar D}(y_{\bar D})\, dy_{\bar D} \\ &=& \int_{-\infty}^{\infty} S_D(y_{\bar D}) f_{\bar D}(y_{\bar D})\, dy_{\bar D}, \end{eqnarray*}\]

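where the last line equals (C.2) with \(y_{\bar D}=y\). Because \(Y_D\) and \(Y_{\bar D}\) are continuous, \(\Pr(Y_D = Y_{\bar D}) = 0\) and hence \(\Pr(Y_D \geq Y_{\bar D}) = \Pr(Y_D > Y_{\bar D})\). This proves equation (C.1).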
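Equation (C.1) can also be checked numerically. The sketch below assumes a binormal model with illustrative parameters (an assumption made here, not part of the proof) and compares three quantities: the area under \(\mbox{ROC}(t)\) obtained with a midpoint rule, a simulation estimate of \(\Pr(Y_D > Y_{\bar D})\) from independent pairs, and the closed form available for normal distributions, where \(Y_D - Y_{\bar D}\) is again normal.

```python
import random
from math import sqrt
from statistics import NormalDist

# Assumed binormal model with illustrative parameters.
mu0, sd0 = 0.0, 1.0  # non-diseased population
mu1, sd1 = 1.5, 1.0  # diseased population
d0 = NormalDist(mu0, sd0)
d1 = NormalDist(mu1, sd1)


def roc(t):
    """ROC(t): survivor function of the diseased at the non-diseased (1 - t)-quantile."""
    return 1.0 - d1.cdf(d0.inv_cdf(1.0 - t))


def auc_by_integration(n=20_000):
    """Midpoint rule for the integral of ROC(t) over (0, 1)."""
    return sum(roc((i + 0.5) / n) for i in range(n)) / n


def auc_by_simulation(n=100_000, seed=1):
    """Share of independent simulated pairs in which the diseased result is larger."""
    rng = random.Random(seed)
    hits = sum(rng.gauss(mu1, sd1) > rng.gauss(mu0, sd0) for _ in range(n))
    return hits / n


# Closed form under normality: the difference of the two results is normal
# with mean mu1 - mu0 and variance sd0^2 + sd1^2.
auc_closed_form = NormalDist().cdf((mu1 - mu0) / sqrt(sd0**2 + sd1**2))

print(f"integration: {auc_by_integration():.4f}")
print(f"simulation:  {auc_by_simulation():.4f}")
print(f"closed form: {auc_closed_form:.4f}")
```

The three values should agree up to integration and Monte Carlo error; the simulation estimate varies with the seed and the number of pairs.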