C Some proofs on ROC curves

C.1 The ROC curve

Assume we have a continuous (in mathematical terminology “absolutely continuous”) result $Y$ from a diagnostic test, and denote the true and false positive fractions for a given threshold $c$ by

$\begin{eqnarray*} \mbox{TPF}(c) & = & \Pr(Y \geq c \,\vert\,D=1) \\ \mbox{FPF}(c) & = & \Pr(Y \geq c \,\vert\,D=0). \end{eqnarray*}$

The test result $Y$ will have different distributions in the diseased ($D=1$) and non-diseased ($D=0$) populations, and we denote the corresponding random variables by $Y_D$ and $Y_{\bar D}$, respectively.

Further define the ROC curve via

$\begin{eqnarray*} \mbox{ROC}(.) & = & \{(\mbox{FPF}(c), \mbox{TPF}(c)), c \in (-\infty, \infty)\}, \end{eqnarray*}$

that is, the ROC curve consists of the points $(\mbox{FPF}(c), \mbox{TPF}(c))$ for all possible thresholds $c$. Now define the survivor functions in the diseased and non-diseased populations:

$\begin{eqnarray*} S_D(y) & = & \Pr(Y \geq y \,\vert\,D=1) = \mbox{TPF}(y) \\ S_{\bar D}(y) & = & \Pr(Y \geq y \,\vert\,D=0) = \mbox{FPF}(y). \end{eqnarray*}$

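Note that the survivor functions are monotonically decreasing. We assume that they are strictly decreasing (as is the case when the densities are positive everywhere) and hence invertible.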

For a given threshold $c$ , suppose $t=\mbox{FPF}(c)= S_{\bar D}(c)$ , so $c=S_{\bar D}^{-1}(t)$ and we obtain $\mbox{ROC}(t) = \mbox{TPF}(c) = S_D(c) = S_D(S_{\bar D}^{-1}(t)).$
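The identity $\mbox{ROC}(t) = S_D(S_{\bar D}^{-1}(t))$ can be checked numerically. The following is a minimal sketch under an assumed binormal model, with $Y_{\bar D} \sim \mbox{N}(0, 1)$ and $Y_D \sim \mbox{N}(1, 1)$; these distributions and all function names are illustrative choices, not part of the derivation above.

```python
from statistics import NormalDist

# Hypothetical binormal model, chosen only for illustration.
dist_nondiseased = NormalDist(mu=0.0, sigma=1.0)  # distribution of Y given D=0
dist_diseased = NormalDist(mu=1.0, sigma=1.0)     # distribution of Y given D=1


def survivor(dist, y):
    """Survivor function S(y) = Pr(Y >= y) = 1 - F(y)."""
    return 1.0 - dist.cdf(y)


def survivor_inverse(dist, t):
    """Inverse survivor function: the y with S(y) = t, for 0 < t < 1."""
    return dist.inv_cdf(1.0 - t)


def roc(t):
    """ROC(t) = S_D(S_Dbar^{-1}(t)) for a false positive fraction t."""
    c = survivor_inverse(dist_nondiseased, t)  # threshold with FPF(c) = t
    return survivor(dist_diseased, c)          # TPF at that threshold


def roc_point_from_threshold(c):
    """The point (FPF(c), TPF(c)) computed directly from the threshold c."""
    return survivor(dist_nondiseased, c), survivor(dist_diseased, c)


if __name__ == "__main__":
    fpf, tpf = roc_point_from_threshold(0.5)
    # Both routes should give the same true positive fraction.
    print(fpf, tpf, roc(fpf))
```

Computing the point $(\mbox{FPF}(c), \mbox{TPF}(c))$ directly from a threshold and evaluating `roc` at the resulting false positive fraction should agree up to floating-point error.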

C.2 AUC

The area under the curve (AUC) is defined as $\mbox{AUC} = \int_0^1 \mbox{ROC}(t)dt$ and we have

$\begin{equation} \tag{C.1} \mbox{AUC} = \Pr(Y_D > Y_{\bar D}), \end{equation}$

if $Y_D$ and $Y_{\bar D}$ are independent test results from the diseased and non-diseased populations, respectively.
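Before turning to the proof, equation (C.1) can be illustrated by simulation. The sketch below again assumes a hypothetical binormal model ($Y_{\bar D} \sim \mbox{N}(0, 1)$, $Y_D \sim \mbox{N}(1, 1)$); it integrates the ROC curve numerically with a midpoint rule and compares the result with the proportion of independent simulated pairs in which the diseased result exceeds the non-diseased one. The function names, step count, sample size and seed are arbitrary.

```python
import random
from statistics import NormalDist

# Hypothetical binormal model, chosen only for illustration.
nondiseased = NormalDist(0.0, 1.0)
diseased = NormalDist(1.0, 1.0)


def roc(t):
    """ROC(t) = S_D(S_Dbar^{-1}(t)) for 0 < t < 1."""
    threshold = nondiseased.inv_cdf(1.0 - t)
    return 1.0 - diseased.cdf(threshold)


def auc_by_integration(n_steps=20000):
    """Midpoint-rule approximation of the integral of ROC(t) over (0, 1).

    Midpoints avoid t = 0 and t = 1, where the inverse survivor
    function is not finite.
    """
    h = 1.0 / n_steps
    return sum(roc((i + 0.5) * h) for i in range(n_steps)) * h


def auc_by_simulation(n_pairs=200000, seed=1):
    """Monte Carlo estimate of Pr(Y_D > Y_Dbar) from independent pairs."""
    rng = random.Random(seed)
    wins = sum(
        rng.gauss(diseased.mean, diseased.stdev)
        > rng.gauss(nondiseased.mean, nondiseased.stdev)
        for _ in range(n_pairs)
    )
    return wins / n_pairs


if __name__ == "__main__":
    print("integral of ROC:", auc_by_integration())
    print("simulated Pr(Y_D > Y_Dbar):", auc_by_simulation())
```

The two numbers should agree up to Monte Carlo error, which shrinks as `n_pairs` grows.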

To prove this result, we first note that the survivor function $S(y)$ of a continuous test result $Y$ is closely related to the cumulative distribution function $F(y)$ of $Y$: $S(y) = \Pr(Y \geq y) = 1 - \Pr(Y < y) = 1 - \Pr(Y \leq y) = 1 - F(y),$ where the third equality holds because $\Pr(Y = y) = 0$ for a continuous $Y$. Now $\begin{eqnarray*} {\frac{d\,F(y)}{d\,y}} = f(y) \end{eqnarray*}$ where $f(y)$ denotes the density function of $Y$. Therefore ${\frac{d\,S(y)}{d\,y}} = -f(y).$

Now consider

$\begin{eqnarray} \tag{C.2} \mbox{AUC} &=& \int_0^1 \mbox{ROC}(t)dt \nonumber \\ &=& \int_0^1 S_D(S_{\bar D}^{-1}(t))dt \nonumber \\ &=& \int_{\infty}^{-\infty} S_D(y) \left\{ -f_{\bar D}(y) dy \right\}\nonumber \\ &=& \int_{-\infty}^{\infty} S_D(y) f_{\bar D}(y) dy, \end{eqnarray}$

where we have applied the substitution $t=S_{\bar D}(y)$, so $dt = d S_{\bar D}(y) = -f_{\bar D}(y)dy$. The limits $t=0$ and $t=1$ correspond to $y=\infty$ and $y=-\infty$, respectively, and the minus sign is absorbed by swapping the limits of integration in the last step.

Let $\mathsf{I}\{A\}$ denote the indicator function of an event $A$. Now $Y_D$ and $Y_{\bar D}$ are assumed to be independent, so their joint density factorises, $f(y_D, y_{\bar D}) = f_D(y_D) f_{\bar D}(y_{\bar D})$, and

$\begin{eqnarray*} \Pr(Y_D \geq Y_{\bar D}) &=& \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \mathsf{I}\{y_D \geq y_{\bar D}\} f(y_D, y_{\bar D}) dy_D dy_{\bar D} \\ &=& \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \mathsf{I}\{y_D \geq y_{\bar D}\} f_D(y_D) f_{\bar D}(y_{\bar D}) dy_D dy_{\bar D} \\ &=& \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} \mathsf{I}\{y_D \geq y_{\bar D}\} f_D(y_D)dy_D \right] f_{\bar D}(y_{\bar D}) dy_{\bar D} \\ &=& \int_{-\infty}^{\infty} \Pr(Y_D \geq y_{\bar D}) f_{\bar D}(y_{\bar D}) dy_{\bar D} \\ &=& \int_{-\infty}^{\infty} S_D(y_{\bar D}) f_{\bar D}(y_{\bar D}) dy_{\bar D} \end{eqnarray*}$

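where the last expression equals (C.2) with $y_{\bar D}=y$. Since $Y_D$ and $Y_{\bar D}$ are independent and continuous, $\Pr(Y_D = Y_{\bar D}) = 0$ and hence $\Pr(Y_D \geq Y_{\bar D}) = \Pr(Y_D > Y_{\bar D})$. This proves equation (C.1).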
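As a numerical sanity check of the final expression in (C.2), the sketch below evaluates $\int S_D(y) f_{\bar D}(y) dy$ under an assumed binormal model ($Y_{\bar D} \sim \mbox{N}(0, 1)$, $Y_D \sim \mbox{N}(1, 1)$). For independent normal results, $Y_D - Y_{\bar D}$ is again normal, so $\Pr(Y_D > Y_{\bar D})$ has a closed form to compare against. The model, the integration range and the function names are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical binormal model, chosen only for illustration.
nondiseased = NormalDist(0.0, 1.0)
diseased = NormalDist(1.0, 1.0)


def auc_integrand(y):
    """Integrand S_D(y) * f_Dbar(y) from the last line of (C.2)."""
    return (1.0 - diseased.cdf(y)) * nondiseased.pdf(y)


def integrate(func, lower, upper, n_steps=20000):
    """Midpoint-rule approximation of the integral of func on [lower, upper]."""
    h = (upper - lower) / n_steps
    return sum(func(lower + (i + 0.5) * h) for i in range(n_steps)) * h


def auc_from_c2():
    """AUC via (C.2); the range [-10, 10] truncates negligible tail mass."""
    return integrate(auc_integrand, -10.0, 10.0)


def auc_closed_form():
    """Pr(Y_D > Y_Dbar) using that Y_D - Y_Dbar is normal under independence."""
    mean_diff = diseased.mean - nondiseased.mean
    sd_diff = sqrt(diseased.variance + nondiseased.variance)
    return NormalDist().cdf(mean_diff / sd_diff)


if __name__ == "__main__":
    print("AUC from (C.2):", auc_from_c2())
    print("Pr(Y_D > Y_Dbar):", auc_closed_form())
```

Under these assumptions both values should coincide up to the error of the numerical integration.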