
FDR and Benjamini-Hochberg

We wish to test {N} null hypotheses {H_{01}, \dots, H_{0N}} indexed by the set {\{1, \dots, N\}}. The hypotheses indexed by {I_0 \subseteq \{1, \dots, N\}} are truly null with {|I_0| = N_0} and the remaining hypotheses are non-null. A test in this setting looks at the data and decides to accept or reject each {H_{0i}}. While devising such a test, one obviously wants to guard against rejecting too many true null hypotheses. A classical way of ensuring this is to allow only those tests whose Family Wise Error Rate (FWER) is controlled at a predetermined small level. The FWER is defined as

\displaystyle FWER := \mathop{\mathbb P} \left(\cup_{i \in I_0} \left\{\text{Reject } H_{0i} \right\} \right). \ \ \ \ \ (1)

It is very easy to design a test whose FWER is controlled by a predetermined level {\alpha}: reject or accept each hypothesis {H_{0i}} according to a test whose type I error is at most {\alpha/N}. By the union bound, one then has

\displaystyle FWER = \mathop{\mathbb P} \left(\cup_{i \in I_0} \left\{\text{Reject } H_{0i} \right\} \right) \leq \sum_{i \in I_0} \mathop{\mathbb P} \left\{\text{Reject } H_{0i} \right\} \leq \frac{\alpha N_0}{N} \leq \alpha.
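To make this concrete, here is a minimal sketch of this procedure in Python (the function name and the use of NumPy are my own, not from any paper):

```python
import numpy as np

def bonferroni_reject(pvals, alpha=0.1):
    """Reject H_{0i} iff p_i <= alpha / N; controls FWER at level alpha."""
    pvals = np.asarray(pvals)
    return pvals <= alpha / len(pvals)
```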

The above procedure is sometimes called the Bonferroni method. In the modern theory of hypothesis testing, control of the FWER is considered too stringent, mainly because it leads to tests that also fail to reject many non-null hypotheses. The modern approach is to insist on control of the FDR (False Discovery Rate) as opposed to the FWER. The FDR of a test is defined as

\displaystyle FDR = \mathop{\mathbb E} \left( \frac{V}{R \vee 1} \right)

where

\displaystyle R := \sum_{i=1}^N I \left\{\text{Reject } H_{0i} \right\} \text{ and } V := \sum_{i \in I_0} I \left\{\text{Reject } H_{0i} \right\}

and {R \vee 1 := \max(R, 1)}. The quantity {V/(R \vee 1)} is often called the FDP (False Discovery Proportion). FDR is therefore the expectation of FDP.
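For concreteness, here is how one would compute {R}, {V} and the FDP in a simulation where the set of true nulls is known (a hypothetical helper of my own; in practice {I_0} is unknown, which is the whole difficulty):

```python
import numpy as np

def fdp(rejected, is_null):
    """False Discovery Proportion V / max(R, 1), for boolean arrays of
    rejection decisions and ground-truth null indicators (the latter is
    known only in simulations)."""
    R = int(np.sum(rejected))
    V = int(np.sum(rejected & is_null))
    return V / max(R, 1)
```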

How does one design a test whose FDR is controlled at a predetermined level {\alpha} (e.g., {\alpha = 0.1}) and which rejects more often than the Bonferroni procedure? This was answered by Benjamini and Hochberg in a famous paper in 1995. Their procedure is described below. For each hypothesis {H_{0i}}, obtain a {p}-value {p_i}. For {i \in I_0}, the {p}-value {p_i} has the uniform distribution on {[0, 1]}. For {i \notin I_0}, the {p}-value {p_i} has some other distribution, presumably more concentrated near {0}. Let the ordered {p}-values be {p_{(1)} < \dots < p_{(N)}}. The BH procedure is the following:

\displaystyle \text{Reject } H_{0i} \text{ if and only if } p_{i} \leq \frac{i_{\max} \alpha}{N}

where

\displaystyle i_{\max} := \max \left\{1 \leq i \leq N : p_{(i)} \leq \frac{i \alpha}{N} \right\}.

In the event that {p_{(i)} > i \alpha/N} for all {i}, we take {i_{\max} = 0}. The BH procedure is probably easier to understand via the following sequential description. Start with {i = N} and, decreasing {i} one step at a time, keep accepting the hypothesis corresponding to {p_{(i)}} as long as {p_{(i)} > i \alpha/N}. As soon as {p_{(i)} \leq i \alpha/N}, stop and reject all the hypotheses corresponding to {p_{(j)}} for {j \leq i}.
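This description translates directly into code. Here is a short sketch of the BH procedure (vectorized with NumPy rather than written as an explicit loop; the function name is mine):

```python
import numpy as np

def bh_reject(pvals, alpha=0.1):
    """Benjamini-Hochberg: reject H_{0i} iff p_i <= i_max * alpha / N,
    where i_max is the largest i with p_(i) <= i * alpha / N (0 if none)."""
    pvals = np.asarray(pvals)
    N = len(pvals)
    below = np.sort(pvals) <= alpha * np.arange(1, N + 1) / N
    i_max = int(np.nonzero(below)[0].max()) + 1 if below.any() else 0
    return pvals <= alpha * i_max / N
```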

It should be clear that the BH procedure rejects hypotheses much more liberally than the Bonferroni method (which rejects when {p_i \leq \alpha/N}). Indeed, any hypothesis rejected by the Bonferroni method will also be rejected by the BH procedure. Remarkably, the FDR of the BH procedure is nevertheless controlled exactly. The famous Benjamini-Hochberg theorem makes this precise:

Theorem 1 (Benjamini-Hochberg) The FDR of the BH procedure is exactly equal to {N_0 \alpha/N} under the assumption that the {p}-values {p_1, \dots, p_N} are independent.

There probably exist many proofs of this by-now classical result. Based on a cursory Google search, I was able to find two extremely slick and short proofs, which I describe below. Prior to that, let me provide some rudimentary intuition for the specific form of the BH procedure. Based on the {p}-values {p_1,\dots, p_N}, our goal is to reject or accept each hypothesis {H_{0i}}. It is obvious that we will have to reject those for which {p_i} is small, but how small is the question. Suppose we decide to reject all hypotheses for which the {p}-value is less than or equal to {t}. For this procedure, the number of rejections and the number of false rejections are given by

\displaystyle R_t := \sum_{i=1}^N I \{p_i \leq t\} ~~ \text{ and } ~~ V_t := \sum_{i \in I_0} I \{p_i \leq t\} \ \ \ \ \ (2)

respectively. Consequently the FDR of this procedure is {FDR_t := \mathop{\mathbb E} \left( V_t/(R_t \vee 1) \right)}. We would ideally like to choose {t} as large as possible subject to the constraint that {FDR_t \leq \alpha} (largest because larger values of {t} lead to more rejections or discoveries). Unfortunately, we do not quite know what {\mathop{\mathbb E} \left( V_t/(R_t \vee 1) \right)} is; we do not even know what {V_t/(R_t \vee 1)} is (if we did, we could have used it as a proxy for {FDR_t}). We do know {R_t}, but {V_t} requires knowledge of {I_0}, which we do not have. The expectation of {V_t}, however, equals {N_0 t} (each null {p}-value is uniform), which cannot be larger than the known quantity {Nt}. This suggests using {Nt/(R_t \vee 1)} as a conservative, observable proxy for the FDP, and it is therefore reasonable to choose {t} as

\displaystyle \tau := \sup \left\{t \in [0, 1] : \frac{Nt}{R_t \vee 1} \leq \alpha \right\} \ \ \ \ \ (3)

and reject all hypotheses whose {p}-values are less than or equal to {\tau}. This intuitive procedure is in fact exactly the BH procedure: it is not very hard to see that {\tau = i_{\max} \alpha/N}.
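Since {R_t} only changes at the observed {p}-values, the supremum in (3) is attained at a point of the form {i\alpha/N}, which gives a direct way to compute {\tau} and to check the equivalence numerically. A sketch under assumptions of my own (the Beta distribution for the non-null {p}-values is an arbitrary choice, and `bh_reject` is the sketch above):

```python
import numpy as np

def tau_threshold(pvals, alpha=0.1):
    """tau = sup{t : N * t / max(R_t, 1) <= alpha}, computed by scanning
    the candidate values t = alpha * i / N from i = N down to 1."""
    pvals = np.asarray(pvals)
    N = len(pvals)
    for i in range(N, 0, -1):
        t = alpha * i / N
        if N * t / max(int(np.sum(pvals <= t)), 1) <= alpha:
            return t
    return 0.0

rng = np.random.default_rng(0)
# 80 null p-values (uniform) and 20 non-null ones (pushed toward 0)
p = np.concatenate([rng.uniform(size=80), rng.beta(0.1, 5, size=20)])
assert np.array_equal(bh_reject(p, 0.1), p <= tau_threshold(p, 0.1))
```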

Proof One: This proof uses martingales and is due to Storey, Taylor and Siegmund, in a paper published in 2004. The alternative formulation of the BH procedure explained above implies that we only need to prove

\displaystyle \mathop{\mathbb E} \frac{V_{\tau}}{R_{\tau} \vee 1} = \frac{N_0 \alpha}{N}, \ \ \ \ \ (4)

where {V_t} and {R_t} are defined as in (2). The important observation now is that the process {\{V_t/t : 0 < t \leq 1\}} is a backward martingale, i.e.,

\displaystyle \mathop{\mathbb E} \left( \frac{V_s}{s} \bigg| \frac{V_{t'}}{t'} , t' \geq t \right) = \frac{V_t}{t}

for all {0 < s < t \leq 1}. This fact involves only independent uniform random variables and is easy: conditionally on {V_t = k}, the {k} null {p}-values lying below {t} are i.i.d. uniform on {[0, t]}, so {\mathop{\mathbb E} (V_s \mid V_t) = V_t s/t}. With {\tau} defined as in (3), {\tau} is a stopping time for the backward filtration (time running from {1} down to {0}), and Doob's optional stopping theorem gives

\displaystyle \mathop{\mathbb E} \left(\frac{V_{\tau}}{\tau} \right) = \mathop{\mathbb E} \left(\frac{V_1}{1} \right) = N_0.

Now the definition (3) of {\tau} implies that {N \tau/(R_{\tau} \vee 1) = \alpha} (this requires an argument!), i.e., {1/(R_{\tau} \vee 1) = \alpha/(N \tau)}. Hence {V_{\tau}/(R_{\tau} \vee 1) = (\alpha/N) \cdot V_{\tau}/\tau}, and taking expectations yields (4). This completes the proof.
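The theorem is also easy to check in simulation: with independent uniform null {p}-values, the average FDP of the BH procedure should match {N_0 \alpha/N} (here {80 \times 0.1/100 = 0.08}). A sketch reusing `bh_reject` from above; the Beta alternative is again my own arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(42)
N, N0, alpha, reps = 100, 80, 0.1, 20000
is_null = np.arange(N) < N0
total_fdp = 0.0
for _ in range(reps):
    # null p-values are uniform; non-null ones are concentrated near 0
    p = np.where(is_null, rng.uniform(size=N), rng.beta(0.1, 5, size=N))
    rej = bh_reject(p, alpha)
    total_fdp += (rej & is_null).sum() / max(int(rej.sum()), 1)
print(total_fdp / reps, "vs", N0 * alpha / N)  # both approximately 0.08
```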

Proof Two: This proof works directly with the original formulation of the BH procedure. I found this proof in a recent paper by Heesen and Janssen (see page 25 of arXiv:1410.8290). We may assume that {I_0} is nonempty, for otherwise {V \equiv 0} and there is nothing to prove. Let {p := (p_1, \dots, p_N)} and let {R(p)} denote the number of rejections made by the BH procedure. From the description, it should be clear that {R(p)} is exactly equal to {i_{\max}}. We can therefore write the FDP as

\displaystyle FDP = \frac{V}{R(p) \vee 1} = \sum_{j \in I_0} \frac{I \left\{p_j \leq \alpha R(p)/N \right\} }{R(p) \vee 1}.

We now fix {j \in I_0} and let {\tilde{p} := (p_1, \dots, p_{j-1}, 0, p_{j+1}, \dots, p_N)}, i.e., the {j}th {p}-value is replaced by {0} and the rest of the {p}-values are unchanged. Let {R(\tilde{p})} denote the number of rejections of the BH procedure for {\tilde{p}}. It should be noted that {R(\tilde{p}) \geq 1} because of the presence of a zero {p}-value in {\tilde{p}}. The key observation now is

\displaystyle \frac{I \left\{p_j \leq \alpha R(p)/N \right\} }{R(p) \vee 1} = \frac{I \left\{p_j \leq \alpha R(\tilde{p})/N \right\} }{R(\tilde{p})} \ \ \ \ \ (5)

To see this, it is enough to note that {I \left\{p_j \leq \alpha R(p)/N \right\} = I \left\{p_j \leq \alpha R(\tilde{p})/N \right\}} and that {R(p) = R(\tilde{p})} when {p_j \leq \alpha R(p)/N}. It is straightforward to verify these facts from the definition of the BH procedure. Using (5), we can write

\displaystyle FDR = \sum_{j \in I_0} \mathop{\mathbb E} \frac{I \left\{p_j \leq \alpha R(p)/N \right\} }{R(p) \vee 1} = \sum_{j \in I_0} \mathop{\mathbb E} \frac{I \left\{p_j \leq \alpha R(\tilde{p})/N \right\} }{R(\tilde{p})}.

The assumed independence of {p_1, \dots, p_N} implies that {p_j} and {R(\tilde{p})} are independent. Also, because {p_j} is uniformly distributed on {[0, 1]} (as {j \in I_0}), conditioning on {R(\tilde{p})} gives {\mathop{\mathbb E} \left( I\{p_j \leq \alpha R(\tilde{p})/N\} \big| R(\tilde{p}) \right) = \alpha R(\tilde{p})/N}, so each summand equals {\alpha/N} and we deduce that {FDR = \alpha N_0/N}. This completes the proof.
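The key identity (5) can itself be checked numerically: for a true null index {j}, the two ratios agree draw by draw, not just in expectation. A sketch under the same simulation setup as above (again reusing `bh_reject`):

```python
import numpy as np

rng = np.random.default_rng(1)
N, N0, alpha = 50, 40, 0.1
for _ in range(1000):
    p = np.where(np.arange(N) < N0,
                 rng.uniform(size=N), rng.beta(0.1, 5, size=N))
    j = int(rng.integers(N0))            # index of a true null
    p_tilde = p.copy()
    p_tilde[j] = 0.0                     # replace p_j by 0
    Rp = int(bh_reject(p, alpha).sum())        # R(p), i.e. i_max
    Rt = int(bh_reject(p_tilde, alpha).sum())  # R(p_tilde) >= 1
    lhs = (p[j] <= alpha * Rp / N) / max(Rp, 1)
    rhs = (p[j] <= alpha * Rt / N) / Rt
    assert lhs == rhs                    # identity (5) holds pointwise
```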