False discovery rate


False discovery rate (FDR) control is a statistical method used in multiple hypothesis testing to correct for multiple comparisons. In a list of rejected hypotheses, FDR controls the expected proportion of incorrectly rejected null hypotheses (type I errors).[1] It is less conservative than familywise error rate (FWER) control: it has greater power, at the cost of an increased likelihood of type I errors.[2]

The q-value is defined to be the FDR analogue of the p-value. The q-value of an individual hypothesis test is the minimum FDR at which the test may be called significant. One approach is to directly estimate q-values rather than fixing a level at which to control the FDR.


Classification of m hypothesis tests

The following table defines some random variables related to the m hypothesis tests.

                             # declared non-significant   # declared significant   Total
# true null hypotheses                   U                           V              m0
# non-true null hypotheses               T                           S              m − m0
Total                                  m − R                         R              m

The false discovery rate is given by \mathrm{E}\!\left [\frac{V}{V+S}\right ] = \mathrm{E}\!\left [\frac{V}{R}\right ] and one wants to keep this value below a threshold α.

( \frac{V}{R} is defined to be 0 when R = 0)
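These quantities can be illustrated with a short Monte Carlo simulation (a hypothetical sketch, not part of the article: the counts m0 = 80 and the Beta-distributed alternative p-values are illustrative choices). Testing every hypothesis at an uncorrected per-test level α typically yields an FDR well above α, which is the problem the controlling procedures below address:

```python
import numpy as np

rng = np.random.default_rng(0)
m, m0, alpha = 100, 80, 0.05  # 80 true nulls, 20 false nulls (illustrative values)
fdps = []
for _ in range(2000):
    # p-values are uniform under the null, concentrated near 0 under the alternative
    p_null = rng.uniform(size=m0)
    p_alt = rng.beta(0.1, 1.0, size=m - m0)
    V = (p_null <= alpha).sum()  # false discoveries among true nulls
    S = (p_alt <= alpha).sum()   # true discoveries among non-true nulls
    R = V + S                    # total rejections
    fdps.append(V / R if R > 0 else 0.0)  # V/R, defined to be 0 when R = 0
fdr = np.mean(fdps)  # Monte Carlo estimate of E[V/R]
```

Here the estimated FDR is roughly V/(V+S) ≈ 4/19, i.e. about one in five discoveries is false even though each individual test was run at the 5% level.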

Controlling procedures

Independent tests

The Simes procedure ensures that the expected value \mathrm{E}\!\left[ \frac{V}{V + S} \right] is at most a given α (Benjamini and Hochberg 1995). This procedure is valid when the m tests are independent. Let H_1 \ldots H_m be the null hypotheses and P_1 \ldots P_m their corresponding p-values. Sort these values in increasing order and denote them by P_{(1)} \ldots P_{(m)}. For a given α, find the largest k such that P_{(k)} \leq \frac{k}{m} \alpha.

Then reject (i.e. declare positive) all H_{(i)} for i = 1, \ldots, k. Note that the mean α for these m tests is \frac{\alpha(m+1)}{2m}, which could be used as a rough FDR (RFDR), or "α adjusted for m independent tests."
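The step-up rule just described can be sketched in Python (an illustrative implementation assuming NumPy; the function name is our own):

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Step-up procedure for independent tests (Benjamini and Hochberg 1995).

    Returns a boolean mask marking which hypotheses are rejected.
    """
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)  # indices that sort the p-values increasingly
    # thresholds (k/m) * alpha for k = 1, ..., m (k is 1-based)
    thresholds = (np.arange(1, m + 1) / m) * alpha
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest k with P_(k) <= (k/m) * alpha
        reject[order[: k + 1]] = True   # reject ALL of P_(1), ..., P_(k)
    return reject
```

Note the step-up character of the procedure: once the largest qualifying k is found, every hypothesis with a p-value at or below P_{(k)} is rejected, even if some intermediate sorted p-value exceeded its own threshold.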

Dependent tests

The Benjamini and Yekutieli procedure controls the false discovery rate under dependence assumptions. This refinement modifies the threshold and finds the largest k such that:

P_{(k)} \leq \frac{k}{m \cdot c(m)} \alpha


  • If the tests are independent: c(m) = 1 (same as above)
  • If the tests are positively correlated: c(m) = 1
  • If the tests are negatively correlated: c(m) = \sum _{i=1} ^m \frac{1}{i}

In the case of negative correlation, c(m) can be approximated using the Euler–Mascheroni constant:

\sum _{i=1} ^m \frac{1}{i} \approx \ln(m) + \gamma.
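A quick numeric check of this approximation (an illustrative sketch; m = 1000 is an arbitrary choice):

```python
import math

m = 1000
harmonic = sum(1 / i for i in range(1, m + 1))       # c(m) = sum of 1/i
approx = math.log(m) + 0.5772156649015329            # ln(m) + Euler-Mascheroni constant
error = abs(harmonic - approx)                       # on the order of 1/(2m)
```

For m = 1000 the two values agree to about three decimal places, so the approximation is more than adequate for setting rejection thresholds.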

Using the RFDR above, an approximate FDR (AFDR) for m dependent tests, the minimum of the mean α, is RFDR / (ln(m) + 0.57721...).
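The Benjamini–Yekutieli refinement can be sketched the same way as the independent-test procedure, with the threshold divided by c(m) (an illustrative implementation assuming NumPy and taking c(m) = Σ 1/i, the arbitrary-dependence case; the function name is our own):

```python
import numpy as np

def benjamini_yekutieli(pvals, alpha=0.05):
    """Step-up procedure with the c(m) correction (Benjamini and Yekutieli).

    Uses c(m) = sum_{i=1}^m 1/i, valid under dependence among the tests.
    Returns a boolean mask marking which hypotheses are rejected.
    """
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    c_m = np.sum(1.0 / np.arange(1, m + 1))  # harmonic correction factor
    order = np.argsort(p)
    # thresholds (k / (m * c(m))) * alpha for k = 1, ..., m
    thresholds = (np.arange(1, m + 1) / (m * c_m)) * alpha
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest k with P_(k) <= k*alpha/(m*c(m))
        reject[order[: k + 1]] = True
    return reject
```

Because every threshold shrinks by the factor c(m) ≈ ln(m) + γ, this procedure rejects fewer hypotheses than the independent-test version on the same p-values; that conservatism is the price of validity under dependence.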

References

  1. ^ Benjamini, Y.; Hochberg, Y. (1995). "Controlling the false discovery rate: a practical and powerful approach to multiple testing". Journal of the Royal Statistical Society, Series B (Methodological) 57 (1): 289–300.
  2. ^ Shaffer, J. P. (1995). "Multiple hypothesis testing". Annual Review of Psychology 46: 561–584.
