
P-value correction in multiple testing

(2014-12-03 09:44:18)

Multiple testing corrections adjust p-values derived from multiple statistical tests to
control the occurrence of false positives. In microarray data analysis, false positives
are genes that are found to be statistically different between conditions but are not
truly different.
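A quick calculation shows why a correction is needed. The sketch below assumes 1000 independent tests in which no gene is truly different and a per-test threshold of 0.05; these numbers are illustrative, not taken from real data.

```python
# Illustrative assumptions: 1000 independent tests, no gene truly different.
n = 1000        # number of genes tested
alpha = 0.05    # per-test significance threshold

# Each null test is a false positive with probability alpha, so on average:
expected_false_positives = n * alpha          # 50.0 genes

# Probability of at least one false positive among the n independent tests
fwer_uncorrected = 1 - (1 - alpha) ** n       # practically 1.0

print(expected_false_positives, fwer_uncorrected)
```

Without any correction, roughly 50 "significant" genes are expected by chance alone, and at least one false positive is practically certain.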

Methods:

A. Bonferroni correction


The p-value of each gene is multiplied by the number of genes in the gene list. If the
corrected p-value is still below the error rate, the gene is significant:
Corrected p-value = p-value * n (number of genes in test) < 0.05
As a consequence, if 1000 genes are tested at a time, the highest accepted individual
p-value is 0.05/1000 = 0.00005, which makes the correction very stringent. With a
family-wise error rate of 0.05 (i.e., the probability of at least one false positive in the
family of tests), the expected number of false positives is at most 0.05.
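A minimal Python sketch of the rule above. The function name is hypothetical, and capping corrected values at 1.0 is a common convention that the text does not mention.

```python
def bonferroni(pvalues):
    """Bonferroni correction: multiply every p-value by the number of tests n.

    Capping at 1.0 is a common convention (an assumption, not in the text).
    """
    n = len(pvalues)
    return [min(p * n, 1.0) for p in pvalues]

# Usage with 1000 genes (hypothetical p-values): only genes whose raw
# p-value is below 0.05 / 1000 = 0.00005 survive.
pvalues = [0.00004, 0.00009] + [0.5] * 998
adjusted = bonferroni(pvalues)
print([q < 0.05 for q in adjusted[:3]])  # [True, False, False]
```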

B. Bonferroni Step-down (Holm) correction


This correction is very similar to the Bonferroni correction, but a little less stringent:
1) The p-values of the genes are ranked from the smallest to the largest.
2) The smallest p-value is multiplied by the number of genes in the gene list;
if the result is less than 0.05, the gene is significant:
Corrected p-value = p-value * n < 0.05
3) The second p-value is multiplied by the number of genes minus 1:
Corrected p-value = p-value * (n - 1) < 0.05
4) The third p-value is multiplied by the number of genes minus 2:
Corrected p-value = p-value * (n - 2) < 0.05
This sequence continues until the first gene that is not significant; that gene and all
genes with larger p-values are declared not significant.
Example:
Let n = 1000 and error rate = 0.05.

Gene | p-value before correction | Rank | Correction | Significant after correction?
A | 0.00002 | 1 | 0.00002 * 1000 = 0.02 | 0.02 < 0.05 => Yes
B | 0.00004 | 2 | 0.00004 * 999 = 0.03996 | 0.03996 < 0.05 => Yes
C | 0.00009 | 3 | 0.00009 * 998 = 0.08982 | 0.08982 > 0.05 => No
Because the multiplier shrinks as the p-value increases, this correction is less
conservative than Bonferroni. However, its family-wise error rate is very similar to that of
the Bonferroni correction (see the table in section IV of the GeneSpring note on multiple
testing corrections listed in the references).
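The step-down procedure can be sketched as follows (the function name is hypothetical). Taking a running maximum over the ranked corrected values enforces the rule that once a gene fails, no gene with a larger p-value can be declared significant. The usage reproduces the example table, padding the list to n = 1000 with hypothetical p-values of 0.5.

```python
def holm(pvalues):
    """Holm step-down correction (hypothetical helper name)."""
    n = len(pvalues)
    # rank genes from the smallest to the largest p-value
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_max = 0.0
    for rank, i in enumerate(order):  # rank 0 is the smallest p-value
        # multiply by n, n - 1, n - 2, ... (capped at 1.0 by convention)
        corrected = min(pvalues[i] * (n - rank), 1.0)
        # once a gene fails, no gene with a larger p-value may pass
        running_max = max(running_max, corrected)
        adjusted[i] = running_max
    return adjusted

# Genes A, B, C from the example, padded to n = 1000 with hypothetical p = 0.5
pvalues = [0.00002, 0.00004, 0.00009] + [0.5] * 997
adjusted = holm(pvalues)
print([round(q, 5) for q in adjusted[:3]])  # [0.02, 0.03996, 0.08982]
print([q < 0.05 for q in adjusted[:3]])     # [True, True, False]
```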

C. Westfall and Young Permutation


Both the Bonferroni and Holm methods correct each p-value without using the correlation
between genes (Bonferroni in a single step, Holm in a step-down sequence). The
Westfall and Young permutation method takes advantage of the dependence structure
between genes by permuting the sample labels for all genes at the same time, so the
correlation between genes is preserved in every permuted data set.
The Westfall and Young permutation follows a step-down procedure similar to the
Holm method, combined with a resampling (permutation) scheme to estimate the
null distribution of the p-values:
1) P-values are calculated for each gene on the original data set and
ranked.
2) A pseudo-data set is created by randomly reassigning the samples to
artificial treatment and control groups.
3) P-values for all genes are computed on the pseudo-data set.
4) The successive minima of the new p-values (for each rank, the minimum
pseudo-p-value over that gene and all genes with larger original p-values)
are retained and compared with the original p-values.
5) Steps 2 to 4 are repeated a large number of times; for each gene, the
proportion of pseudo-data sets in which the successive minimum is less than
or equal to the original p-value is the adjusted p-value.
Because every gene's p-value must be recomputed for each permutation, the method is
very slow. The Westfall and Young permutation method has a family-wise error rate similar
to that of the Bonferroni and Holm corrections.
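A minimal sketch of the minP step-down version of the procedure, with hypothetical function names. The per-gene test is an assumption: a two-sample z statistic with a normal approximation (to stay within the standard library), whereas real analyses usually use a t-test. Labels are shuffled for all genes together, so gene-gene correlation is preserved, and the final running maximum keeps adjusted p-values non-decreasing in rank order, as in Holm.

```python
import random
from statistics import NormalDist, fmean, variance

def two_sample_pvalue(values, labels):
    """Two-sided p-value for group 1 vs group 0.

    Assumption: a Welch-style z statistic with a normal approximation,
    used here only to keep the sketch dependency-free.
    """
    x = [v for v, g in zip(values, labels) if g == 1]
    y = [v for v, g in zip(values, labels) if g == 0]
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    z = (fmean(x) - fmean(y)) / se
    return 2 * NormalDist().cdf(-abs(z))

def westfall_young_minp(data, labels, n_perm=1000, seed=0):
    """Step-down minP adjusted p-values (hypothetical helper).

    data:   one list of expression values per gene (one value per sample)
    labels: one 0/1 group label per sample (control / treatment)
    """
    rng = random.Random(seed)
    m = len(data)
    # 1) p-values on the original data, ranked from smallest to largest
    p_obs = [two_sample_pvalue(gene, labels) for gene in data]
    order = sorted(range(m), key=lambda i: p_obs[i])
    counts = [0] * m
    perm = list(labels)
    for _ in range(n_perm):
        # 2) pseudo-data set: shuffle labels for all genes at once
        rng.shuffle(perm)
        # 3) p-values of every gene on the pseudo-data set
        p_star = [two_sample_pvalue(gene, perm) for gene in data]
        # 4) successive minima, from the largest original p-value upward
        running_min = 1.0
        for k in range(m - 1, -1, -1):
            running_min = min(running_min, p_star[order[k]])
            if running_min <= p_obs[order[k]]:
                counts[k] += 1
    # 5) proportion of permutations, made non-decreasing in rank order
    adjusted = [0.0] * m
    running_max = 0.0
    for k, i in enumerate(order):
        running_max = max(running_max, counts[k] / n_perm)
        adjusted[i] = running_max
    return adjusted

# Usage with simulated (hypothetical) data: 4 treatment vs 4 control samples,
# gene 0 strongly different, 20 pure-noise genes.
sim = random.Random(1)
labels = [1, 1, 1, 1, 0, 0, 0, 0]
data = [[10, 11, 12, 10.5, 0, 1, 0.5, 1.2]]
data += [[sim.gauss(0, 1) for _ in range(8)] for _ in range(20)]
adjusted = westfall_young_minp(data, labels, n_perm=500, seed=42)
print(round(adjusted[0], 3))
```

With only 8 samples there are just 70 distinct ways to split them 4 vs 4, and the original split and its mirror image both reproduce gene 0's p-value exactly, so its adjusted p-value should come out near 2/70 (about 0.03) rather than near zero. Permutation methods need a reasonable number of samples to reach small adjusted p-values.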

D. Benjamini and Hochberg False Discovery Rate


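This correction is the least stringent of the four options and therefore tolerates more
false positives, but it also produces fewer false negatives. Unlike the three methods above,
it controls the false discovery rate (the expected proportion of false positives among the
genes called significant) rather than the family-wise error rate. Here is how it works:
1) The p-values of the genes are ranked from the smallest to the largest.
2) The largest p-value (rank n) remains as it is.
3) The second largest p-value is multiplied by the total number of genes in the gene
list divided by its rank, n - 1:
Corrected p-value = p-value * n/(n - 1) < 0.05; if so, the gene is significant.
4) The third largest p-value is corrected in the same way, using its rank n - 2:
Corrected p-value = p-value * n/(n - 2) < 0.05; if so, the gene is significant.
And so on. In the standard procedure, each corrected p-value is also capped at the
corrected value of the next larger p-value, so corrected p-values never decrease as the
raw p-values increase.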
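A minimal sketch of the Benjamini and Hochberg adjustment (the function name is hypothetical). Walking from the largest p-value down and keeping a running minimum applies the p-value * n/rank rule and, at the same time, caps each value at the corrected value of the next larger p-value (the standard step-up form). Applied to the three genes of the Holm example (again padded to n = 1000 with hypothetical p-values of 0.5), gene C gets 0.00009 * 1000/3 = 0.03 and becomes significant, whereas Holm rejected it.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR-adjusted p-values (hypothetical helper name)."""
    n = len(pvalues)
    # rank genes from the smallest to the largest p-value
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0  # also caps adjusted values at 1.0
    # walk from the largest p-value (rank n) down to the smallest (rank 1)
    for k in range(n - 1, -1, -1):
        i = order[k]
        rank = k + 1
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Genes A, B, C from the Holm example, padded to n = 1000 with hypothetical p = 0.5
pvalues = [0.00002, 0.00004, 0.00009] + [0.5] * 997
print([round(q, 5) for q in benjamini_hochberg(pvalues)[:3]])  # [0.02, 0.02, 0.03]
```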

See:
http://yixf.name/2011/01/11/【文献推荐】多重假设检验中的p值校正/
http://fhqdddddd.blog.163.com/blog/static/18699154201093171158444/
http://en.wikipedia.org/wiki/Multiple_comparisons
http://www.silicongenetics.com/Support/GeneSpring/GSnotes/analysis_guides/mtc.pdf

At present, these correction methods are used in Gene Ontology (GO) enrichment analysis.
