
Guide to the columns of ANNOVAR annotation results across human databases

(2018-12-05 09:17:03)
Tags:

annovar

snpeff

Category: Biology (reposts)
Reposted from: http://www.omicsclass.com/article/464

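  • Column headers in the ANNOVAR annotation output:
ID Description
Chr Chromosome
Start Start position of the variant on the chromosome
End End position of the variant on the chromosome
Ref Reference genome allele
Alt Variant (alternate) allele
Func.refGene Region in which the variant falls (exonic, splicing, UTR5, UTR3, intronic, ncRNA_exonic, ncRNA_intronic, ncRNA_UTR3, ncRNA_UTR5, ncRNA_splicing, upstream, downstream, intergenic)
Gene.refGene Gene name(s) associated with the variant (only genes with a transcript whose function matches the Func column are listed). If Func is intergenic, the names of the two flanking genes are listed
GeneDetail.refGene Details for variants in UTR, splicing, ncRNA_splicing, or intergenic regions. The column is empty when Func is exonic, ncRNA_exonic, intronic, ncRNA_intronic, upstream, downstream, upstream;downstream, ncRNA_UTR3, or ncRNA_UTR5. When Func is intergenic, the format is dist=1366;dist=22344, giving the distances from the variant to the two flanking genes
ExonicFunc.refGene Type of SNV or indel in exonic regions. SNV types: synonymous_SNV, missense_SNV, stopgain_SNV, stoploss_SNV, and unknown. Indel types: frameshift insertion, frameshift deletion, stopgain, stoploss, nonframeshift insertion, nonframeshift deletion, and unknown
AAChange.refGene Amino acid change; populated only when Func is exonic or exonic;splicing. The annotation is given per transcript. For example, in NADK:NM_001198995:exon10:c.1240_1241insAGG:p.G414delinsEG, NADK is the gene containing the variant, NM_001198995 is the transcript ID, exon10 means the variant lies in exon 10 of that transcript, c.1240_1241insAGG means AGG is inserted between cDNA positions 1240 and 1241, and p.G414delinsEG means the amino acid at protein position 414 changes from Gly to Glu-Gly. Likewise, in FMN2:NM_020066:exon1:c.160_162del:p.54_54del, c.160_162del means cDNA positions 160 to 162 are deleted, and p.54_54del means the amino acid at protein position 54 is deleted
cytoBand Cytogenetic band in which the variant lies (bands observed by Giemsa staining)
genomicSuperDups Segmental duplications in the genome
nci60 NCI-60 human tumor cell line panel exome sequencing allele frequency data
esp6500siv2_all Alternative allele frequency across all individuals in the NHLBI Exome Sequencing Project (NHLBI-ESP project; the esp6500si_all database includes SNPs, indels, and variants on the Y chromosome)
ALL.sites.2015_08 Alternative allele frequency at this site across all populations in the 1000 Genomes Project data (August 2015 release)
EAS.sites.2015_08 Alternative allele frequency at this site in the East Asian population of the 1000 Genomes Project data (August 2015 release)
SAS.sites.2015_08 Alternative allele frequency at this site in the South Asian population of the 1000 Genomes Project data (August 2015 release)
avsnp150 dbSNP ID of the variant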
SIFT_score SIFT score for the effect of the variant on the protein sequence. Lower scores are more deleterious, meaning the SNP is more likely to change protein structure or function
SIFT_pred D: Deleterious (sift<=0.05); T: Tolerated (sift>0.05)
Polyphen2_HDIV_score PolyPhen2 prediction of the effect of the variant on the protein sequence, based on the HumanDiv database; used for complex diseases. Higher scores are more deleterious, meaning the SNP is more likely to change protein structure or function
Polyphen2_HDIV_pred D, P, or B (D: Probably damaging (>=0.957); P: Possibly damaging (0.453<=pp2_hdiv<=0.956); B: Benign (pp2_hdiv<=0.452))
Polyphen2_HVAR_score PolyPhen2 prediction of the effect of the variant on the protein sequence, based on the HumanVar database; used for monogenic (Mendelian) diseases. Higher scores are more deleterious, meaning the SNP is more likely to change protein structure or function
Polyphen2_HVAR_pred D, P, or B (D: Probably damaging (>=0.909); P: Possibly damaging (0.447<=pp2_hvar<=0.908); B: Benign (pp2_hvar<=0.446))
LRT_score LRT score for the effect of the variant on the protein sequence. Lower scores are more deleterious, meaning the SNP is more likely to change protein structure or function
LRT_pred D, N, or U (D: Deleterious; N: Neutral; U: Unknown)
MutationTaster_score MutationTaster score for the effect of the variant on the protein sequence. Higher scores are more deleterious, meaning the SNP is more likely to change protein structure or function
MutationTaster_pred A ("disease_causing_automatic"); D ("disease_causing"); N ("polymorphism"); P ("polymorphism_automatic")
MutationAssessor_score Pathogenicity score predicted by MutationAssessor
MutationAssessor_pred Category assigned by MutationAssessor from score thresholds: H is a high-confidence deleterious site, M a medium-confidence deleterious site, L a low-confidence deleterious site, and N a neutral site
FATHMM_score Pathogenicity score predicted by FATHMM
FATHMM_pred Category assigned by FATHMM from a score threshold: D: Deleterious; T: Tolerated
RadialSVM_score Higher scores denote more deleterious variants
RadialSVM_pred D: Deleterious; T: Tolerated
LR_score Higher scores denote more deleterious variants
LR_pred D: Deleterious; T: Tolerated
VEST3_score Variant Effect Scoring Tool; random forest classifier; higher values are more deleterious
CADD_raw CADD raw score
CADD_phred CADD phred-like score; higher values are more deleterious
GERP++_RS GERP++ "rejected substitutions" (RS) score; higher scores are more deleterious
phyloP46way_placental Higher scores are more deleterious
phyloP100way_vertebrate Higher scores are more deleterious
SiPhy_29way_logOdds Higher scores are more deleterious
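dgvMerged Annotation of human structural variants: http://dgv.tcag.ca/dgv/app/home
phastConsElements100way Conserved regions predicted by the phastCons program from whole-genome alignments of vertebrates; 100way means that 100 species were used
omim_201806 Annotation from the database of Mendelian disorders (OMIM)
cosmic70 Database of somatic mutations in human cancer; IDs beginning with COSM can be looked up at https://cancer.sanger.ac.uk/cosmic
CLNALLELEID The ClinVar Allele ID
CLNDN ClinVar's preferred disease name for the concept specified by disease identifiers in CLNDISDB
CLNDISDB Tag-value pairs of disease database name and identifier, e.g. OMIM:NNNNNN
CLNREVSTAT ClinVar review status for the Variation ID
CLNSIG Clinical significance for this single variant
gwasCatalog Whether the variant has been reported in previous GWAS studies, and which diseases it is associated with; "." means no GWAS report
HGMD HGMD annotation
Allele_frequency Allele frequency of the variant allele in the sample
QUAL Variant quality score
FORMAT Usually GT:AD:DP:GQ:PL; names the fields of the sample column
sample Per-sample information; for details see http://www.omicsclass.com/article/6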
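The FORMAT and sample columns are read together: FORMAT names the colon-separated fields, and the sample column holds the values in the same order. The sketch below pairs them up. The function name `parse_sample` and the sample value are made up for illustration, and reading AD as "reference depth, alternate depth" follows the usual VCF convention for a site with one alternate allele.

```python
def parse_sample(format_field: str, sample_field: str) -> dict:
    """Pair FORMAT keys (e.g. GT:AD:DP:GQ:PL) with the sample's values."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    return dict(zip(keys, values))

# Hypothetical sample value, invented for this example.
call = parse_sample("GT:AD:DP:GQ:PL", "0/1:12,9:21:99:255,0,310")

# AD lists read depths per allele; assumed here to be "ref,alt".
ref_depth, alt_depth = (int(x) for x in call["AD"].split(","))
alt_fraction = alt_depth / (ref_depth + alt_depth)

print(call["GT"], ref_depth, alt_depth, round(alt_fraction, 3))
```

Keeping the values as strings until a specific field is needed avoids guessing types for fields such as GT, which is not numeric.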

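A common use of the population-frequency columns is to keep only rare variants. The following is a minimal sketch, assuming the annotation table is tab-delimited with a header row and that "." marks a site with no database record. The two data rows and the helper names `to_freq` and `rare_variants` are invented for illustration, and treating a missing frequency as 0 is a choice made here, not something ANNOVAR prescribes.

```python
import csv
import io

# Hypothetical two-variant excerpt of a tab-delimited annotation table.
TABLE = (
    "Chr\tStart\tEnd\tRef\tAlt\tFunc.refGene\tALL.sites.2015_08\tEAS.sites.2015_08\n"
    "1\t1000\t1000\tA\tG\texonic\t0.25\t0.31\n"
    "1\t2000\t2000\tC\tT\texonic\t.\t.\n"
)

def to_freq(value: str) -> float:
    """Convert a frequency cell to float; '.' (no record) is treated as 0."""
    return 0.0 if value in ("", ".") else float(value)

def rare_variants(handle, column: str, max_freq: float) -> list:
    """Return the rows whose frequency in `column` is at most `max_freq`."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if to_freq(row[column]) <= max_freq]

rare = rare_variants(io.StringIO(TABLE), "ALL.sites.2015_08", 0.01)
for row in rare:
    print(row["Chr"], row["Start"], row["Ref"], row["Alt"])
```

For a real file, `io.StringIO(TABLE)` would be replaced by an open file handle; the column name passed in has to match the header exactly.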


ANNOVAR can of course annotate human variants against many more databases; only some of them are listed here. The following is excerpted from https://brb.nci.nih.gov/seqtools/colexpanno.html

We provide here detailed descriptions of the files output by the mutation annotators ANNOVAR and SnpEff.

Chr Chromosome number
Start Start position
End End position
Ref Reference base(s)
Alt Alternate non-reference alleles called on at least one of the samples
COSMIC ID COSMIC ID
Func.refGene Regions (e.g., exonic, intronic, non-coding RNA) that one variant hits.
Gene.refGene Gene name associated with one variant
ExonicFunc.refGene Exonic variant function, e.g., nonsynonymous, synonymous, frameshift insertion.
AAChange.refGene Amino acid change. For example, SAMD11:NM_152486:exon10:c.T1027C:p.W343R stands for gene name, Known RefSeq accession, region, cDNA level change, protein level change.
SIFT_score SIFT score. See the dbNSFP information table for details.
SIFT_pred SIFT prediction. See the dbNSFP information table for details.
Polyphen2_HDIV_score Polyphen2 score based on HDIV. See the dbNSFP information table for details.
Polyphen2_HDIV_pred Polyphen2 prediction based on HDIV. See the dbNSFP information table for details.
Polyphen2_HVAR_score Polyphen2 score based on HVAR. See the dbNSFP information table for details.
Polyphen2_HVAR_pred Polyphen2 prediction based on HVAR. See the dbNSFP information table for details.
LRT_score LRT score. See the dbNSFP information table for details.
LRT_pred LRT prediction. See the dbNSFP information table for details.
MutationTaster_score MutationTaster score. See the dbNSFP information table for details.
MutationTaster_pred MutationTaster prediction. See the dbNSFP information table for details.
MutationAssessor_score MutationAssessor score. See the dbNSFP information table for details.
MutationAssessor_pred MutationAssessor prediction. See the dbNSFP information table for details.
FATHMM_score FATHMM score. See the dbNSFP information table for details.
FATHMM_pred FATHMM prediction. See the dbNSFP information table for details.
PROVEAN_score PROVEAN score. See the dbNSFP information table for details.
PROVEAN_pred PROVEAN prediction. See the dbNSFP information table for details.
VEST3_score VEST V3 score. See the dbNSFP information table for details.
CADD_raw CADD raw score. See the dbNSFP information table for details.
CADD_phred CADD phred-like score. See the dbNSFP information table for details.
DANN_score DANN score. See the dbNSFP information table for details.
fathmm-MKL_coding_score fathmm-MKL score for one coding variant. See the dbNSFP information table for details.
fathmm-MKL_coding_pred fathmm-MKL prediction for one coding variant. See the dbNSFP information table for details.
MetaSVM_score MetaSVM score. See the dbNSFP information table for details.
MetaSVM_pred MetaSVM prediction. See the dbNSFP information table for details.
MetaLR_score MetaLR score. See the dbNSFP information table for details.
MetaLR_pred MetaLR prediction. See the dbNSFP information table for details.
integrated_fitCons_score fitCons score. See the dbNSFP information table for details.
integrated_confidence_value Confidence level. See the dbNSFP information table for details.
GERP++_RS GERP++ "rejected substitutions" (RS) score. See the dbNSFP information table for details.
phyloP7way_vertebrate Phylogenetic p-values for 7 vertebrate species. See the dbNSFP information table for details.
phyloP20way_mammalian Phylogenetic p-values for 20 mammalian species. See the dbNSFP information table for details.
phastCons7way_vertebrate PhastCons score for 7 vertebrate species. See the dbNSFP information table for details.
phastCons20way_mammalian PhastCons score for 20 mammalian species. See the dbNSFP information table for details.
SiPhy_29way_logOdds SiPhy log odds score for 29 species. See the dbNSFP information table for details.
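The AAChange.refGene value described above packs gene name, RefSeq accession, region, cDNA change, and protein change into one colon-separated string. A minimal sketch of splitting it follows. The `AAChange` tuple and `parse_aachange` function are hypothetical names, and the code assumes that entries for several transcripts are separated by commas and that "." marks an empty cell.

```python
from typing import List, NamedTuple

class AAChange(NamedTuple):
    gene: str
    transcript: str
    region: str
    cdna: str
    protein: str

def parse_aachange(field: str) -> List[AAChange]:
    """Split an AAChange.refGene cell into one record per transcript."""
    if field in ("", "."):
        return []
    records = []
    for entry in field.split(","):
        parts = entry.split(":")
        # Pad short entries so a missing protein change becomes "".
        parts += [""] * (5 - len(parts))
        records.append(AAChange(*parts[:5]))
    return records

for rec in parse_aachange("SAMD11:NM_152486:exon10:c.T1027C:p.W343R"):
    print(rec.gene, rec.transcript, rec.region, rec.cdna, rec.protein)
```

Returning a list even for a single transcript keeps downstream code the same whether a variant hits one transcript or several.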
  • Column headers in the SnpEff annotation output
CHROM Chromosome number
POS Position
ID semi-colon separated list of unique identifiers where available. If this is a dbSNP variant it is encouraged to use the rs number(s).
REF Reference base(s)
ALT Alternate non-reference alleles called on at least one of the samples
EFFECT Functional consequences of one variant, e.g., missense_variant, synonymous_variant.
REGION Regions (e.g., exonic, intronic) that one variant hits
IMPACT Putative impact of the variant (e.g. HIGH, MODERATE or LOW impact).
GENE Gene name (usually HUGO)
GENEID Gene ID
FEATURE Type of the feature named in the next field (e.g. transcript, motif, miRNA, etc.)
FEATUREID Transcript ID (preferably using version number), Motif ID, miRNA, ChipSeq peak, Histone mark, depending on the annotation.
BIOTYPE Description of whether the transcript is {"Coding", "Noncoding"}. Whenever possible, ENSEMBL biotypes are used.
HGVS_C Variant using HGVS notation (DNA level). For example, c.352A>G stands for an A to G substitution at nucleotide 352.
HGVS_P Coding variant using HGVS notation (protein level). For example, p.Ile118Val stands for a substitution of isoleucine at position 118 by valine. p.Ile118Val can also be written as p.I118V using the 1-letter symbols.
SIFT_score SIFT score. See the dbNSFP information table for details.
SIFT_pred SIFT prediction. See the dbNSFP information table for details.
Polyphen2_HDIV_score Polyphen2 score based on HDIV. See the dbNSFP information table for details.
Polyphen2_HDIV_pred Polyphen2 prediction based on HDIV. See the dbNSFP information table for details.
Polyphen2_HVAR_score Polyphen2 score based on HVAR. See the dbNSFP information table for details.
Polyphen2_HVAR_pred Polyphen2 prediction based on HVAR. See the dbNSFP information table for details.
LRT_score LRT score. See the dbNSFP information table for details.
LRT_pred LRT prediction. See the dbNSFP information table for details.
MutationTaster_score MutationTaster score. See the dbNSFP information table for details.
MutationTaster_pred MutationTaster prediction. See the dbNSFP information table for details.
MutationAssessor_score MutationAssessor score. See the dbNSFP information table for details.
MutationAssessor_pred MutationAssessor prediction. See the dbNSFP information table for details.
FATHMM_score FATHMM score. See the dbNSFP information table for details.
FATHMM_pred FATHMM prediction. See the dbNSFP information table for details.
PROVEAN_score PROVEAN score. See the dbNSFP information table for details.
PROVEAN_pred PROVEAN prediction. See the dbNSFP information table for details.
VEST3_score VEST V3 score. See the dbNSFP information table for details.
CADD_raw CADD raw score. See the dbNSFP information table for details.
CADD_phred CADD phred-like score. See the dbNSFP information table for details.
MetaSVM_score MetaSVM score. See the dbNSFP information table for details.
MetaSVM_pred MetaSVM prediction. See the dbNSFP information table for details.
MetaLR_score MetaLR score. See the dbNSFP information table for details.
MetaLR_pred MetaLR prediction. See the dbNSFP information table for details.
GERP++_NR GERP++ conservation score. See the dbNSFP information table for details.
GERP++_RS GERP++ "rejected substitutions" (RS) score. See the dbNSFP information table for details.
phyloP100way_vertebrate Phylogenetic p-values for 100 vertebrate species. See the dbNSFP information table for details.
phastCons100way_vertebrate PhastCons score for 100 vertebrate species. See the dbNSFP information table for details.
SiPhy_29way_logOdds SiPhy log odds score for 29 species. See the dbNSFP information table for details.
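The HGVS_P column uses three-letter residue codes, while ANNOVAR's AAChange column uses one-letter codes, so comparing the two tools' output usually needs a conversion. The sketch below rewrites each three-letter code in place; the function name `hgvs_p_to_one_letter` is made up, and the mapping covers the 20 standard amino acids plus Ter (stop), leaving any other token untouched.

```python
import re

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",
}

def hgvs_p_to_one_letter(hgvs_p: str) -> str:
    """Replace three-letter residue codes in an HGVS protein string."""
    return re.sub(
        r"[A-Z][a-z]{2}",
        # Unknown capitalised tokens are returned unchanged.
        lambda m: THREE_TO_ONE.get(m.group(0), m.group(0)),
        hgvs_p,
    )

print(hgvs_p_to_one_letter("p.Ile118Val"))
```

Because only capitalised three-letter tokens are matched, lowercase HGVS keywords such as del, ins, and fs pass through unchanged.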

  • Detailed information
    SIFT_pred, SIFT_score
    Algorithm: SIFT (Sorting Intolerant From Tolerant)
    Method: P(an amino acid at a position is tolerated | the most frequent amino acid being tolerated)
    Prediction: D: Deleterious (sift<=0.05); T: Tolerated (sift>0.05)
    Source: Pauline Ng, Fred Hutchinson Cancer Research Center, Seattle, Washington

    Polyphen2_HDIV_pred, Polyphen2_HDIV_score
    Algorithm: Polyphen v2 (Polymorphism phenotyping v2)
    Method: Probabilistic classifier; training set: HumDiv
    Prediction: D: Probably damaging (>=0.957); P: Possibly damaging (0.453<=pp2_hdiv<=0.956); B: Benign (pp2_hdiv<=0.452)
    Source: Harvard Medical School

    Polyphen2_HVAR_pred, Polyphen2_HVAR_score
    Algorithm: Polyphen v2 (Polymorphism phenotyping v2)
    Method: Machine learning; training set: HumVar
    Prediction: D: Probably damaging (>=0.909); P: Possibly damaging (0.447<=pp2_hvar<=0.908); B: Benign (pp2_hvar<=0.446)
    Source: Shamil Sunyaev, Harvard Medical School

    LRT_pred, LRT_score
    Algorithm: LRT (Likelihood ratio test)
    Method: LRT of H0: each codon evolves neutrally vs H1: the codon evolves under negative selection
    Prediction: D: Deleterious; N: Neutral; U: Unknown
    Score direction: lower scores are more deleterious
    Source: Sung Chung, Justin Fay, Washington University

    MutationTaster_pred, MutationTaster_score
    Algorithm: MutationTaster
    Method: Bayes classifier
    Prediction: A: "disease_causing_automatic"; D: "disease_causing"; N: "polymorphism" (probably harmless); P: "polymorphism_automatic" (known to be harmless)
    Score direction: higher values are more deleterious
    Source: Markus Schuelke, the Charité - Universitätsmedizin Berlin

    MutationAssessor_pred, MutationAssessor_score
    Algorithm: MutationAssessor
    Method: Entropy of multiple sequence alignment
    Prediction: H: high; M: medium; L: low; N: neutral. H/M means functional and L/N means non-functional
    Score direction: higher values are more deleterious
    Source: Reva Boris, Computational Biology Center, Memorial Sloan Kettering Cancer Center

    FATHMM_pred, FATHMM_score
    Algorithm: FATHMM (Functional analysis through hidden Markov models)
    Method: HMM
    Prediction: D: Deleterious; T: Tolerated
    Score direction: lower values are more deleterious
    Source: Shihab Hashem, University of Bristol, UK

    PROVEAN_pred, PROVEAN_score
    Algorithm: PROVEAN (Protein Variation Effect Analyzer)
    Method: Clustering of homologous sequences
    Prediction: D: Deleterious; N: Neutral
    Score direction: lower values are more deleterious
    Source: Choi Y, J. Craig Venter Institute

    VEST3_score
    Algorithm: VEST V3 (Variant effect scoring tool)
    Method: Random forest classifier
    Score direction: higher values are more deleterious
    Source: Rachel Karchin, Johns Hopkins University

    CADD_raw, CADD_phred
    Algorithm: CADD (Combined annotation dependent depletion)
    Method: Linear kernel SVM
    Score direction: higher values are more deleterious
    Source: Jay Shendure, Xiaohui Xie, University of California - Irvine

    DANN_score
    Algorithm: DANN (Deleterious Annotation of genetic variants using Neural Networks)
    Method: Neural network
    Score direction: higher values are more deleterious
    Source: Jay Shendure, Xiaohui Xie, University of California - Irvine

    fathmm-MKL_coding_pred
    Algorithm: FATHMM-MKL (predicting the effects of both coding and non-coding variants using nucleotide-based HMMs)
    Method: Classifier based on multiple kernel learning
    Prediction: D: Deleterious (score >= 0.5); T: Tolerated (score < 0.5)
    Source: Shihab Hashem, University of Bristol, UK

    MetaSVM_pred, MetaSVM_score
    Algorithm: MetaSVM
    Method: Support vector machine
    Prediction: D: Deleterious; T: Tolerated
    Score direction: higher scores are more deleterious
    Source: Coco Dong, USC Biostatistics Department

    MetaLR_pred, MetaLR_score
    Algorithm: MetaLR
    Method: Logistic regression
    Prediction: D: Deleterious; T: Tolerated
    Score direction: higher scores are more deleterious
    Source: Coco Dong, USC Biostatistics Department

    integrated_fitCons_score, integrated_confidence_value
    Algorithm: FitCons (Fitness consequences of functional annotation)
    Method: Integrates functional assays such as ChIP-Seq with a conservation measure of transcription factor binding sites
    Score direction: higher scores are more deleterious
    Source: Abriza, Cold Spring Harbor Lab

    GERP++_RS, GERP++_NR
    Algorithm: GERP++ (Genome Evolutionary Rate Profiling ++)
    Method: Maximum likelihood estimation procedure
    Score direction: higher scores are more deleterious
    Source: Eugene Davydov, Stanford University, CS Department

    phyloP7way_vertebrate
    Algorithm: PhyloP (phylogenetic p-values)
    Method: Phylogenetic p-values calculated from an LRT, score-based test, or GERP test; uses 7 species
    Score direction: higher scores are more deleterious
    Source: Adam Siepel, UCSC

    phyloP20way_mammalian
    Algorithm: PhyloP (phylogenetic p-values)
    Method: A phylogenetic hidden Markov model (phylo-HMM); uses 20 species
    Score direction: higher scores are more deleterious
    Source: Adam Siepel, UCSC

    phastCons7way_vertebrate
    Algorithm: phastCons
    Method: A phylogenetic hidden Markov model (phylo-HMM); uses 7 species
    Score direction: higher scores are more deleterious
    Source: Adam Siepel, UCSC

    phastCons20way_mammalian
    Algorithm: phastCons
    Method: A phylogenetic hidden Markov model (phylo-HMM); uses 20 species
    Score direction: higher scores are more deleterious
    Source: Adam Siepel, UCSC

    SiPhy_29way_logOdds
    Algorithm: SiPhy
    Method: Probabilistic framework, HMM; uses 29 species
    Score direction: higher scores are more deleterious
    Source: Manuel Garber, Broad Institute of MIT & Harvard
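Several of the categorical predictions above are simple cutoffs on the score. As an illustration only, the sketch below applies the SIFT, PolyPhen2 HDIV, and fathmm-MKL cutoffs quoted in this table. The function names are made up, and the code assumes scores are reported to three decimals, so a PolyPhen2 value never falls between 0.452 and 0.453 or between 0.956 and 0.957.

```python
def sift_pred(score: float) -> str:
    """D (deleterious) if sift <= 0.05, otherwise T (tolerated)."""
    return "D" if score <= 0.05 else "T"

def polyphen2_hdiv_pred(score: float) -> str:
    """D: >= 0.957; P: 0.453 to 0.956; B: <= 0.452."""
    if score >= 0.957:
        return "D"
    if score >= 0.453:
        return "P"
    return "B"

def fathmm_mkl_pred(score: float) -> str:
    """D (deleterious) if score >= 0.5, otherwise T (tolerated)."""
    return "D" if score >= 0.5 else "T"

print(sift_pred(0.01), polyphen2_hdiv_pred(0.8), fathmm_mkl_pred(0.3))
```

Note that the direction differs between tools: a low SIFT score is the damaging end, while for PolyPhen2 and fathmm-MKL it is the high end, so the scores cannot be compared or averaged without first putting them on a common direction.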
