邴钝

[Repost] Using TopHat and Cufflinks to analyze expression levels from RNA-Seq data

(2013-11-13 21:43:28)
Tags: repost

Category: linux

Discovering novel genes and transcripts


RNA-Seq is a powerful technology for gene and splice variant discovery. You can use Cufflinks to help annotate a new genome or to find new genes and splice isoforms of known genes, even in well-annotated genomes. Annotating genomes is a complex and difficult process, but we outline a basic workflow here that should get you started. The workflow also includes examples of the commands you'd run to implement each step. Suppose we have RNA-Seq reads from human liver, brain, and heart.


  1. Map the reads for each tissue to the reference genome
  2. We recommend that you use TopHat to map your reads to the reference genome. For this example, we'll assume you have paired-end RNA-Seq data. You can map reads as follows:

    tophat -r 50 -o tophat_brain /seqdata/indexes/hg19 brain_1.fq brain_2.fq
    tophat -r 50 -o tophat_liver /seqdata/indexes/hg19 liver_1.fq liver_2.fq
    tophat -r 50 -o tophat_heart /seqdata/indexes/hg19 heart_1.fq heart_2.fq

    The commands above are just examples of how to map reads with TopHat. Please see the TopHat manual for more details on RNA-Seq read mapping.


  3. Run Cufflinks on each mapping file
  4. The next step is to assemble each tissue sample independently using Cufflinks. Assemble each tissue like so:

    cufflinks -o cufflinks_brain tophat_brain/accepted_hits.bam
    cufflinks -o cufflinks_liver tophat_liver/accepted_hits.bam
    cufflinks -o cufflinks_heart tophat_heart/accepted_hits.bam
  5. Merge the resulting assemblies
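  6. Create a file called assemblies.txt that lists the assembly file for each sample. It should contain the following lines: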
    cufflinks_brain/transcripts.gtf
    cufflinks_liver/transcripts.gtf
    cufflinks_heart/transcripts.gtf
    Now run the merge script:
    cuffmerge -s /seqdata/fastafiles/hg19/hg19.fa assemblies.txt

    The final, merged annotation will be in the file merged_asm/merged.gtf. At this point, you can use your favorite browser to explore the structure of your genes, or feed this file into downstream informatic analyses, such as a search for orthologs in other organisms. You can also explore your samples with Cuffdiff and identify genes that are significantly differentially expressed between the three conditions. See the workflows below for more details on how to do this.

  7. (optional) Compare the merged assembly with known or annotated genes
  8. If you want to discover new genes in a genome that has been annotated, you can use cuffcompare to sort out what is new in your assembly from what is already known. Run cuffcompare like this:
    cuffcompare -s /seqdata/fastafiles/hg19/hg19.fa -r known_annotation.gtf merged_asm/merged.gtf
    Cuffcompare will produce a number of output files that you can parse to select novel genes and isoforms.
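
The mapping, assembly, and merge steps above repeat the same commands for each tissue, so they can be driven from one loop. What follows is a minimal sketch, not part of the original workflow. It assumes the read files follow the `<tissue>_1.fq` / `<tissue>_2.fq` naming used in the example. `RUN`, `TISSUES`, and `discover` are hypothetical names introduced here for illustration. By default the function only prints the commands (a dry run) and writes assemblies.txt.

```shell
#!/bin/sh
# Sketch of the discovery workflow for the three example tissues.
# RUN is a hypothetical switch: "echo" (the default) prints each command
# instead of running it; set RUN="" to actually execute the tools.
RUN="${RUN-echo}"
TISSUES="brain liver heart"

discover() {
    : > assemblies.txt    # start with an empty list of assemblies
    for tissue in $TISSUES; do
        # Map the paired-end reads for this tissue.
        $RUN tophat -r 50 -o "tophat_${tissue}" /seqdata/indexes/hg19 \
            "${tissue}_1.fq" "${tissue}_2.fq"
        # Assemble this tissue independently.
        $RUN cufflinks -o "cufflinks_${tissue}" "tophat_${tissue}/accepted_hits.bam"
        # Record the assembly so cuffmerge can find it.
        echo "cufflinks_${tissue}/transcripts.gtf" >> assemblies.txt
    done
    # Merge the per-tissue assemblies.
    $RUN cuffmerge -s /seqdata/fastafiles/hg19/hg19.fa assemblies.txt
}
```

Calling `discover` in an empty working directory prints seven commands and leaves an assemblies.txt with one transcripts.gtf path per tissue. Deriving every path from the tissue name keeps the input of each cufflinks run tied to the matching TopHat output directory.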

Identifying differentially expressed and regulated genes


There are two workflows you can choose from when looking for differentially expressed and regulated genes using the Cufflinks package. The first workflow is simpler and is a good choice when you aren't looking for novel genes and transcripts. This workflow requires that you not only have a reference genome, but also a reference gene annotation in GFF format (GFF3 or GTF2 formats are accepted, see details here). The second workflow, which includes steps to discover new genes and new splice variants of known genes, is more complex and requires more computing power. The second workflow can use and augment a reference gene annotation GFF if one is available.


Differential analysis without gene and transcript discovery
  1. Map the reads for each condition to the reference genome
  2. We recommend that you use TopHat to map your reads to the reference genome. For this example, we'll assume you have paired-end RNA-Seq data. Suppose you have RNA-Seq from a knockdown experiment where you have two biological replicates of a mock condition as a control and two replicates of your knockdown.

    Note: Cuffdiff will work much better if you map your replicates independently, rather than pooling the replicates from one condition into a single set of reads.

    Note: While a GTF of known transcripts is not strictly required at this stage, providing one will improve alignment sensitivity and, ultimately, the accuracy of Cuffdiff's analysis.

    You can map reads as follows:

    tophat -r 50 -G annotation.gtf -o tophat_mock_rep1 /seqdata/indexes/hg19 \
      mock_rep1_1.fq mock_rep1_2.fq
    tophat -r 50 -G annotation.gtf -o tophat_mock_rep2 /seqdata/indexes/hg19 \
      mock_rep2_1.fq mock_rep2_2.fq
    tophat -r 50 -G annotation.gtf -o tophat_knockdown_rep1 /seqdata/indexes/hg19 \
      knockdown_rep1_1.fq knockdown_rep1_2.fq
    tophat -r 50 -G annotation.gtf -o tophat_knockdown_rep2 /seqdata/indexes/hg19 \
      knockdown_rep2_1.fq knockdown_rep2_2.fq
  3. Run Cuffdiff
  4. Take the annotated transcripts for your genome (as GFF or GTF) and provide them to cuffdiff along with the BAM files from TopHat for each replicate:
    cuffdiff annotation.gtf mock_rep1.bam,mock_rep2.bam \
       knockdown_rep1.bam,knockdown_rep2.bam
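
The knockdown example can be sketched as one script that maps each replicate separately and then builds the comma-separated BAM lists for cuffdiff. This is an illustration under assumptions, not the manual's own script. It assumes each replicate's alignments are read directly from `tophat_<sample>/accepted_hits.bam` rather than from renamed files such as mock_rep1.bam. `RUN`, `join_bams`, and `run_knockdown_analysis` are hypothetical names. By default the commands are only printed.

```shell
#!/bin/sh
# RUN is a hypothetical switch: "echo" (the default) prints each command
# instead of running it; set RUN="" to actually execute the tools.
RUN="${RUN-echo}"

# join_bams CONDITION REP...
# Prints the TopHat BAM paths for one condition as a comma-separated list.
join_bams() {
    condition=$1
    shift
    list=""
    for rep in "$@"; do
        # Prepend a comma only when the list already has an entry.
        list="${list:+$list,}tophat_${condition}_${rep}/accepted_hits.bam"
    done
    printf '%s\n' "$list"
}

run_knockdown_analysis() {
    # Map every replicate on its own rather than pooling reads per condition.
    for sample in mock_rep1 mock_rep2 knockdown_rep1 knockdown_rep2; do
        $RUN tophat -r 50 -G annotation.gtf -o "tophat_${sample}" \
            /seqdata/indexes/hg19 "${sample}_1.fq" "${sample}_2.fq"
    done
    # One argument per condition, replicates joined by commas.
    $RUN cuffdiff annotation.gtf \
        "$(join_bams mock rep1 rep2)" "$(join_bams knockdown rep1 rep2)"
}
```

Building each condition's list in one place makes it harder to slip a space between replicates by accident.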

Differential analysis with gene and transcript discovery
  1. Complete the mapping, assembly, and merge steps in "Discovering novel genes and transcripts", above
  2. Follow the protocol for gene and transcript discovery listed above. Be sure to provide TopHat and the assembly merging script with a reference annotation if one is available for your organism, to ensure the highest possible quality of differential expression analysis.

  3. Run Cuffdiff
  4. Take the merged assembly produced by the merge step of the discovery protocol and provide it to cuffdiff along with the BAM files from TopHat:
    cuffdiff merged_asm/merged.gtf liver1.bam,liver2.bam brain1.bam,brain2.bam
    As shown above, replicate BAM files for each condition must be given as a comma-separated list. If you put spaces between replicate files instead of commas, cuffdiff will treat them as independent conditions.
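
To see how the comma rule plays out before launching a long run, a small helper can report how an argument list would be grouped. This is a sketch based only on the behavior described above. `describe_groups` is a hypothetical function, not part of the Cufflinks package, and it inspects the arguments without calling cuffdiff.

```shell
#!/bin/sh
# describe_groups GTF GROUP...
# Treats each argument after the GTF as one condition and counts its
# comma-separated replicates, as the text describes for cuffdiff.
describe_groups() {
    shift    # skip the GTF argument
    n=0
    for group in "$@"; do
        n=$((n + 1))
        # Count the commas in this argument; replicates = commas + 1.
        commas=$(printf '%s' "$group" | tr -cd ',' | wc -c | tr -d ' ')
        echo "condition $n: $((commas + 1)) replicate(s)"
    done
}
```

With `describe_groups merged_asm/merged.gtf liver1.bam,liver2.bam brain1.bam,brain2.bam` the helper reports two conditions with two replicates each. Replacing the commas with spaces makes it report four conditions with one replicate each, which is the mistake the paragraph above warns about.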
