
Quantifying RNA-seq data with Salmon

(2017-08-30 12:34:57)

Using Salmon

1. Build an index from the reference transcriptome (transcripts.fa) with salmon index
> ./bin/salmon index -t transcripts.fa -i transcripts_index --type quasi -k 31

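2. Quantify. The input is either the raw reads (fq/FASTA files) together with the index, or an alignment file (.sam or .bam) together with the reference transcriptome FASTA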

> ./bin/salmon quant -i transcripts_index -l <LIBTYPE> -1 reads1.fq -2 reads2.fq -o transcripts_quant
> ./bin/salmon quant -t transcripts.fa -l <LIBTYPE> -a aln.bam -o salmon_quant
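
When these steps are run from a script, it can help to assemble the commands in one place. The following is a minimal sketch, assuming Python 3.8+ and the ./bin/salmon path used above; the helper names (salmon_index_cmd and so on) are hypothetical and not part of Salmon. It only builds the argument lists and uses exactly the flags shown in the commands above.

```python
import shlex

SALMON = "./bin/salmon"  # path taken from the commands above; adjust as needed


def salmon_index_cmd(transcripts_fa, index_dir, kmer=31):
    """Argument list for the `salmon index` call shown in step 1."""
    return [SALMON, "index", "-t", transcripts_fa, "-i", index_dir,
            "--type", "quasi", "-k", str(kmer)]


def salmon_quant_reads_cmd(index_dir, lib_type, reads1, reads2, out_dir):
    """Argument list for quantifying paired-end reads against an index."""
    return [SALMON, "quant", "-i", index_dir, "-l", lib_type,
            "-1", reads1, "-2", reads2, "-o", out_dir]


def salmon_quant_aln_cmd(transcripts_fa, lib_type, alignments, out_dir):
    """Argument list for quantifying from an existing .sam/.bam file."""
    return [SALMON, "quant", "-t", transcripts_fa, "-l", lib_type,
            "-a", alignments, "-o", out_dir]


if __name__ == "__main__":
    commands = [
        salmon_index_cmd("transcripts.fa", "transcripts_index"),
        salmon_quant_reads_cmd("transcripts_index", "A",
                               "reads1.fq", "reads2.fq", "transcripts_quant"),
        salmon_quant_aln_cmd("transcripts.fa", "A", "aln.bam", "salmon_quant"),
    ]
    for cmd in commands:
        # Print the command line; to actually run it, use
        # subprocess.run(cmd, check=True) instead.
        print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.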

The <LIBTYPE> parameter
 -l A or --libType A: automatically detect the library type
Otherwise, the library type string has up to three parts: the relative orientation of the reads, the strandedness of the library, and the directionality of the reads.
The first part (relative orientation) is only provided if the library is paired-end. The possible options are:
I = inward
O = outward
M = matching
The second part specifies whether the protocol is stranded or unstranded; the options are:
S = stranded
U = unstranded
If the protocol is unstranded, then we’re done. The final part of the library string specifies the strand from which the read originates in a strand-specific protocol — it is only provided if the library is stranded (i.e. if the library format string is of the form S). The possible values are:
F = read 1 (or single-end read) comes from the forward strand
R = read 1 (or single-end read) comes from the reverse strand
Some example library format strings and their interpretations:
IU (an unstranded paired-end library where the reads face each other)
SF (a stranded single-end protocol where the reads come from the forward strand)
OSR (a stranded paired-end protocol where the reads face away from each other,
read1 comes from reverse strand and read2 comes from the forward strand)
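
The three-part rule above can be written as a small decoder. This is only a sketch to illustrate how the string is put together: parse_lib_type is a hypothetical helper, not part of Salmon, and it accepts only the letters listed above plus the special value A.

```python
ORIENTATION = {"I": "inward", "O": "outward", "M": "matching"}
DIRECTION = {"F": "forward", "R": "reverse"}  # strand that read 1 comes from


def parse_lib_type(lib_type):
    """Split a library format string such as 'IU', 'SF' or 'OSR' into parts.

    Returns a dict with keys 'auto', 'orientation', 'stranded', 'direction'.
    Raises ValueError if the string does not follow the three-part rule.
    """
    s = lib_type.strip().upper()
    if s == "A":  # automatic detection, nothing to decode
        return {"auto": True, "orientation": None,
                "stranded": None, "direction": None}

    # Part 1 (optional): relative orientation, paired-end libraries only.
    orientation = None
    if s and s[0] in ORIENTATION:
        orientation = ORIENTATION[s[0]]
        s = s[1:]

    # Part 2 (required): S = stranded, U = unstranded.
    if not s or s[0] not in ("S", "U"):
        raise ValueError(f"invalid library type: {lib_type!r}")
    stranded = s[0] == "S"
    rest = s[1:]

    # Part 3: strand of read 1, present only for stranded libraries.
    if stranded:
        if rest not in DIRECTION:
            raise ValueError(f"stranded library needs F or R: {lib_type!r}")
        direction = DIRECTION[rest]
    else:
        if rest:
            raise ValueError(f"unexpected trailing part: {lib_type!r}")
        direction = None

    return {"auto": False, "orientation": orientation,
            "stranded": stranded, "direction": direction}


if __name__ == "__main__":
    for example in ("IU", "SF", "OSR"):
        print(example, parse_lib_type(example))
```

An orientation of None in the result means the string describes a single-end library.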



3. Output files

3.1 Main quantification file: quant.sf
Contents
quant.sf is a plain-text, tab-separated file with a single header line that names the columns. Each subsequent row describes a single quantification record. The columns have the following interpretation.
  • Name: the name of the target transcript provided in the input transcript database (FASTA file).
  • Length: the length of the target transcript in nucleotides.
  • EffectiveLength: the computed effective length of the target transcript. It takes into account all factors being modeled that affect the probability of sampling fragments from this transcript, including the fragment length distribution and sequence-specific and GC-fragment bias (if they are being modeled).
  • TPM: salmon’s estimate of the relative abundance of this transcript in units of Transcripts Per Million (TPM). TPM is the recommended relative abundance measure to use for downstream analysis.
  • NumReads: salmon’s estimate of the number of reads mapping to each transcript that was quantified. It is an “estimate” insofar as it is the expected number of reads that have originated from each transcript, given the structure of the uniquely mapping and multi-mapping reads and the relative abundance estimates for each transcript.
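
To see how the columns relate to each other, the sketch below reads a quant.sf-style table and recomputes TPM from NumReads and EffectiveLength using the usual TPM definition (per-transcript rate NumReads / EffectiveLength, rescaled so the rates sum to one million). The three transcripts and their values are invented for illustration, and read_quant_sf and tpm_from_counts are hypothetical helper names; only the Python standard library is used.

```python
import csv
import io

# Invented example in the quant.sf layout (tab-separated, one header line).
SAMPLE_QUANT_SF = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx1\t1000\t800.0\t250000.0\t400.0\n"
    "tx2\t2000\t1800.0\t250000.0\t900.0\n"
    "tx3\t500\t300.0\t500000.0\t300.0\n"
)


def read_quant_sf(handle):
    """Parse a quant.sf-style table into a list of dicts with typed values."""
    records = []
    for row in csv.DictReader(handle, delimiter="\t"):
        records.append({
            "Name": row["Name"],
            "Length": int(row["Length"]),
            "EffectiveLength": float(row["EffectiveLength"]),
            "TPM": float(row["TPM"]),
            "NumReads": float(row["NumReads"]),
        })
    return records


def tpm_from_counts(records):
    """Recompute TPM as (NumReads / EffectiveLength), scaled to sum to 1e6."""
    rates = [r["NumReads"] / r["EffectiveLength"] for r in records]
    total = sum(rates)
    if total == 0:
        return [0.0 for _ in rates]
    return [rate / total * 1e6 for rate in rates]


if __name__ == "__main__":
    records = read_quant_sf(io.StringIO(SAMPLE_QUANT_SF))
    for rec, tpm in zip(records, tpm_from_counts(records)):
        print(rec["Name"], rec["TPM"], round(tpm, 3))
```

For a real run, replace io.StringIO(SAMPLE_QUANT_SF) with open("transcripts_quant/quant.sf"), where transcripts_quant is the output directory passed to -o.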
3.2 Command record file: cmd_info.json, which records the command used for the run.
3.3 aux_info directory: ambig_info.tsv records, for each transcript, the number of uniquely mapped and ambiguously mapped reads.
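
A short sketch of how this file might be summarized. It assumes that ambig_info.tsv is tab-separated with a header line naming two columns, UniqueCount and AmbigCount; check the header of your own file before relying on this. The counts below are invented, and read_ambig_info and unique_fraction are hypothetical helper names.

```python
import csv
import io

# Invented example; the column names are an assumption about the file layout.
SAMPLE_AMBIG_INFO = (
    "UniqueCount\tAmbigCount\n"
    "30\t10\n"
    "0\t25\n"
    "0\t0\n"
)


def read_ambig_info(handle):
    """Return one (unique, ambiguous) count pair per transcript row."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [(int(row["UniqueCount"]), int(row["AmbigCount"]))
            for row in reader]


def unique_fraction(unique, ambig):
    """Fraction of a transcript's reads that map uniquely.

    Returns None when the transcript has no reads at all.
    """
    total = unique + ambig
    return unique / total if total else None


if __name__ == "__main__":
    for unique, ambig in read_ambig_info(io.StringIO(SAMPLE_AMBIG_INFO)):
        print(unique, ambig, unique_fraction(unique, ambig))
```

A transcript whose reads are all ambiguous (fraction 0) has an abundance estimate that rests entirely on how the multi-mapping reads were allocated.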
