Quantifying RNA-seq with Salmon (生信届小学生 blog)

http://blog.sina.com.cn/u/6179256122


Quantifying RNA-seq with Salmon

(2017-08-30 12:34:57)

Using Salmon

1. Build the index (from transcripts.fa)

salmon index

> ./bin/salmon index -t transcripts.fa -i transcripts_index --type quasi -k 31

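2. Quantify, either from raw read files (FASTQ/FASTA) against the index, or from alignment files (.sam or .bam) together with the reference transcriptome FASTA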

> ./bin/salmon quant -i transcripts_index -l <LIBTYPE> -1 reads1.fq -2 reads2.fq -o transcripts_quant

or

> ./bin/salmon quant -t transcripts.fa -l <LIBTYPE> -a aln.bam -o salmon_quant
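The index and quant steps above can also be driven from a script. The following is a minimal Python sketch, assuming the same file names and the `./bin/salmon` path used in the commands above. The helper names `index_cmd`, `quant_reads_cmd` and `quant_aln_cmd` are made up for illustration, and only flags that already appear in this post are used. The functions only build argument lists and do not run anything.

```python
SALMON = "./bin/salmon"  # path used in the commands above; adjust as needed


def index_cmd(transcripts, index_dir, k=31):
    """Argument list for step 1 (salmon index), mirroring the command above."""
    return [SALMON, "index", "-t", transcripts, "-i", index_dir,
            "--type", "quasi", "-k", str(k)]


def quant_reads_cmd(index_dir, libtype, reads1, reads2, out_dir):
    """Argument list for quantifying paired-end reads against an index."""
    return [SALMON, "quant", "-i", index_dir, "-l", libtype,
            "-1", reads1, "-2", reads2, "-o", out_dir]


def quant_aln_cmd(transcripts, libtype, alignments, out_dir):
    """Argument list for quantifying from an existing .sam/.bam file."""
    return [SALMON, "quant", "-t", transcripts, "-l", libtype,
            "-a", alignments, "-o", out_dir]


if __name__ == "__main__":
    commands = [
        index_cmd("transcripts.fa", "transcripts_index"),
        quant_reads_cmd("transcripts_index", "A",
                        "reads1.fq", "reads2.fq", "transcripts_quant"),
        quant_aln_cmd("transcripts.fa", "A", "aln.bam", "salmon_quant"),
    ]
    for cmd in commands:
        print(" ".join(cmd))
```

To actually run one of these, the list can be passed to `subprocess.run(cmd, check=True)`, which raises an error if Salmon exits with a non-zero status.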

The <LIBTYPE> parameter

-l A or --libType A: automatically detect the library type

Otherwise, the library type string consists of three parts: the relative orientation of the reads, the strandedness of the library, and the directionality of the reads.

The first part (relative orientation) is only provided if the library is paired-end. The possible options are:

I = inward

O = outward

M = matching

The second part specifies whether the protocol is stranded or unstranded; the options are:

S = stranded

U = unstranded

If the protocol is unstranded, then we’re done. The final part of the library string specifies the strand from which the read originates in a strand-specific protocol — it is only provided if the library is stranded (i.e. if the library format string is of the form S). The possible values are:

F = read 1 (or single-end read) comes from the forward strand

R = read 1 (or single-end read) comes from the reverse strand

Some example library format strings and their interpretations:

IU (an unstranded paired-end library where the reads face each other)

SF (a stranded single-end protocol where the reads come from the forward strand)

OSR (a stranded paired-end protocol where the reads face away from each other, read 1 comes from the reverse strand and read 2 comes from the forward strand)
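The three-part grammar described above can be sketched as a small decoder. This is an illustration in Python that follows only the rules listed in this post; `parse_libtype` is a hypothetical helper, not part of Salmon, and Salmon's own parser may accept or reject strings differently.

```python
ORIENTATION = {"I": "inward", "O": "outward", "M": "matching"}
DIRECTION = {"F": "forward", "R": "reverse"}


def parse_libtype(code):
    """Decode a library format string such as IU, SF or OSR."""
    code = code.upper()
    if code == "A":
        # -l A: the library type is detected automatically
        return {"auto": True}

    rest = code
    orientation = None
    # Part 1 (paired-end only): relative orientation of the reads
    if rest and rest[0] in ORIENTATION:
        orientation = ORIENTATION[rest[0]]
        rest = rest[1:]

    # Part 2: stranded (S) or unstranded (U)
    if not rest or rest[0] not in ("S", "U"):
        raise ValueError("expected S or U in %r" % code)
    stranded = rest[0] == "S"
    rest = rest[1:]

    # Part 3 (stranded only): strand that read 1 / the single-end read comes from
    read1_strand = None
    if stranded:
        if rest not in DIRECTION:
            raise ValueError("stranded library needs F or R in %r" % code)
        read1_strand = DIRECTION[rest]
    elif rest:
        raise ValueError("unexpected trailing characters in %r" % code)

    return {
        "auto": False,
        "paired": orientation is not None,
        "orientation": orientation,
        "stranded": stranded,
        "read1_strand": read1_strand,
    }


if __name__ == "__main__":
    for example in ("IU", "SF", "OSR"):
        print(example, parse_libtype(example))
```

Applied to the three examples above, the decoder reports IU as paired, inward and unstranded; SF as single-end and stranded with reads from the forward strand; and OSR as paired, outward and stranded with read 1 from the reverse strand.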

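3. Output files

3.1 Main quantification file: quant.sf

Contents

quant.sf is a plain-text, tab-separated file with a single header line that names the columns. Each subsequent row describes a single quantification record. The columns have the following interpretation.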

Name: the name of the target transcript provided in the input transcript database (FASTA file).
Length: the length of the target transcript in nucleotides.
EffectiveLength: the computed effective length of the target transcript. It takes into account all factors being modeled that will affect the probability of sampling fragments from this transcript, including the fragment length distribution and sequence-specific and GC-fragment bias (if they are being modeled).
TPM: salmon’s estimate of the relative abundance of this transcript in units of Transcripts Per Million (TPM). TPM is the recommended relative abundance measure to use for downstream analysis.
NumReads: salmon’s estimate of the number of reads mapping to each transcript that was quantified. It is an “estimate” insofar as it is the expected number of reads that have originated from each transcript given the structure of the uniquely mapping and multi-mapping reads and the relative abundance estimates for each transcript.
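To make the relationship between these columns concrete: the TPM of a transcript is proportional to NumReads divided by EffectiveLength, scaled so that the values over all transcripts sum to one million. The Python sketch below reads a file in the quant.sf layout and recomputes TPM from the other two columns. The sample rows are made up for illustration and the function names are hypothetical; on a real quant.sf the recomputed values should agree with the TPM column only up to rounding.

```python
import csv
import io

# Made-up sample in the quant.sf layout (tab-separated, one header line).
SAMPLE_QUANT_SF = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx1\t1000\t800.0\t333333.333333\t80.0\n"
    "tx2\t2000\t1800.0\t666666.666667\t360.0\n"
    "tx3\t500\t300.0\t0.0\t0.0\n"
)


def read_quant(handle):
    """Parse quant.sf rows into dictionaries with numeric fields."""
    records = []
    for row in csv.DictReader(handle, delimiter="\t"):
        records.append({
            "Name": row["Name"],
            "Length": int(row["Length"]),
            "EffectiveLength": float(row["EffectiveLength"]),
            "TPM": float(row["TPM"]),
            "NumReads": float(row["NumReads"]),
        })
    return records


def recompute_tpm(records):
    """TPM_i = 1e6 * (NumReads_i / EffLen_i) / sum_j (NumReads_j / EffLen_j)."""
    rates = [r["NumReads"] / r["EffectiveLength"] for r in records]
    total = sum(rates)
    if total == 0:
        return [0.0 for _ in rates]
    return [1e6 * rate / total for rate in rates]


if __name__ == "__main__":
    # For a real run, replace the StringIO with open("transcripts_quant/quant.sf")
    records = read_quant(io.StringIO(SAMPLE_QUANT_SF))
    for rec, tpm in zip(records, recompute_tpm(records)):
        print(rec["Name"], rec["TPM"], round(tpm, 6))
```

Because the division is by EffectiveLength rather than Length, two transcripts with the same NumReads can receive different TPM values.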

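3.2 cmd_info.json: records the command used for the run.

3.3 ambig_info.tsv in the aux_info directory: records, for each transcript, the number of uniquely mapping reads and the number of ambiguously mapping reads.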
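As a rough illustration of how ambig_info.tsv can be used, the Python sketch below pairs its counts with the transcript names from quant.sf. It rests on two assumptions not stated in this post: that ambig_info.tsv has a header line with the columns `UniqueCount` and `AmbigCount`, and that its rows are in the same order as the rows of quant.sf. Check both against your own output before relying on it. The sample data and the function name `ambiguity_table` are made up.

```python
import csv
import io

# Made-up samples; only the columns needed here carry meaningful values.
SAMPLE_QUANT_SF = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx1\t1000\t800.0\t0.0\t0.0\n"
    "tx2\t2000\t1800.0\t0.0\t0.0\n"
    "tx3\t500\t300.0\t0.0\t0.0\n"
)
# ASSUMPTION: header names and row order (same as quant.sf), see text above.
SAMPLE_AMBIG_INFO = (
    "UniqueCount\tAmbigCount\n"
    "70\t10\n"
    "0\t360\n"
    "0\t0\n"
)


def ambiguity_table(quant_handle, ambig_handle):
    """Map transcript name -> (unique, ambiguous, fraction unique or None)."""
    names = [row["Name"] for row in csv.DictReader(quant_handle, delimiter="\t")]
    counts = list(csv.DictReader(ambig_handle, delimiter="\t"))
    if len(names) != len(counts):
        raise ValueError("quant.sf and ambig_info.tsv have different row counts")
    table = {}
    for name, row in zip(names, counts):
        unique = int(row["UniqueCount"])
        ambig = int(row["AmbigCount"])
        total = unique + ambig
        # Fraction is undefined when no reads touch the transcript
        table[name] = (unique, ambig, unique / total if total else None)
    return table


if __name__ == "__main__":
    table = ambiguity_table(io.StringIO(SAMPLE_QUANT_SF),
                            io.StringIO(SAMPLE_AMBIG_INFO))
    for name, (unique, ambig, frac) in table.items():
        print(name, unique, ambig, frac)
```

A transcript such as the made-up tx2, supported only by ambiguously mapping reads, is one whose abundance estimate depends entirely on how those reads are apportioned.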
