Quantifying RNA-seq data with Salmon

Using Salmon
1. Build the index (transcripts.fa)
> ./bin/salmon index -t transcripts.fa -i transcripts_index --type quasi -k 31
2. Quantify with the reference transcriptome FASTA and either alignment files (.sam or .bam) or raw fq/FASTA read files
> ./bin/salmon quant -i transcripts_index -l <LIBTYPE> -1 reads1.fq -2 reads2.fq -o transcripts_quant
or
> ./bin/salmon quant -t transcripts.fa -l <LIBTYPE> -a aln.bam -o salmon_quant
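When several samples share one index, the read-based command above can be generated per sample instead of typed by hand. The following is a minimal sketch, not part of Salmon itself: the function name salmon_quant_cmd, the sample names, and the <sample>_1.fq / <sample>_2.fq file naming are all assumptions made for illustration. It only prints the commands (a dry run), using the same flags as the command above.

```shell
# Print (do not run) a salmon quant command for one paired-end sample.
# Hypothetical file layout: <sample>_1.fq and <sample>_2.fq in the current directory.
salmon_quant_cmd() {
    sample=$1
    libtype=$2
    printf './bin/salmon quant -i transcripts_index -l %s -1 %s_1.fq -2 %s_2.fq -o %s_quant\n' \
        "$libtype" "$sample" "$sample" "$sample"
}

# Dry run over two made-up sample names with library type IU.
for s in sampleA sampleB; do
    salmon_quant_cmd "$s" IU
done
```

Once the printed commands look right, they can be piped to sh to execute them.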
The <LIBTYPE> parameter
The first part of the library string (relative orientation) is only
provided if the library is paired-end. The possible options are:
I = inward
O = outward
M = matching
The second part of the library string specifies whether the protocol
is stranded or unstranded; the options are:
S = stranded
U = unstranded
If the protocol is unstranded, then we're done. The final part of
the library string specifies the strand from which the read
originates in a strand-specific protocol; it is only provided if
the library is stranded (i.e. if the library format string is of
the form S). The possible values are:
F = read 1 (or single-end read) comes from the forward strand
R = read 1 (or single-end read) comes from the reverse strand
Some example library format strings and their interpretations:
IU (an unstranded paired-end library where the reads face each other)
SF (a stranded single-end protocol where the reads come from the forward strand)
OSR (a stranded paired-end protocol where the reads face away from each other, read1 comes from the reverse strand and read2 comes from the forward strand)
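The three-part structure of the library format string (optional orientation, then strandedness, then read 1's strand) can be sketched as a small shell function. This is only an illustration of the rules listed above; describe_libtype is a hypothetical helper, not a Salmon command, and the wording of its output is my own.

```shell
# Decode a library format string such as IU, SF, or OSR into words.
describe_libtype() {
    lt=$1
    # Optional first letter: relative orientation (present only for paired-end).
    case $lt in
        I*) orient="paired-end, inward";   lt=${lt#I} ;;
        O*) orient="paired-end, outward";  lt=${lt#O} ;;
        M*) orient="paired-end, matching"; lt=${lt#M} ;;
        *)  orient="single-end" ;;
    esac
    # Remaining letters: strandedness, plus the strand of read 1 if stranded.
    case $lt in
        U)  strand="unstranded" ;;
        SF) strand="stranded, read 1 from forward strand" ;;
        SR) strand="stranded, read 1 from reverse strand" ;;
        *)  echo "unrecognized library type: $1" >&2; return 1 ;;
    esac
    echo "$orient, $strand"
}

describe_libtype IU    # paired-end, inward, unstranded
describe_libtype SF    # single-end, stranded, read 1 from forward strand
describe_libtype OSR   # paired-end, outward, stranded, read 1 from reverse strand
```

For single-end strings such as SF, "read 1" in the output refers to the single-end read.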
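3. Output files
3.1 Main quantification file: quant.sf
Contents
quant.sf is a tab-separated text file with a single header line that
names the columns. Each subsequent row describes a single
quantification record. The columns have the following interpretation.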
- Name: the name of the target transcript provided in the input transcript database (FASTA file).
- Length: the length of the target transcript in nucleotides.
- EffectiveLength: the computed effective length of the target transcript. It takes into account all factors being modeled that will affect the probability of sampling fragments from this transcript, including the fragment length distribution and sequence-specific and GC-fragment bias (if they are being modeled).
- TPM: salmon’s estimate of the relative abundance of this transcript in units of Transcripts Per Million (TPM). TPM is the recommended relative abundance measure to use for downstream analysis.
- NumReads: salmon’s estimate of the number of reads mapping to each transcript that was quantified. It is an “estimate” insofar as it is the expected number of reads that have originated from each transcript given the structure of the uniquely mapping and multi-mapping reads and the relative abundance estimates for each transcript.
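Because quant.sf is plain tab-separated text with the five columns listed above, it can be inspected with standard tools. The sketch below assumes that column order (Name, Length, EffectiveLength, TPM, NumReads); tpm_at_least is a hypothetical helper, and the demo file contains made-up values, not real Salmon output.

```shell
# Print Name and TPM for transcripts whose TPM (column 4) is at least a threshold.
# The header line (row 1) is skipped.
tpm_at_least() {
    awk -F'\t' -v min="$2" 'NR > 1 && $4 >= min { print $1 "\t" $4 }' "$1"
}

# Build a tiny made-up quant.sf for demonstration (values are illustrative only).
demo=$(mktemp)
printf 'Name\tLength\tEffectiveLength\tTPM\tNumReads\n' >  "$demo"
printf 'tx1\t1500\t1320.5\t600000.0\t840.0\n'           >> "$demo"
printf 'tx2\t900\t720.5\t399990.0\t305.0\n'             >> "$demo"
printf 'tx3\t2000\t1820.5\t10.0\t1.0\n'                 >> "$demo"

# Keep only transcripts with TPM >= 100; prints the tx1 and tx2 rows.
tpm_at_least "$demo" 100
rm -f "$demo"
```

On a real run, the first argument would be the quant.sf inside the output directory given with -o.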
3.2 Command record file: cmd_info.json
3.3 ambig_info.tsv in the aux_info directory: records the unique and ambiguous mappings for each transcript