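【Bowtie】How DNA sequence assembly works
【Jenny's comment】 I had always assumed Bowtie was a short-read assembly tool, but that is wrong. It is not an assembler, only a sequence alignment tool. Its final output is the position of each short read relative to the index.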
------------------
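How does short-read alignment work? Which short-read aligners are in common use today? ok
http://blog.sina.com.cn/s/blog_9617895f01011npk.html
Answer: Sequence alignment arranges two or more sequences according to a set of rules in order to determine their similarity and, from that, their homology. Short-read alignment has characteristics of its own that set it apart from long-sequence alignment, so the two use different algorithms. Three families of algorithms are commonly used for short reads:
(1) Spaced-seed indexing, e.g. MAQ and ELAND: each read is first split into segments, and one or more segments are chosen as seeds to build a lookup index; reads are then located by index lookup followed by extension, and rotating through the seeds covers every combination of positions at which the permitted mismatches may occur;
(2) Burrows-Wheeler transform, e.g. Bowtie, BWA and SOAP2: the genome sequence is compressed by the B-W transform and indexed; reads are located by search and backtracking, and the permitted mismatches are handled by substituting bases during the search;
(3) Smith-Waterman dynamic programming, e.g. BFAST and SHRiMP: initial conditions and a recurrence are used to compute the scores of all possible alignments of two sequences, the scores are stored in a matrix, and the optimal alignment is recovered by tracing back through the matrix.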
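As a rough illustration of approach (2), the sketch below builds a toy Burrows-Wheeler index and locates a read by backward search. It is a minimal teaching example, not how Bowtie is implemented: the function names `build_fm_index` and `locate` are made up here, the suffix array is built by naive sorting, and only exact matches are handled (no mismatches, no compression).

```python
def build_fm_index(text):
    """Build a toy FM index (suffix array, BWT, first-column table, counts)."""
    text += "$"  # unique sentinel that sorts before A, C, G, T
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to "$" for i == 0)
    bwt = "".join(text[i - 1] for i in suffix_array)
    alphabet = sorted(set(text))
    # first_row[c]: number of characters in text strictly smaller than c
    first_row, total = {}, 0
    for ch in alphabet:
        first_row[ch] = total
        total += text.count(ch)
    # occ[c][k]: number of occurrences of c in bwt[:k]
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for k, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][k + 1] = occ[ch][k] + (b == ch)
    return suffix_array, bwt, first_row, occ


def locate(pattern, index):
    """Return sorted 0-based positions where pattern occurs exactly."""
    suffix_array, bwt, first_row, occ = index
    lo, hi = 0, len(bwt)  # half-open range of matching suffixes
    for ch in reversed(pattern):  # extend the match one base to the left
        if ch not in first_row:
            return []
        lo = first_row[ch] + occ[ch][lo]
        hi = first_row[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(suffix_array[lo:hi])


index = build_fm_index("ACGTACGT")
print(locate("CGT", index))
```

Each step of the loop narrows the range of suffixes that start with the part of the read matched so far, so the cost depends on the read length rather than on the genome length.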
BGI assembly ok
http://www.ebiotrade.com/newsf/2010-1/2010128171022809.htm
Research on next-generation gene sequence assembly algorithms
http://www.fdurop.fudan.edu.cn/upload/stu/docs/rcYsXb_102804-1303180458.pdf
Genome sequencing and analysis. Good! Recommended reading!
http://ibi.zju.edu.cn/bioinplant/courses/chap4.pdf
Design of gene sequence assembly algorithms
http://www.doc88.com/p-741680604744.html
【Bowtie】Bowtie2 usage and detailed parameter reference
For the impatient
bowtie2 -q --phred33 --sensitive --end-to-end -I 0 -X 500 --fr --un unpaired --al aligned \
  --un-conc unconc --al-conc alconc -p 6 --reorder -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r>} -S [<hit>]
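Usage:
bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r>} -S [<hit>]
-x <bt2-idx>  Prefix (basename) of the index files built by bowtie2-build. bowtie2 looks first in the current directory, then in the directory named by the BOWTIE2_INDEXES environment variable.
-1 <m1>  File(s) holding mate 1 of the paired-end reads. Several files may be given, separated by commas; they must correspond one-to-one with the files given to -2, e.g. "-1 flyA_1.fq,flyB_1.fq -2 flyA_2.fq,flyB_2.fq". Reads within a file may differ in length.
-2 <m2>  File(s) holding mate 2 of the paired-end reads.
-U <r>  File(s) holding unpaired reads. Several files may be given, separated by commas. Reads within a file may differ in length.
-S <hit>  File to write the SAM alignments to. By default they are written to standard output.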
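The main arguments can be assembled programmatically, for instance when aligning many samples in a loop. The sketch below only builds the argument list and does not run anything; `build_bowtie2_command` and the file names are hypothetical, and the options used are the ones from the quick-start command above.

```python
def build_bowtie2_command(index_prefix, sam_out, mate1=None, mate2=None,
                          unpaired=None, threads=1):
    """Return a bowtie2 argument list (a sketch; nothing is executed)."""
    if (mate1 is None) != (mate2 is None):
        raise ValueError("-1 and -2 must be given together")
    if mate1 is None and not unpaired:
        raise ValueError("give paired files (-1/-2) or unpaired files (-U)")
    cmd = ["bowtie2", "-q", "--phred33", "--sensitive", "--end-to-end",
           "-p", str(threads), "-x", index_prefix]
    if mate1 is not None:
        if len(mate1) != len(mate2):
            raise ValueError("-1 and -2 files must correspond one-to-one")
        # Several files are passed as one comma-separated argument.
        cmd += ["-1", ",".join(mate1), "-2", ",".join(mate2)]
    if unpaired:
        cmd += ["-U", ",".join(unpaired)]
    cmd += ["-S", sam_out]
    return cmd


cmd = build_bowtie2_command(
    "fly_index", "fly.sam",
    mate1=["flyA_1.fq", "flyB_1.fq"],
    mate2=["flyA_2.fq", "flyB_2.fq"],
    threads=6,
)
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once bowtie2 is installed and the index exists.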
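Optional parameters:
-q  Input files are FASTQ. This is the default.
--qseq  Input files are QSEQ.
-f  Input files are FASTA. Selecting this implies --ignore-quals.
-r  Input files hold one sequence per line, with no read names or qualities. Selecting this implies --ignore-quals.
-c  The read sequences themselves, comma-separated, are given on the command line instead of file names. Selecting this implies --ignore-quals.
-s/--skip <int>  Skip the first <int> reads or pairs in the input.
-u/--qupto <int>  Align only the first <int> reads or pairs (after the -s/--skip reads or pairs have been skipped). Default: no limit.
-5/--trim5 <int>  Trim <int> bases from the 5' end of each read before alignment (default: 0).
-3/--trim3 <int>  Trim <int> bases from the 3' end of each read before alignment (default: 0).
--phred33  Input qualities are ASCII characters whose code equals the Phred quality plus 33. Used by recent Illumina pipelines.
--phred64  Input qualities are ASCII characters whose code equals the Phred quality plus 64.
--solexa-quals  Convert Solexa qualities to Phred. Used by older GA Pipeline versions. Default: off.
--int-quals  Qualities in the input file are space-separated integers, e.g. 40 40 30 40..., rather than ASCII characters. Default: off.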
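To make the two ASCII encodings concrete, the sketch below converts a quality string into Phred scores. `decode_qualities` is a hypothetical helper written for this note, not part of bowtie2.

```python
def decode_qualities(quality_string, offset=33):
    """Convert an ASCII quality string into a list of Phred scores.

    offset is 33 for --phred33 input and 64 for --phred64 input.
    """
    if offset not in (33, 64):
        raise ValueError("offset must be 33 or 64")
    scores = [ord(ch) - offset for ch in quality_string]
    if any(score < 0 for score in scores):
        raise ValueError("character below the offset; wrong encoding?")
    return scores


print(decode_qualities("II?I"))            # Phred+33
print(decode_qualities("hh^h", offset=64)) # Phred+64
```

Both calls describe the same qualities, 40 40 30 40, which is what the file would contain verbatim under --int-quals.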
Presets in --end-to-end mode
--very-fast  Same as: -D 5 -R 1 -N 0 -L 22 -i S,0,2.50
--fast  Same as: -D 10 -R 2 -N 0 -L 22 -i S,0,2.50
--sensitive  Same as: -D 15 -R 2 -N 0 -L 22 -i S,1,1.15 (default in --end-to-end mode)
--very-sensitive  Same as: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50
Presets in --local mode
--very-fast-local  Same as: -D 5 -R 1 -N 0 -L 25 -i S,1,2.00
--fast-local  Same as: -D 10 -R 2 -N 0 -L 22 -i S,1,1.75
--sensitive-local  Same as: -D 15 -R 2 -N 0 -L 20 -i S,1,0.75 (default in --local mode)
--very-sensitive-local  Same as: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50
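-N <int>  Number of mismatches allowed in a seed alignment. Can be 0 or 1. Default: 0.
-L <int>  Length of the seeds.
************************************************************
Function options
Some bowtie2 parameters take a function instead of a fixed value, so that the value scales with the read length. A function has three parts: (a) the function type, one of constant (C), linear (L), square root (S) or natural logarithm (G); (b) a constant term; (c) a coefficient. For example:
<func> set to L,-0.4,-0.6 gives f(x) = -0.4 + -0.6 * x
<func> set to G,1,5.4 gives f(x) = 1.0 + 5.4 * ln(x)
************************************************************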
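The function notation can be evaluated with a few lines of code. The helper below, `evaluate_function_option`, is a hypothetical illustration of the notation; for the constant type (C) it assumes the result is simply the constant term.

```python
import math


def evaluate_function_option(spec, read_length):
    """Evaluate a spec such as 'S,1,1.15' at the given read length."""
    parts = spec.split(",")
    kind = parts[0]
    constant = float(parts[1])
    coefficient = float(parts[2]) if len(parts) > 2 else 0.0
    transforms = {
        "C": lambda x: 0.0,  # constant: the coefficient has no effect
        "L": lambda x: x,    # linear
        "S": math.sqrt,      # square root
        "G": math.log,       # natural logarithm
    }
    if kind not in transforms:
        raise ValueError("function type must be C, L, S or G")
    return constant + coefficient * transforms[kind](read_length)


for spec in ("L,-0.4,-0.6", "S,1,1.15", "G,20,8", "L,0,0.15"):
    print(spec, evaluate_function_option(spec, 100))
```

The four specs printed are the example above and the defaults quoted in this document for -i, --score-min (local mode) and --n-ceil.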
-i <func>  Sets the number of bases between the start positions of adjacent seeds.
************************************************************
For example, if the read is 30 bases long, the seed length is 10 and the interval between adjacent seeds is 6, the extracted seeds are:
Read:      TAGCTACGCTCTACGCTATCATGCATAAAC
Seed 1 fw: TAGCTACGCT
Seed 1 rc: AGCGTAGCTA
Seed 2 fw: CGCTCTACGC
Seed 2 rc: GCGTAGAGCG
Seed 3 fw: ACGCTATCAT
Seed 3 rc: ATGATAGCGT
Seed 4 fw: TCATGCATAA
Seed 4 rc: TTATGCATGA
************************************************************
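The seed table above can be reproduced with a short script. This is only a sketch of the extraction step as described here (fixed seed length, fixed interval, forward and reverse-complement copies); `extract_seeds` is a made-up name and the code says nothing about how bowtie2 does it internally.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def extract_seeds(read, seed_length, interval):
    """Return (forward, reverse-complement) seed pairs taken every interval bases."""
    seeds = []
    for start in range(0, len(read) - seed_length + 1, interval):
        forward = read[start:start + seed_length]
        seeds.append((forward, reverse_complement(forward)))
    return seeds


read = "TAGCTACGCTCTACGCTATCATGCATAAAC"
for number, (fw, rc) in enumerate(extract_seeds(read, 10, 6), start=1):
    print(f"Seed {number} fw: {fw}")
    print(f"Seed {number} rc: {rc}")
```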
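In --end-to-end mode the default is -i S,1,1.15, i.e. f(x) = 1 + 1.15 * sqrt(x). For a read of length 100, f(100) = 12.5, so adjacent seeds are about 12 bases apart.
--n-ceil <func>  Maximum number of ambiguous bases (anything other than G, T, A or C; usually N) allowed in a read. Default: L,0,0.15, i.e. f(x) = 0 + 0.15 * x, so a read of length 100 may contain at most 15 ambiguous bases. A read with more than that is filtered out.
--dpad <int>  Pad the dynamic programming problem by <int> columns on either side to allow gaps. Default: 15.
--gbar <int>  Disallow gaps within <int> bases of either end of the read. Default: 4.
--ignore-quals  Ignore base qualities when computing mismatch penalties. This is automatically in effect when the input mode is -f, -r or -c.
--nofw/--norc  --nofw: do not align reads against the forward reference strand; --norc: do not align reads against the reverse-complement reference strand. Default: both strands enabled.
--end-to-end  Align the entire read to the reference. --ma is 0 in this mode. This is the default mode and is mutually exclusive with --local.
--local  Align reads locally: some bases at either end of the read may be left unaligned (soft clipped) so that the alignment score is as high as possible. --ma defaults to 2 in this mode.
--ma <int>  Match bonus. In --local mode, <int> is added to the score for each read base that matches the reference base. Not used in --end-to-end mode. Default: 2.
--mp MX,MN  Mismatch penalty, where MX is the maximum and MN the minimum penalty. By default the penalty depends on base quality: MN + floor( (MX-MN)(MIN(Q, 40.0)/40.0) ), where Q is the Phred quality of the base. With --ignore-quals a mismatch always costs MX. Default: MX = 6, MN = 2.
--np <int>  Penalty for positions where the read, the reference or both contain an ambiguous base such as N. Default: 1.
--rdg <int1>,<int2>  Read gap open and gap extend penalties. Default: 5, 3.
--rfg <int1>,<int2>  Reference gap open and gap extend penalties. Default: 5, 3.
--score-min <func>  Minimum score an alignment needs to count as valid. Default in --end-to-end mode: L,-0.6,-0.6; in --local mode: G,20,8.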
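The --mp formula is easier to read with numbers plugged in. The function below is a direct transcription of MN + floor( (MX-MN)(MIN(Q, 40.0)/40.0) ); the name `mismatch_penalty` is chosen for this note only.

```python
import math


def mismatch_penalty(quality, mx=6, mn=2, ignore_quals=False):
    """Penalty for one mismatch at a base with Phred quality `quality`."""
    if ignore_quals:
        return mx  # every mismatch costs the maximum penalty
    return mn + math.floor((mx - mn) * (min(quality, 40.0) / 40.0))


for q in (0, 10, 20, 30, 40):
    print(q, mismatch_penalty(q))
```

With the defaults, a mismatch at a low-quality base (Q = 0) costs 2 and a mismatch at a high-quality base (Q = 40 or more) costs 6, so mismatches the sequencer was unsure about weigh less.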
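-k <int>  By default bowtie2 searches for distinct alignments of each read and reports the best one (chosen at random if several share the best score). With -k, bowtie2 instead searches for at most <int> alignments per read and reports them in descending order by score.
-a  Like -k but with no limit on the number of alignments searched for; all alignments found are reported in descending order by score. Mutually exclusive with -k. Note: on genomes with many repeats this option can make the program extremely slow.
Effort options
-D <int>  A seed extension "fails" if it yields neither a new best nor a new second-best alignment. After <int> consecutive failures bowtie2 stops extending seeds for that read and moves on. Default: 15. The limit is adjusted automatically when -k or -a is given.
-R <int>  Maximum number of times bowtie2 re-seeds a read whose seeds are repetitive. When the seeds hit more than 300 positions each on average, new seeds are generated at a different offset and aligned again. Default: 2.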
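Paired-end options
-I/--minins <int>  Minimum fragment length. Default: 0.
-X/--maxins <int>  Maximum fragment length. Default: 500.
--fr/--rf/--ff  Orientation of the upstream and downstream mates, relative to the forward reference strand, in a valid paired-end alignment. --fr: mate 1 lies upstream in forward orientation and mate 2 lies downstream as its reverse complement, or mate 2 lies upstream and mate 1 lies downstream as its reverse complement; --rf: the upstream mate 1 is reverse-complemented and the downstream mate 2 is forward-oriented; --ff: both mates are forward-oriented. Default: --fr, which suits Illumina paired-end data; for mate-pair libraries choose --rf.
--no-mixed  By default, when a pair cannot be aligned as a pair, bowtie2 aligns each mate on its own. This option disables that behavior.
--no-discordant  By default, when a pair has no concordant alignment (one that satisfies -I, -X and --fr/--rf/--ff), bowtie2 looks for a discordant alignment (both mates align uniquely, but -I, -X or --fr/--rf/--ff is not satisfied). This option disables that behavior.
--dovetail  Count dovetailing mates as concordant. By default they are not.
--no-contain  Count one mate containing the other as non-concordant. By default containment is concordant.
--no-overlap  Count overlapping mates as non-concordant. By default overlapping mates are concordant.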
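-t/--time  Print the wall-clock time needed to load the index and align the reads.
--un <path>  Write unpaired reads that fail to align to <path>.
--un-gz <path>  Same as --un, gzip-compressed.
--un-bz2 <path>  Same as --un, bzip2-compressed.
--al <path>  Write unpaired reads that align at least once to <path>.
--al-gz <path>  Same as --al, gzip-compressed.
--al-bz2 <path>  Same as --al, bzip2-compressed.
--un-conc <path>  Write paired-end reads that fail to align concordantly to <path>.
--un-conc-gz <path>  Same as --un-conc, gzip-compressed.
--un-conc-bz2 <path>  Same as --un-conc, bzip2-compressed.
--al-conc <path>  Write paired-end reads that align concordantly at least once to <path>.
--al-conc-gz <path>  Same as --al-conc, gzip-compressed.
--al-conc-bz2 <path>  Same as --al-conc, bzip2-compressed.
--quiet  Quiet mode: print nothing besides alignments and serious errors.
--met-file <path>  Write bowtie2 metrics to file <path>. Useful for debugging. Default: metrics disabled.
--met-stderr  Write bowtie2 metrics to standard error. Not mutually exclusive with --met-file. Default: metrics disabled.
--met <int>  Write a new metrics record every <int> seconds. Default: 1.
--no-unal  Suppress SAM records for reads that failed to align.
--no-hd  Suppress SAM header lines (those starting with @).
--no-sq  Suppress @SQ SAM header lines.
--rg-id <text>  Set the read group ID to <text>.
--rg <text>  Add <text> as a field on the @RG header line.
-o/--offrate <int>  Override the offrate of the index with <int>. The index default is 5. <int> must be greater than the offrate of the index; the larger it is, the longer alignment takes and the less memory it uses.
-p/--threads NTHREADS  Number of threads. Default: 1.
--reorder  With multiple threads, alignments may be output in a different order from the reads in the input file; this option makes the two orders match.
--mm  Use memory-mapped I/O to load the index rather than normal file I/O, so that several bowtie processes can share the same in-memory index and save memory.
--qc-filter  Filter out reads whose QSEQ filter field is non-zero. Only has an effect with --qseq. Default: off.
--seed <int>  Use <int> as the seed for the pseudo-random number generator. Default: 0.
--version  Print the program version and quit.
-h/--help  Print usage information and quit.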
For more details, see:
http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml
Source: http://www.hzaumycology.com/chenlianfu_blog/?p=178
【Bowtie】BOWTIE2: Manual (parameters)
http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml
bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r>} -S [<hit>]
Main arguments
-x <bt2-idx>
The basename of the index for the reference genome. The basename is the name of any of the index files up to but not including the final .1.bt2 / .rev.1.bt2 / etc. bowtie2 looks for the specified index first in the current directory, then in the directory specified in the BOWTIE2_INDEXES environment variable.

-1 <m1>
Comma-separated list of files containing mate 1s (filename usually includes _1), e.g. -1 flyA_1.fq,flyB_1.fq. Sequences specified with this option must correspond file-for-file and read-for-read with those specified in <m2>. Reads may be a mix of different lengths. If - is specified, bowtie2 will read the mate 1s from the "standard in" or "stdin" filehandle.

-2 <m2>
Comma-separated list of files containing mate 2s (filename usually includes _2), e.g. -2 flyA_2.fq,flyB_2.fq. Sequences specified with this option must correspond file-for-file and read-for-read with those specified in <m1>. Reads may be a mix of different lengths. If - is specified, bowtie2 will read the mate 2s from the "standard in" or "stdin" filehandle.

-U <r>
Comma-separated list of files containing unpaired reads to be aligned, e.g. lane1.fq,lane2.fq,lane3.fq,lane4.fq. Reads may be a mix of different lengths. If - is specified, bowtie2 gets the reads from the "standard in" or "stdin" filehandle.

-S <hit>
File to write SAM alignments to. By default, alignments are written to the "standard out" or "stdout" filehandle (i.e. the console).
-q
Reads (specified with -1, -2, -U) are FASTQ files. FASTQ files usually have extension .fq or .fastq. FASTQ is the default format. See also: --solexa-quals and --int-quals.

--qseq
Reads (specified with -1, -2, -U) are QSEQ files. QSEQ files usually end in _qseq.txt. See also: --solexa-quals and --int-quals.

-f
Reads (specified with -1, -2, -U) are FASTA files. FASTA files usually have extension .fa, .fasta, .mfa, .fna or similar. FASTA files do not have a way of specifying quality values, so when -f is set, the result is as if --ignore-quals is also set.

-r
Reads (specified with -1, -2, -U) are files with one input sequence per line, without any other information (no read names, no qualities). When -r is set, the result is as if --ignore-quals is also set.

-c
The read sequences are given on the command line. I.e. the arguments to -1, -2 and -U are comma-separated lists of reads rather than lists of read files. There is no way to specify read names or qualities, so -c also implies --ignore-quals.

-s/--skip <int>
Skip (i.e. do not align) the first <int> reads or pairs in the input.

-u/--qupto <int>
Align the first <int> reads or read pairs from the input (after the -s/--skip reads or pairs have been skipped), then stop. Default: no limit.

-5/--trim5 <int>
Trim <int> bases from the 5' (left) end of each read before alignment (default: 0).

-3/--trim3 <int>
Trim <int> bases from the 3' (right) end of each read before alignment (default: 0).

--phred33
Input qualities are ASCII chars equal to the Phred quality plus 33. This is also called the "Phred+33" encoding, which is used by the very latest Illumina pipelines.

--phred64
Input qualities are ASCII chars equal to the Phred quality plus 64. This is also called the "Phred+64" encoding.

--solexa-quals
Convert input qualities from Solexa (which can be negative) to Phred (which can't). This scheme was used in older Illumina GA Pipeline versions (prior to 1.3). Default: off.

--int-quals
Quality values are represented in the read input file as space-separated ASCII integers, e.g., 40 40 30 40..., rather than ASCII characters, e.g., II?I.... Integers are treated as being on the Phred quality scale unless --solexa-quals is also specified. Default: off.
--very-fast
Same as: -D 5 -R 1 -N 0 -L 22 -i S,0,2.50

--fast
Same as: -D 10 -R 2 -N 0 -L 22 -i S,0,2.50

--sensitive
Same as: -D 15 -R 2 -N 0 -L 22 -i S,1,1.15 (default in --end-to-end mode)

--very-sensitive
Same as: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50

--very-fast-local
Same as: -D 5 -R 1 -N 0 -L 25 -i S,1,2.00

--fast-local
Same as: -D 10 -R 2 -N 0 -L 22 -i S,1,1.75

--sensitive-local
Same as: -D 15 -R 2 -N 0 -L 20 -i S,1,0.75 (default in --local mode)

--very-sensitive-local
Same as: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50
-N <int>
Sets the number of mismatches allowed in a seed alignment during multiseed alignment. Can be set to 0 or 1. Setting this higher makes alignment slower (often much slower) but increases sensitivity. Default: 0.

-L <int>
Sets the length of the seed substrings to align during multiseed alignment. Smaller values make alignment slower but more sensitive. Default: the --sensitive preset is used by default, which sets -L to 22 in --end-to-end mode and to 20 in --local mode.

-i <func>
Sets a function governing the interval between seed substrings to use during multiseed alignment. For instance, if the read has 30 characters, and seed length is 10, and the seed interval is 6, the seeds extracted will be:
Read:      TAGCTACGCTCTACGCTATCATGCATAAAC
Seed 1 fw: TAGCTACGCT
Seed 1 rc: AGCGTAGCTA
Seed 2 fw: CGCTCTACGC
Seed 2 rc: GCGTAGAGCG
Seed 3 fw: ACGCTATCAT
Seed 3 rc: ATGATAGCGT
Seed 4 fw: TCATGCATAA
Seed 4 rc: TTATGCATGA
Since it's best to use longer intervals for longer reads, this parameter sets the interval as a function of the read length, rather than a single one-size-fits-all number. For instance, specifying -i S,1,2.5 sets the interval function f to f(x) = 1 + 2.5 * sqrt(x), where x is the read length. See also: setting function options. If the function returns a result less than 1, it is rounded up to 1. Default: the --sensitive preset is used by default, which sets -i to S,1,1.15 in --end-to-end mode and to S,1,0.75 in --local mode.

--n-ceil <func>
Sets a function governing the maximum number of ambiguous characters (usually Ns and/or .s) allowed in a read as a function of read length. For instance, specifying L,0,0.15 sets the N-ceiling function f to f(x) = 0 + 0.15 * x, where x is the read length. See also: setting function options. Reads exceeding this ceiling are filtered out. Default: L,0,0.15.

--dpad <int>
"Pads" dynamic programming problems by <int> columns on either side to allow gaps. Default: 15.

--gbar <int>
Disallow gaps within <int> positions of the beginning or end of the read. Default: 4.
--ignore-quals
When calculating a mismatch penalty, always consider the quality value at the mismatched position to be the highest possible, regardless of the actual value. I.e. input is treated as though all quality values are high. This is also the default behavior when the input doesn't specify quality values (e.g. in -f, -r, or -c modes).

--nofw/--norc
If --nofw is specified, bowtie2 will not attempt to align unpaired reads to the forward (Watson) reference strand. If --norc is specified, bowtie2 will not attempt to align unpaired reads against the reverse-complement (Crick) reference strand. In paired-end mode, --nofw and --norc pertain to the fragments; i.e. specifying --nofw causes bowtie2 to explore only those paired-end configurations corresponding to fragments from the reverse-complement (Crick) strand. Default: both strands enabled.

--no-1mm-upfront
By default, Bowtie 2 will attempt to find either an exact or a 1-mismatch end-to-end alignment for the read before trying the multiseed heuristic. Such alignments can be found very quickly, and many short read alignments have exact or near-exact end-to-end alignments. However, this can lead to unexpected alignments when the user also sets options governing the multiseed heuristic, like -L and -N. For instance, if the user specifies -N 0 and -L equal to the length of the read, the user will be surprised to find 1-mismatch alignments reported. This option prevents Bowtie 2 from searching for 1-mismatch end-to-end alignments before using the multiseed heuristic, which leads to the expected behavior when combined with options such as -L and -N. This comes at the expense of speed.

--end-to-end
In this mode, Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or "soft clipping") of characters from either end. The match bonus --ma always equals 0 in this mode, so all alignment scores are less than or equal to 0, and the greatest possible alignment score is 0. This is mutually exclusive with --local. --end-to-end is the default mode.

--local
In this mode, Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted ("soft clipped") from the ends in order to achieve the greatest possible alignment score. The match bonus --ma is used in this mode, and the best possible alignment score is equal to the match bonus (--ma) times the length of the read. Specifying --local and one of the presets (e.g. --local --very-fast) is equivalent to specifying the local version of the preset (--very-fast-local). This is mutually exclusive with --end-to-end. --end-to-end is the default mode.
--ma <int>
Sets the match bonus. In --local mode <int> is added to the alignment score for each position where a read character aligns to a reference character and the characters match. Not used in --end-to-end mode. Default: 2.

--mp MX,MN
Sets the maximum (MX) and minimum (MN) mismatch penalties, both integers. A number less than or equal to MX and greater than or equal to MN is subtracted from the alignment score for each position where a read character aligns to a reference character, the characters do not match, and neither is an N. If --ignore-quals is specified, the number subtracted equals MX. Otherwise, the number subtracted is MN + floor( (MX-MN)(MIN(Q, 40.0)/40.0) ) where Q is the Phred quality value. Default: MX = 6, MN = 2.

--np <int>
Sets penalty for positions where the read, reference, or both, contain an ambiguous character such as N. Default: 1.

--rdg <int1>,<int2>
Sets the read gap open (<int1>) and extend (<int2>) penalties. A read gap of length N gets a penalty of <int1> + N * <int2>. Default: 5, 3.

--rfg <int1>,<int2>
Sets the reference gap open (<int1>) and extend (<int2>) penalties. A reference gap of length N gets a penalty of <int1> + N * <int2>. Default: 5, 3.
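The penalties above combine into an alignment score by simple subtraction. The sketch below adds them up for an --end-to-end alignment under one simplifying assumption that is not from the manual: every mismatch is charged the maximum penalty MX, as if --ignore-quals were set. The function names are hypothetical.

```python
def gap_penalty(gap_length, gap_open=5, gap_extend=3):
    """Penalty for one gap of gap_length positions: open + N * extend."""
    if gap_length < 1:
        raise ValueError("gap_length must be at least 1")
    return gap_open + gap_length * gap_extend


def end_to_end_score(mismatches, read_gaps=(), ref_gaps=(), ambiguous=0,
                     mx=6, np_penalty=1, rdg=(5, 3), rfg=(5, 3)):
    """Score of an end-to-end alignment (assumes each mismatch costs MX)."""
    penalty = mismatches * mx + ambiguous * np_penalty
    penalty += sum(gap_penalty(n, *rdg) for n in read_gaps)
    penalty += sum(gap_penalty(n, *rfg) for n in ref_gaps)
    return -penalty  # the match bonus is 0, so the best possible score is 0


# Two mismatches plus one read gap of length 2, with default penalties.
print(end_to_end_score(2, read_gaps=[2]))
```

For a 50-base read, the default --score-min of L,-0.6,-0.6 quoted in this document works out to -0.6 + -0.6 * 50 = -30.6, so the alignment in the example would still count as valid.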
--score-min <func>
Sets a function governing the minimum alignment score needed for an alignment to be considered "valid" (i.e. good enough to report). This is a function of read length. For instance, specifying L,0,-0.6 sets the minimum-score function f to f(x) = 0 + -0.6 * x, where x is the read length. See also: setting function options. The default in --end-to-end mode is L,-0.6,-0.6 and the default in --local mode is G,20,8.

-k <int>
By default, bowtie2 searches for distinct, valid alignments for each read. When it finds a valid alignment, it continues looking for alignments that are nearly as good or better. The best alignment found is reported (randomly selected from among best if tied). Information about the best alignments is used to estimate mapping quality and to set SAM optional fields, such as AS:i and XS:i.
When -k is specified, however, bowtie2 behaves differently. Instead, it searches for at most <int> distinct, valid alignments for each read. The search terminates when it can't find more distinct valid alignments, or when it finds <int>, whichever happens first. All alignments found are reported in descending order by alignment score. The alignment score for a paired-end alignment equals the sum of the alignment scores of the individual mates. Each reported read or pair alignment beyond the first has the SAM 'secondary' bit (which equals 256) set in its FLAGS field. For reads that have more than <int> distinct, valid alignments, bowtie2 does not guarantee that the <int> alignments reported are the best possible in terms of alignment score. -k is mutually exclusive with -a.
Note: Bowtie 2 is not designed with large values for -k in mind, and when aligning reads to long, repetitive genomes large -k can be very, very slow.

-a
Like -k but with no upper limit on number of alignments to search for. -a is mutually exclusive with -k.
Note: Bowtie 2 is not designed with -a mode in mind, and when aligning reads to long, repetitive genomes this mode can be very, very slow.
-D <int>
Up to <int> consecutive seed extension attempts can "fail" before Bowtie 2 moves on, using the alignments found so far. A seed extension "fails" if it does not yield a new best or a new second-best alignment. This limit is automatically adjusted up when -k or -a are specified. Default: 15.

-R <int>
<int> is the maximum number of times Bowtie 2 will "re-seed" reads with repetitive seeds. When "re-seeding," Bowtie 2 simply chooses a new set of seeds (same length, same number of mismatches allowed) at different offsets and searches for more alignments. A read is considered to have repetitive seeds if the total number of seed hits divided by the number of seeds that aligned at least once is greater than 300. Default: 2.
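The re-seeding criterion is a plain ratio, which the following sketch spells out. `has_repetitive_seeds` and its input (a list with the number of hits per seed) are illustrative names for this note, not bowtie2 internals.

```python
def has_repetitive_seeds(hits_per_seed, threshold=300):
    """True if total hits / seeds with at least one hit exceeds threshold."""
    aligned = [hits for hits in hits_per_seed if hits > 0]
    if not aligned:
        return False  # no seed aligned at all, so nothing is repetitive
    return sum(aligned) / len(aligned) > threshold


# Four seeds; one found nothing, the other three found 1502 hits in total.
print(has_repetitive_seeds([1000, 0, 500, 2]))
```

Seeds with no hits are left out of the denominator, so one seed landing in a repeat family can push a read over the threshold even when the remaining seeds are unique.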
-I/--minins <int>
The minimum fragment length for valid paired-end alignments. E.g. if -I 60 is specified and a paired-end alignment consists of two 20-bp alignments in the appropriate orientation with a 20-bp gap between them, that alignment is considered valid (as long as -X is also satisfied). A 19-bp gap would not be valid in that case. If trimming options -3 or -5 are also used, the -I constraint is applied with respect to the untrimmed mates.
The larger the difference between -I and -X, the slower Bowtie 2 will run. This is because larger differences between -I and -X require that Bowtie 2 scan a larger window to determine if a concordant alignment exists. For typical fragment length ranges (200 to 400 nucleotides), Bowtie 2 is very efficient.
Default: 0 (essentially imposing no minimum)

-X/--maxins <int>
The maximum fragment length for valid paired-end alignments. E.g. if -X 100 is specified and a paired-end alignment consists of two 20-bp alignments in the proper orientation with a 60-bp gap between them, that alignment is considered valid (as long as -I is also satisfied). A 61-bp gap would not be valid in that case. If trimming options -3 or -5 are also used, the -X constraint is applied with respect to the untrimmed mates, not the trimmed mates.
The larger the difference between -I and -X, the slower Bowtie 2 will run. This is because larger differences between -I and -X require that Bowtie 2 scan a larger window to determine if a concordant alignment exists. For typical fragment length ranges (200 to 400 nucleotides), Bowtie 2 is very efficient.
Default: 500.
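The two worked examples for -I and -X reduce to one inequality on the fragment length, taken here as mate 1 length + gap + mate 2 length for mates that do not overlap. The helper name is hypothetical.

```python
def satisfies_insert_constraints(mate1_length, mate2_length, gap,
                                 minins=0, maxins=500):
    """Check -I/-X for two non-overlapping mates separated by `gap` bases."""
    fragment_length = mate1_length + gap + mate2_length
    return minins <= fragment_length <= maxins


# -I 60: two 20-bp mates need a gap of at least 20 bp.
print(satisfies_insert_constraints(20, 20, 20, minins=60))
print(satisfies_insert_constraints(20, 20, 19, minins=60))
# -X 100: two 20-bp mates allow a gap of at most 60 bp.
print(satisfies_insert_constraints(20, 20, 60, maxins=100))
print(satisfies_insert_constraints(20, 20, 61, maxins=100))
```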
--fr/--rf/--ff
The upstream/downstream mate orientations for a valid paired-end alignment against the forward reference strand. E.g., if --fr is specified and there is a candidate paired-end alignment where mate 1 appears upstream of the reverse complement of mate 2 and the fragment length constraints (-I and -X) are met, that alignment is valid. Also, if mate 2 appears upstream of the reverse complement of mate 1 and all other constraints are met, that too is valid. --rf likewise requires that an upstream mate 1 be reverse-complemented and a downstream mate 2 be forward-oriented. --ff requires both an upstream mate 1 and a downstream mate 2 to be forward-oriented. Default: --fr (appropriate for Illumina's Paired-end Sequencing Assay).

--no-mixed
By default, when bowtie2 cannot find a concordant or discordant alignment for a pair, it then tries to find alignments for the individual mates. This option disables that behavior.

--no-discordant
By default, bowtie2 looks for discordant alignments if it cannot find any concordant alignments. A discordant alignment is an alignment where both mates align uniquely, but that does not satisfy the paired-end constraints (--fr/--rf/--ff, -I, -X). This option disables that behavior.

--dovetail
If the mates "dovetail", that is if one mate alignment extends past the beginning of the other such that the wrong mate begins upstream, consider that to be concordant. See also: Mates can overlap, contain or dovetail each other. Default: mates cannot dovetail in a concordant alignment.

--no-contain
If one mate alignment contains the other, consider that to be non-concordant. See also: Mates can overlap, contain or dovetail each other. Default: a mate can contain the other in a concordant alignment.

--no-overlap
If one mate alignment overlaps the other at all, consider that to be non-concordant. See also: Mates can overlap, contain or dovetail each other. Default: mates can overlap in a concordant alignment.
-t/--time
Print the wall-clock time required to load the index files and align the reads. This is printed to the "standard error" ("stderr") filehandle. Default: off.

--un <path> --un-gz <path> --un-bz2 <path>
Write unpaired reads that fail to align to file at <path>. These reads correspond to the SAM records with the FLAGS 0x4 bit set and neither the 0x40 nor 0x80 bits set. If --un-gz is specified, output will be gzip compressed. If --un-bz2 is specified, output will be bzip2 compressed. Reads written in this way will appear exactly as they did in the input file, without any modification (same sequence, same name, same quality string, same quality encoding). Reads will not necessarily appear in the same order as they did in the input.

--al <path> --al-gz <path> --al-bz2 <path>
Write unpaired reads that align at least once to file at <path>. These reads correspond to the SAM records with the FLAGS 0x4, 0x40, and 0x80 bits unset. If --al-gz is specified, output will be gzip compressed. If --al-bz2 is specified, output will be bzip2 compressed. Reads written in this way will appear exactly as they did in the input file, without any modification (same sequence, same name, same quality string, same quality encoding). Reads will not necessarily appear in the same order as they did in the input.

--un-conc <path> --un-conc-gz <path> --un-conc-bz2 <path>
Write paired-end reads that fail to align concordantly to file(s) at <path>. These reads correspond to the SAM records with the FLAGS 0x4 bit set and either the 0x40 or 0x80 bit set (depending on whether it's mate #1 or #2). .1 and .2 strings are added to the filename to distinguish which file contains mate #1 and mate #2. If a percent symbol, %, is used in <path>, the percent symbol is replaced with 1 or 2 to make the per-mate filenames. Otherwise, .1 or .2 are added before the final dot in <path> to make the per-mate filenames. Reads written in this way will appear exactly as they did in the input files, without any modification (same sequence, same name, same quality string, same quality encoding). Reads will not necessarily appear in the same order as they did in the inputs.

--al-conc <path> --al-conc-gz <path> --al-conc-bz2 <path>
Write paired-end reads that align concordantly at least once to file(s) at <path>. These reads correspond to the SAM records with the FLAGS 0x4 bit unset and either the 0x40 or 0x80 bit set (depending on whether it's mate #1 or #2). .1 and .2 strings are added to the filename to distinguish which file contains mate #1 and mate #2. If a percent symbol, %, is used in <path>, the percent symbol is replaced with 1 or 2 to make the per-mate filenames. Otherwise, .1 or .2 are added before the final dot in <path> to make the per-mate filenames. Reads written in this way will appear exactly as they did in the input files, without any modification (same sequence, same name, same quality string, same quality encoding). Reads will not necessarily appear in the same order as they did in the inputs.
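The per-mate naming rule for --un-conc and --al-conc can be mimicked as follows. This is a guess at the rule as worded above, not a copy of bowtie2's code: `per_mate_paths` is a hypothetical helper, it replaces every % it finds, and for a path with no dot it assumes the suffix is simply appended.

```python
import os.path


def per_mate_paths(path):
    """Return the (mate 1, mate 2) file names derived from one path."""
    if "%" in path:
        # Assumption: every percent symbol is replaced.
        return path.replace("%", "1"), path.replace("%", "2")
    # Insert .1 / .2 before the final dot of the file name; with no
    # extension the suffix ends up at the end (assumption).
    root, extension = os.path.splitext(path)
    return f"{root}.1{extension}", f"{root}.2{extension}"


print(per_mate_paths("unconc_%.fq"))
print(per_mate_paths("unconc.fq"))
```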
--quiet
Print nothing besides alignments and serious errors.

--met-file <path>
Write bowtie2 metrics to file <path>. Having alignment metrics can be useful for debugging certain problems, especially performance issues. See also: --met. Default: metrics disabled.

--met-stderr
Write bowtie2 metrics to the "standard error" ("stderr") filehandle. This is not mutually exclusive with --met-file. Having alignment metrics can be useful for debugging certain problems, especially performance issues. See also: --met. Default: metrics disabled.

--met <int>
Write a new bowtie2 metrics record every <int> seconds. Only matters if either --met-stderr or --met-file are specified. Default: 1.

--no-unal
Suppress SAM records for reads that failed to align.

--no-hd
Suppress SAM header lines (starting with @).

--no-sq
Suppress @SQ SAM header lines.

--rg-id <text>
Set the read group ID to <text>. This causes the SAM @RG header line to be printed, with <text> as the value associated with the ID: tag. It also causes the RG:Z: extra field to be attached to each SAM output record, with value set to <text>.

--rg <text>
Add <text> (usually of the form TAG:VAL, e.g. SM:Pool1) as a field on the @RG header line. Note: in order for the @RG line to appear, --rg-id must also be specified. This is because the ID tag is required by the SAM Spec. Specify --rg multiple times to set multiple fields. See the SAM Spec for details about what fields are legal.
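To show how --rg-id and --rg fit together, the sketch below builds both the command-line arguments and the @RG header line one would expect to see in the SAM output. It relies on the SAM convention that header fields are tab-separated TAG:VALUE pairs; the helper names and the example values `grp1` and `PL:ILLUMINA` are made up.

```python
def read_group_args(read_group_id, fields=()):
    """Command-line arguments: one --rg-id plus one --rg per extra field."""
    args = ["--rg-id", read_group_id]
    for field in fields:
        if ":" not in field:
            raise ValueError(f"expected TAG:VAL, got {field!r}")
        args += ["--rg", field]
    return args


def read_group_header(read_group_id, fields=()):
    """The corresponding @RG line: tab-separated, ID first."""
    return "\t".join(["@RG", f"ID:{read_group_id}", *fields])


print(read_group_args("grp1", ["SM:Pool1", "PL:ILLUMINA"]))
print(read_group_header("grp1", ["SM:Pool1", "PL:ILLUMINA"]))
```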
--omit-sec-seq
When printing secondary alignments, Bowtie 2 by default will write out the SEQ and QUAL strings. Specifying this option causes Bowtie 2 to print an asterisk in those fields instead.

-o/--offrate <int>
Override the offrate of the index with <int>. If <int> is greater than the offrate used to build the index, then some row markings are discarded when the index is read into memory. This reduces the memory footprint of the aligner but requires more time to calculate text offsets. <int> must be greater than the value used to build the index.

-p/--threads NTHREADS
Launch NTHREADS parallel search threads (default: 1). Threads will run on separate processors/cores and synchronize when parsing reads and outputting alignments. Searching for alignments is highly parallel, and speedup is close to linear. Increasing -p increases Bowtie 2's memory footprint. E.g. when aligning to a human genome index, increasing -p from 1 to 8 increases the memory footprint by a few hundred megabytes. This option is only available if bowtie is linked with the pthreads library (i.e. if BOWTIE_PTHREADS=0 is not specified at build time).

--reorder
Guarantees that output SAM records are printed in an order corresponding to the order of the reads in the original input file, even when -p is set greater than 1. Specifying --reorder and setting -p greater than 1 causes Bowtie 2 to run somewhat slower and use somewhat more memory than if --reorder were not specified. Has no effect if -p is set to 1, since output order will naturally correspond to input order in that case.

--mm
Use memory-mapped I/O to load the index, rather than typical file I/O. Memory-mapping allows many concurrent bowtie processes on the same computer to share the same memory image of the index (i.e. you pay the memory overhead just once). This facilitates memory-efficient parallelization of bowtie in situations where using -p is not possible or not preferable.

--qc-filter
Filter out reads for which the QSEQ filter field is non-zero. Only has an effect when read format is --qseq. Default: off.

--seed <int>
Use <int> as the seed for the pseudo-random number generator. Default: 0.

--non-deterministic
Normally, Bowtie 2 re-initializes its pseudo-random generator for each read. It seeds the generator with a number derived from (a) the read name, (b) the nucleotide sequence, (c) the quality sequence, (d) the value of the --seed option. This means that if two reads are identical (same name, same nucleotides, same qualities) Bowtie 2 will find and report the same alignment(s) for both, even if there was ambiguity. When --non-deterministic is specified, Bowtie 2 re-initializes its pseudo-random generator for each read using the current time. This means that Bowtie 2 will not necessarily report the same alignment for two identical reads. This is counter-intuitive for some users, but might be more appropriate in situations where the input consists of many identical reads.

--version
Print version information and quit.

-h/--help
Print usage information and quit.