
Bowtie alignment

(2017-02-24 16:47:43)

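[Bowtie] How DNA sequence assembly works

[Jenny's note] I had always assumed Bowtie was a short-read assembly tool. That is wrong. It is not an assembler, only a sequence alignment tool. Its output locates each short read relative to the index.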

------------------

How does short-read alignment work, and which short-read aligners are in common use today? (ok)

http://blog.sina.com.cn/s/blog_9617895f01011npk.html

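Answer: Sequence alignment arranges two or more sequences according to defined rules in order to determine their similarity and, from that, their homology. Short-read alignment has different characteristics from long-sequence alignment, so the two use different algorithms. Three families of algorithms are commonly used for short reads:

       (1) Spaced-seed indexing, as in MAQ and ELAND. The read is split into segments, and one or more segments are chosen as seeds for building a lookup index. Reads are located by looking up the index and extending the match. Rotating through the seeds covers every combination of positions where the allowed mismatches may fall.

       (2) The Burrows-Wheeler transform, as in Bowtie, BWA and SOAP2. The genome sequence is compressed by the B-W transform and indexed. Reads are located by searching and backtracking, and allowed mismatches are handled by substituting bases during the search.

       (3) Smith-Waterman dynamic programming, as in BFAST and SHRiMP. Initial conditions and a recurrence are used to compute the scores of all possible alignments of two sequences, the scores are stored in a matrix, and a traceback through the matrix finds the optimal alignment.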
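
To make approach (2) concrete, here is a toy Python sketch of exact-match backward search over a Burrows-Wheeler index. It is illustrative only: the function names `bwt_index` and `locate` are made up for this example, the index is built by naive suffix sorting, and mismatches are not handled, so it should not be read as how Bowtie itself is implemented.

```python
def bwt_index(text):
    """Build a toy index: suffix array, first-column offsets and rank table."""
    text += "$"  # sentinel that sorts before A, C, G and T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # last column of the sorted rotations
    # first[c] = number of characters in the text that are smaller than c
    first, total = {}, 0
    for c in sorted(set(text)):
        first[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in first}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, first, occ


def locate(read, index):
    """Return the sorted 0-based positions where `read` occurs exactly."""
    sa, first, occ = index
    lo, hi = 0, len(sa)  # current suffix-array interval [lo, hi)
    for c in reversed(read):  # backward search: last base first
        if c not in first:
            return []
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


index = bwt_index("ACGTACGTTA")
print(locate("ACG", index))
print(locate("TA", index))
```

The result is a list of positions in the reference, which matches the note at the top of this post: an aligner places reads relative to an index, it does not assemble them.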

 

BGI assembly (ok)

http://www.ebiotrade.com/newsf/2010-1/2010128171022809.htm

 

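Research on next-generation sequence assembly algorithms

http://www.fdurop.fudan.edu.cn/upload/stu/docs/rcYsXb_102804-1303180458.pdf

Genome sequencing and analysis. Good! Recommended reading!

http://ibi.zju.edu.cn/bioinplant/courses/chap4.pdf 

Design of gene sequence assembly algorithms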

http://www.doc88.com/p-741680604744.html

 

 

[Bowtie] Bowtie2 usage and options in detail

 

Quick start for the impatient

bowtie2 -q --phred33 --sensitive --end-to-end -I 0 -X 500 --fr --un unpaired --al aligned \
    --un-conc unconc --al-conc alconc -p 6 --reorder -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r>} -S [<hit>]

 

Usage:

bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r>} -S [<hit>]
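
As a hedged illustration of the usage line, the following Python sketch assembles a bowtie2 argument list from the flags used in the quick-start command. The helper name `bowtie2_command`, the index prefix `fly_index` and the output name `out.sam` are hypothetical. The function only builds the list and does not run anything.

```python
def bowtie2_command(index, sam, mate1=None, mate2=None, unpaired=None, threads=1):
    """Build an argv list following: bowtie2 [options]* -x idx {-1 m1 -2 m2 | -U r} -S sam."""
    paired = bool(mate1) or bool(mate2)
    if paired and (not mate1 or not mate2 or len(mate1) != len(mate2)):
        raise ValueError("-1 and -2 need the same number of files")
    if paired == bool(unpaired):
        raise ValueError("give either mate files (-1/-2) or unpaired files (-U)")
    cmd = ["bowtie2", "-q", "--phred33", "--sensitive", "--end-to-end",
           "-p", str(threads), "-x", index]
    if paired:
        # Multiple files are comma-separated and must pair up file for file.
        cmd += ["-I", "0", "-X", "500", "--fr",
                "-1", ",".join(mate1), "-2", ",".join(mate2)]
    else:
        cmd += ["-U", ",".join(unpaired)]
    return cmd + ["-S", sam]


cmd = bowtie2_command("fly_index", "out.sam",
                      mate1=["flyA_1.fq", "flyB_1.fq"],
                      mate2=["flyA_2.fq", "flyB_2.fq"], threads=6)
print(" ".join(cmd))
```

A list like this can be handed to `subprocess.run(cmd, check=True)` on a machine where bowtie2 is installed. Keeping the arguments as a list avoids shell quoting problems with file names.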

 

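Required arguments

-x <bt2-idx>: Prefix of the index files built by bowtie2-build. bowtie2 looks first in the current directory, then in the directory named by the BOWTIE2_INDEXES environment variable.
-1 <m1>: File(s) containing mate 1 of the paired-end reads. Several files may be given, separated by commas, and they must correspond one-to-one with the files given to -2, for example "-1 flyA_1.fq,flyB_1.fq -2 flyA_2.fq,flyB_2.fq". Reads in a file may differ in length.
-2 <m2>: File(s) containing mate 2 of the paired-end reads.
-U <r>: File(s) containing unpaired reads. Several files may be given, separated by commas. Reads in a file may differ in length.
-S <hit>: File to write the SAM output to. By default the output goes to standard output.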

 

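Optional arguments

Input options

-q: Input files are FASTQ. This is the default.
--qseq: Input files are QSEQ.
-f: Input files are FASTA. Implies --ignore-quals.
-r: Input files hold one sequence per line, with no read names or qualities. Implies --ignore-quals.
-c: The read sequences themselves are given on the command line, separated by commas, instead of file names. Implies --ignore-quals.
-s/--skip <int>: Skip the first <int> reads or pairs in the input.
-u/--qupto <int>: Align only the first <int> reads or pairs (after the -s/--skip reads or pairs have been skipped). Default: no limit.
-5/--trim5 <int>: Trim <int> bases from the 5' end of each read before alignment (default: 0).
-3/--trim3 <int>: Trim <int> bases from the 3' end of each read before alignment (default: 0).
--phred33: Quality characters are the Phred quality plus 33, as ASCII. Used by recent Illumina pipelines.
--phred64: Quality characters are the Phred quality plus 64, as ASCII.
--solexa-quals: Convert Solexa qualities to Phred. Used by older GA Pipeline versions. Default: off.
--int-quals: Qualities in the input are space-separated integers, e.g. 40 40 30 40..., rather than ASCII characters. Default: off.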
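
The two ASCII encodings differ only in their offset. A minimal Python sketch, with the hypothetical helper name `decode_qualities`, shows how a quality string maps back to Phred scores under --phred33 or --phred64:

```python
def decode_qualities(qual, offset=33):
    """Convert an ASCII quality string to Phred scores (offset 33 or 64)."""
    scores = [ord(ch) - offset for ch in qual]
    if any(s < 0 for s in scores):
        # A negative value usually means the wrong offset was assumed.
        raise ValueError("quality below 0: wrong offset for this file?")
    return scores


print(decode_qualities("II?I"))              # Phred+33 string
print(decode_qualities("hh^h", offset=64))   # the same scores in Phred+64
```

Both calls give the scores 40 40 30 40, which is how the same read would look with --int-quals.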

 

Preset options in --end-to-end mode

--very-fast       Same as: -D 5 -R 1 -N 0 -L 22 -i S,0,2.50
--fast            Same as: -D 10 -R 2 -N 0 -L 22 -i S,0,2.50
--sensitive       Same as: -D 15 -R 2 -N 0 -L 22 -i S,1,1.15 (default in --end-to-end mode)
--very-sensitive  Same as: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50

 

Preset options in --local mode

--very-fast-local       Same as: -D 5 -R 1 -N 0 -L 25 -i S,1,2.00
--fast-local            Same as: -D 10 -R 2 -N 0 -L 22 -i S,1,1.75
--sensitive-local       Same as: -D 15 -R 2 -N 0 -L 20 -i S,1,0.75 (default in --local mode)
--very-sensitive-local  Same as: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50

 

Alignment options

-N <int>: Number of mismatches allowed in a seed alignment. Can be 0 or 1. Default: 0.
-L <int>: Length of the seeds.

Function options: some bowtie2 parameters take a function of the read length instead of a fixed value. A function has three comma-separated parts: (a) the function type, which is constant (C), linear (L), square root (S) or natural log (G); (b) a constant term; (c) a coefficient. For example:
L,-0.4,-0.6 gives f(x) = -0.4 + -0.6 * x
G,1,5.4 gives f(x) = 1.0 + 5.4 * ln(x)

-i <func>: Interval, in bases, between adjacent seeds. For example, with a read of length 30, a seed length of 10 and a seed interval of 6, the extracted seeds are:
Read:      TAGCTACGCTCTACGCTATCATGCATAAAC
Seed 1 fw: TAGCTACGCT
Seed 1 rc: AGCGTAGCTA
Seed 2 fw: CGCTCTACGC
Seed 2 rc: GCGTAGAGCG
Seed 3 fw: ACGCTATCAT
Seed 3 rc: ATGATAGCGT
Seed 4 fw: TCATGCATAA
Seed 4 rc: TTATGCATGA
In --end-to-end mode the default is -i S,1,1.15, that is f(x) = 1 + 1.15 * sqrt(x). For a read of length 100, f(100) = 12.5, so adjacent seeds are about 12 bases apart.
--n-ceil <func>: Maximum number of ambiguous bases (anything other than G, T, A or C, usually N) allowed in a read. Default: L,0,0.15, that is f(x) = 0 + 0.15 * x, so a read of length 100 may contain at most 15 ambiguous bases. A read with more than 15 is filtered out.
--dpad <int>: Pad dynamic programming problems by <int> columns on either side to allow gaps. Default: 15.
--gbar <int>: Disallow gaps within <int> bases of either end of the read. Default: 4.
--ignore-quals: Ignore base qualities when computing mismatch penalties. This is automatically the behavior when the input mode is -f, -r or -c.
--nofw/--norc: --nofw prevents reads from being aligned to the forward reference strand. --norc prevents them from being aligned to the reverse-complement reference strand. Default: both strands enabled.
--end-to-end: Align the entire read against the reference. --ma is 0 in this mode. This is the default mode and is mutually exclusive with --local.
--local: Align reads locally, so that some bases at either end of the read may be left unaligned in order to reach the required alignment score. --ma defaults to 2 in this mode.
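
The function options described above can be evaluated with a few lines of Python. This is a sketch under the stated rule only (type, constant term, coefficient). The helper name `eval_function_option` is hypothetical, and treating a C function as returning just its constant term is an assumption of this example.

```python
import math


def eval_function_option(spec, read_length):
    """Evaluate a spec such as 'S,1,1.15' at the given read length."""
    parts = spec.split(",")
    kind = parts[0]
    const = float(parts[1])
    coeff = float(parts[2]) if len(parts) > 2 else 0.0
    transforms = {
        "C": lambda x: 0.0,   # constant: only the constant term contributes
        "L": lambda x: x,     # linear
        "S": math.sqrt,       # square root
        "G": math.log,        # natural log
    }
    if kind not in transforms:
        raise ValueError("unknown function type: " + kind)
    return const + coeff * transforms[kind](read_length)


for spec in ("S,1,1.15", "L,0,0.15", "L,-0.6,-0.6", "G,20,8"):
    print(spec, eval_function_option(spec, 100))
```

For a 100-base read this gives the default end-to-end seed interval (12.5), the default N ceiling (15) and the default minimum scores for the two alignment modes.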

 

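Scoring options

--ma <int>: Match bonus. In --local mode, <int> is added to the score for each read base that matches the reference base it is aligned to. Not used in --end-to-end mode. Default: 2.
--mp MX,MN: Mismatch penalty, where MX is the maximum and MN the minimum penalty. By default the penalty depends on base quality and is MN + floor( (MX-MN)(MIN(Q, 40.0)/40.0) ), where Q is the quality of the base. With --ignore-quals, a mismatch always costs the maximum. Default: MX = 6, MN = 2.
--np <int>: Penalty for a position where the read, the reference, or both contain an ambiguous base such as N. Default: 1.
--rdg <int1>,<int2>: Read gap open and gap extend penalties. Default: 5, 3.
--rfg <int1>,<int2>: Reference gap open and gap extend penalties. Default: 5, 3.
--score-min <func>: Minimum score for an alignment to count as valid. Default in --end-to-end mode: L,-0.6,-0.6. Default in --local mode: G,20,8.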
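
The quality-aware mismatch penalty quoted for --mp can be checked with a short Python sketch. It implements the formula exactly as written in the text. The helper name `mismatch_penalty` is hypothetical.

```python
import math


def mismatch_penalty(q, mx=6, mn=2, ignore_quals=False):
    """MN + floor((MX - MN) * (min(Q, 40) / 40)), or MX with --ignore-quals."""
    if ignore_quals:
        return mx
    return mn + math.floor((mx - mn) * (min(q, 40.0) / 40.0))


for q in (0, 10, 20, 30, 40, 41):
    print(q, mismatch_penalty(q))
```

With the defaults, a mismatch at a base of quality 0 costs 2, one at quality 20 costs 4, and anything at quality 40 or above costs the maximum of 6.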

 

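Reporting options

-k <int>: By default, bowtie2 searches for distinct alignments of a read and reports the best one (chosen at random if several best alignments tie on score). With -k, bowtie2 instead searches for at most <int> alignments per read and reports them in descending order of score.
-a: Like -k, but with no limit on the number of alignments searched for. All alignments are reported in descending order of score. Mutually exclusive with -k. Note that on genomes with many repeats this option can make the program extremely slow.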

 

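Effort options

-D <int>: A seed extension "fails" if it does not yield a new best or second-best alignment. Once <int> consecutive extensions have failed, bowtie2 stops working on that read and moves on, using the alignments found so far. Default: 15. The limit is adjusted automatically when -k or -a is given.
-R <int>: When the seeds of a read hit too many places in the reference (more than 300 positions per seed on average), bowtie2 generates a new set of seeds at a different offset and aligns again. <int> is the maximum number of times it re-seeds. Default: 2.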

 

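Paired-end options

-I/--minins <int>: Minimum fragment length. Default: 0.
-X/--maxins <int>: Maximum fragment length. Default: 500.
--fr/--rf/--ff: Expected orientation of the upstream and downstream mates relative to the forward reference strand. --fr: mate 1 is upstream in forward orientation and mate 2 is downstream as its reverse complement, or mate 2 is upstream and mate 1 is downstream as its reverse complement. --rf: mate 1 is upstream and reverse-complemented, mate 2 is downstream in forward orientation. --ff: both mates are in forward orientation. Default: --fr, which suits Illumina paired-end data. For mate-pair libraries, use --rf.
--no-mixed: By default, when a pair cannot be aligned as a pair, bowtie2 aligns each mate on its own. This option disables that behavior.
--no-discordant: By default, when a pair has no concordant alignment (one that satisfies -I, -X and --fr/--rf/--ff), bowtie2 looks for a discordant alignment (both mates align uniquely but do not satisfy -I, -X and --fr/--rf/--ff). This option disables that behavior.
--dovetail: Treat mates that dovetail as concordant. By default dovetailing is not concordant.
--no-contain: Treat a pair in which one mate contains the other as not concordant. By default containment is concordant.
--no-overlap: Treat a pair whose mates overlap as not concordant. By default overlapping mates are concordant.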

 

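Output options

-t/--time: Print the wall-clock time needed to load the index and align the reads.
--un <path>: Write unpaired reads that fail to align to <path>.
--un-gz <path>: Same, gzip compressed.
--un-bz2 <path>: Same, bzip2 compressed.
--al <path>: Write unpaired reads that align at least once to <path>.
--al-gz <path>: Same, gzip compressed.
--al-bz2 <path>: Same, bzip2 compressed.
--un-conc <path>: Write paired-end reads that fail to align concordantly to <path>.
--un-conc-gz <path>: Same, gzip compressed.
--un-conc-bz2 <path>: Same, bzip2 compressed.
--al-conc <path>: Write paired-end reads that align concordantly at least once to <path>.
--al-conc-gz <path>: Same, gzip compressed.
--al-conc-bz2 <path>: Same, bzip2 compressed.
--quiet: Quiet mode. Print nothing besides alignments and serious errors.
--met-file <path>: Write bowtie2 metrics to <path>. Useful for debugging. Default: metrics disabled.
--met-stderr: Write bowtie2 metrics to standard error. Not mutually exclusive with --met-file. Default: metrics disabled.
--met <int>: Write a metrics record every <int> seconds. Default: 1.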

 

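SAM options

--no-unal: Do not write records for reads that failed to align.
--no-hd: Do not write SAM header lines (those starting with @).
--no-sq: Do not write @SQ SAM header lines.
--rg-id <text>: Set the read group ID to <text>.
--rg <text>: Add <text> as a field on the @RG header line.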

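Performance options

-o/--offrate <int>: Override the offrate of the index with <int>. An index built with bowtie2-build's defaults has an offrate of 4. <int> must be greater than the offrate of the index. Larger values use less memory but take longer.
-p/--threads NTHREADS: Number of threads. Default: 1.
--reorder: With multiple threads, alignments may be written in a different order from the reads in the input file. This option keeps the output in input order.
--mm: Load the index with memory-mapped I/O rather than ordinary file I/O, so that several bowtie processes can share one copy of the index in memory and save memory.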

 

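Other options

--qc-filter: Filter out reads whose QSEQ filter field is non-zero. Only has an effect together with --qseq. Default: off.
--seed <int>: Use <int> as the seed for the random number generator. Default: 0.
--version: Print the program version and quit.
-h/--help: Print usage information and quit.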

 

For more details, see:

http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml

Source: http://www.hzaumycology.com/chenlianfu_blog/?p=178

 

[Bowtie] Bowtie 2 manual (options)

http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml

 

Usage

bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r>} -S [<hit>]

Main arguments

-x <bt2-idx>

The basename of the index for the reference genome. The basename is the name of any of the index files up to but not including the final .1.bt2 / .rev.1.bt2 / etc. bowtie2 looks for the specified index first in the current directory, then in the directory specified in the BOWTIE2_INDEXES environment variable.

-1 <m1>

Comma-separated list of files containing mate 1s (filename usually includes _1), e.g. -1 flyA_1.fq,flyB_1.fq. Sequences specified with this option must correspond file-for-file and read-for-read with those specified in <m2>. Reads may be a mix of different lengths. If - is specified, bowtie2 will read the mate 1s from the "standard in" or "stdin" filehandle.

-2 <m2>

Comma-separated list of files containing mate 2s (filename usually includes _2), e.g. -2 flyA_2.fq,flyB_2.fq. Sequences specified with this option must correspond file-for-file and read-for-read with those specified in <m1>. Reads may be a mix of different lengths. If - is specified, bowtie2 will read the mate 2s from the "standard in" or "stdin" filehandle.

-U <r>

Comma-separated list of files containing unpaired reads to be aligned, e.g. lane1.fq,lane2.fq,lane3.fq,lane4.fq. Reads may be a mix of different lengths. If - is specified, bowtie2 gets the reads from the "standard in" or "stdin" filehandle.

-S <hit>

File to write SAM alignments to. By default, alignments are written to the "standard out" or "stdout" filehandle (i.e. the console).

 

Options

Input options

-q

Reads (specified with <m1>, <m2>, <s>) are FASTQ files. FASTQ files usually have extension .fq or .fastq. FASTQ is the default format. See also: --solexa-quals and --int-quals.

--qseq

Reads (specified with <m1>, <m2>, <s>) are QSEQ files. QSEQ files usually end in _qseq.txt. See also: --solexa-quals and --int-quals.

-f

Reads (specified with <m1>, <m2>, <s>) are FASTA files. FASTA files usually have extension .fa, .fasta, .mfa, .fna or similar. FASTA files do not have a way of specifying quality values, so when -f is set, the result is as if --ignore-quals is also set.

-r

Reads (specified with <m1>, <m2>, <s>) are files with one input sequence per line, without any other information (no read names, no qualities). When -r is set, the result is as if --ignore-quals is also set.

-c

The read sequences are given on command line. I.e. <m1>, <m2> and <singles> are comma-separated lists of reads rather than lists of read files. There is no way to specify read names or qualities, so -c also implies --ignore-quals.

-s/--skip <int>

Skip (i.e. do not align) the first <int> reads or pairs in the input.

-u/--qupto <int>

Align the first <int> reads or read pairs from the input (after the -s/--skip reads or pairs have been skipped), then stop. Default: no limit.

-5/--trim5 <int>

Trim <int> bases from 5' (left) end of each read before alignment (default: 0).

-3/--trim3 <int>

Trim <int> bases from 3' (right) end of each read before alignment (default: 0).

--phred33

Input qualities are ASCII chars equal to the Phred quality plus 33. This is also called the "Phred+33" encoding, which is used by the very latest Illumina pipelines.

--phred64

Input qualities are ASCII chars equal to the Phred quality plus 64. This is also called the "Phred+64" encoding.

--solexa-quals

Convert input qualities from Solexa (which can be negative) to Phred (which can't). This scheme was used in older Illumina GA Pipeline versions (prior to 1.3). Default: off.

--int-quals

Quality values are represented in the read input file as space-separated ASCII integers, e.g., 40 40 30 40..., rather than ASCII characters, e.g., II?I.... Integers are treated as being on the Phred quality scale unless --solexa-quals is also specified. Default: off.

 

Preset options in --end-to-end mode

--very-fast

Same as: -D 5 -R 1 -N 0 -L 22 -i S,0,2.50

--fast

Same as: -D 10 -R 2 -N 0 -L 22 -i S,0,2.50

--sensitive

Same as: -D 15 -R 2 -N 0 -L 22 -i S,1,1.15 (default in --end-to-end mode)

--very-sensitive

Same as: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50

 

Preset options in --local mode

--very-fast-local

Same as: -D 5 -R 1 -N 0 -L 25 -i S,1,2.00

--fast-local

Same as: -D 10 -R 2 -N 0 -L 22 -i S,1,1.75

--sensitive-local

Same as: -D 15 -R 2 -N 0 -L 20 -i S,1,0.75 (default in --local mode)

--very-sensitive-local

Same as: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50

 

Alignment options

-N <int>

Sets the number of mismatches to be allowed in a seed alignment during multiseed alignment. Can be set to 0 or 1. Setting this higher makes alignment slower (often much slower) but increases sensitivity. Default: 0.

-L <int>

Sets the length of the seed substrings to align during multiseed alignment. Smaller values make alignment slower but more sensitive. Default: the --sensitive preset is used by default, which sets -L to 22 in --end-to-end mode and to 20 in --local mode.

-i <func>

Sets a function governing the interval between seed substrings to use during multiseed alignment. For instance, if the read has 30 characters, and seed length is 10, and the seed interval is 6, the seeds extracted will be:

Read:      TAGCTACGCTCTACGCTATCATGCATAAAC
Seed 1 fw: TAGCTACGCT
Seed 1 rc: AGCGTAGCTA
Seed 2 fw: CGCTCTACGC
Seed 2 rc: GCGTAGAGCG
Seed 3 fw: ACGCTATCAT
Seed 3 rc: ATGATAGCGT
Seed 4 fw: TCATGCATAA
Seed 4 rc: TTATGCATGA

Since it's best to use longer intervals for longer reads, this parameter sets the interval as a function of the read length, rather than a single one-size-fits-all number. For instance, specifying -i S,1,2.5 sets the interval function f to f(x) = 1 + 2.5 * sqrt(x), where x is the read length. See also: setting function options. If the function returns a result less than 1, it is rounded up to 1. Default: the --sensitive preset is used by default, which sets -i to S,1,1.15 in --end-to-end mode and to S,1,0.75 in --local mode.

--n-ceil <func>

Sets a function governing the maximum number of ambiguous characters (usually Ns and/or .s) allowed in a read as a function of read length. For instance, specifying L,0,0.15 sets the N-ceiling function f to f(x) = 0 + 0.15 * x, where x is the read length. See also: setting function options. Reads exceeding this ceiling are filtered out. Default: L,0,0.15.

--dpad <int>

"Pads" dynamic programming problems by <int> columns on either side to allow gaps. Default: 15.

--gbar <int>

Disallow gaps within <int> positions of the beginning or end of the read. Default: 4.

--ignore-quals

When calculating a mismatch penalty, always consider the quality value at the mismatched position to be the highest possible, regardless of the actual value. I.e. input is treated as though all quality values are high. This is also the default behavior when the input doesn't specify quality values (e.g. in -f, -r, or -c modes).
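
The seed table for the 30-character example read can be reproduced with a short Python sketch. The function names `reverse_complement` and `extract_seeds` are hypothetical, and the sketch assumes seeds are taken at offsets 0, interval, 2 * interval and so on for as long as a full-length seed still fits, which is what the example shows.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_seeds(read, seed_len, interval):
    """Return (forward, reverse-complement) seed pairs at the given interval."""
    seeds = []
    for start in range(0, len(read) - seed_len + 1, interval):
        fw = read[start:start + seed_len]
        seeds.append((fw, reverse_complement(fw)))
    return seeds


read = "TAGCTACGCTCTACGCTATCATGCATAAAC"
for n, (fw, rc) in enumerate(extract_seeds(read, seed_len=10, interval=6), 1):
    print(f"Seed {n} fw: {fw}")
    print(f"Seed {n} rc: {rc}")
```

With a seed length of 10 and an interval of 6, the offsets are 0, 6, 12 and 18, giving the four forward seeds and their reverse complements listed in the example.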

--nofw/--norc

If --nofw is specified, bowtie2 will not attempt to align unpaired reads to the forward (Watson) reference strand. If --norc is specified, bowtie2 will not attempt to align unpaired reads against the reverse-complement (Crick) reference strand. In paired-end mode, --nofw and --norc pertain to the fragments; i.e. specifying --nofw causes bowtie2 to explore only those paired-end configurations corresponding to fragments from the reverse-complement (Crick) strand. Default: both strands enabled.

--no-1mm-upfront

By default, Bowtie 2 will attempt to find either an exact or a 1-mismatch end-to-end alignment for the read before trying the multiseed heuristic. Such alignments can be found very quickly, and many short read alignments have exact or near-exact end-to-end alignments. However, this can lead to unexpected alignments when the user also sets options governing the multiseed heuristic, like -L and -N. For instance, if the user specifies -N 0 and -L equal to the length of the read, the user will be surprised to find 1-mismatch alignments reported. This option prevents Bowtie 2 from searching for 1-mismatch end-to-end alignments before using the multiseed heuristic, which leads to the expected behavior when combined with options such as -L and -N. This comes at the expense of speed.

--end-to-end

In this mode, Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or "soft clipping") of characters from either end. The match bonus --ma always equals 0 in this mode, so all alignment scores are less than or equal to 0, and the greatest possible alignment score is 0. This is mutually exclusive with --local. --end-to-end is the default mode.

--local

In this mode, Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted ("soft clipped") from the ends in order to achieve the greatest possible alignment score. The match bonus --ma is used in this mode, and the best possible alignment score is equal to the match bonus (--ma) times the length of the read. Specifying --local and one of the presets (e.g. --local --very-fast) is equivalent to specifying the local version of the preset (--very-fast-local). This is mutually exclusive with --end-to-end. --end-to-end is the default mode.

 

Scoring options

--ma <int>

Sets the match bonus. In --local mode <int> is added to the alignment score for each position where a read character aligns to a reference character and the characters match. Not used in --end-to-end mode. Default: 2.

--mp MX,MN

Sets the maximum (MX) and minimum (MN) mismatch penalties, both integers. A number less than or equal to MX and greater than or equal to MN is subtracted from the alignment score for each position where a read character aligns to a reference character, the characters do not match, and neither is an N. If --ignore-quals is specified, the number subtracted equals MX. Otherwise, the number subtracted is MN + floor( (MX-MN)(MIN(Q, 40.0)/40.0) ) where Q is the Phred quality value. Default: MX = 6, MN = 2.

--np <int>

Sets penalty for positions where the read, reference, or both, contain an ambiguous character such as N. Default: 1.

--rdg <int1>,<int2>

Sets the read gap open (<int1>) and extend (<int2>) penalties. A read gap of length N gets a penalty of <int1> + N * <int2>. Default: 5, 3.

--rfg <int1>,<int2>

Sets the reference gap open (<int1>) and extend (<int2>) penalties. A reference gap of length N gets a penalty of <int1> + N * <int2>. Default: 5, 3.

--score-min

Sets a function governing the minimum alignment score needed for an alignment to be considered "valid" (i.e. good enough to report). This is a function of read length. For instance, specifying L,0,-0.6 sets the minimum-score function f to f(x) = 0 + -0.6 * x, where x is the read length. See also: setting function options. The default in --end-to-end mode is L,-0.6,-0.6 and the default in --local mode is G,20,8.
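
To see how the penalties and the minimum-score function fit together, here is a hedged Python sketch for --end-to-end mode with the default values quoted above. The helper names are hypothetical, and the sketch only adds up penalties as described in the text. It does not perform any alignment.

```python
def gap_penalty(length, gap_open=5, gap_extend=3):
    """Penalty for one gap of the given length: open + N * extend."""
    return gap_open + length * gap_extend


def end_to_end_score(mismatch_penalties, read_gaps=(), ref_gaps=(),
                     n_positions=0, n_penalty=1):
    """End-to-end score: 0 minus every penalty (there is no match bonus)."""
    penalty = sum(mismatch_penalties)
    penalty += sum(gap_penalty(g) for g in read_gaps)
    penalty += sum(gap_penalty(g) for g in ref_gaps)
    penalty += n_positions * n_penalty
    return -penalty


def min_score_end_to_end(read_length):
    """Default --score-min in end-to-end mode: L,-0.6,-0.6."""
    return -0.6 + -0.6 * read_length


def is_valid(read_length, score):
    return score >= min_score_end_to_end(read_length)


# A 100-base read with three high-quality mismatches and one read gap of length 2.
score = end_to_end_score([6, 6, 6], read_gaps=[2])
print(score, min_score_end_to_end(100), is_valid(100, score))
```

The example read scores -29 against a threshold of about -60.6, so the alignment counts as valid. A 50-base read with six maximum-penalty mismatches scores -36 against a threshold of about -30.6 and would be rejected.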

 

Reporting options

-k <int>

By default, bowtie2 searches for distinct, valid alignments for each read. When it finds a valid alignment, it continues looking for alignments that are nearly as good or better. The best alignment found is reported (randomly selected from among best if tied). Information about the best alignments is used to estimate mapping quality and to set SAM optional fields, such as AS:i and XS:i.

When -k is specified, however, bowtie2 behaves differently. Instead, it searches for at most <int> distinct, valid alignments for each read. The search terminates when it can't find more distinct valid alignments, or when it finds <int>, whichever happens first. All alignments found are reported in descending order by alignment score. The alignment score for a paired-end alignment equals the sum of the alignment scores of the individual mates. Each reported read or pair alignment beyond the first has the SAM 'secondary' bit (which equals 256) set in its FLAGS field. For reads that have more than <int> distinct, valid alignments, bowtie2 does not guarantee that the <int> alignments reported are the best possible in terms of alignment score. -k is mutually exclusive with -a.

Note: Bowtie 2 is not designed with large values for -k in mind, and when aligning reads to long, repetitive genomes large -k can be very, very slow.

-a

Like -k but with no upper limit on number of alignments to search for. -a is mutually exclusive with -k.

Note: Bowtie 2 is not designed with -a mode in mind, and when aligning reads to long, repetitive genomes this mode can be very, very slow.

 

Effort options

-D <int>

Up to <int> consecutive seed extension attempts can "fail" before Bowtie 2 moves on, using the alignments found so far. A seed extension "fails" if it does not yield a new best or a new second-best alignment. This limit is automatically adjusted up when -k or -a are specified. Default: 15.

-R <int>

<int> is the maximum number of times Bowtie 2 will "re-seed" reads with repetitive seeds. When "re-seeding," Bowtie 2 simply chooses a new set of seeds (same length, same number of mismatches allowed) at different offsets and searches for more alignments. A read is considered to have repetitive seeds if the total number of seed hits divided by the number of seeds that aligned at least once is greater than 300. Default: 2.
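
The "repetitive seeds" criterion is a simple ratio, sketched here in Python. The function name `has_repetitive_seeds` and the per-seed hit counts are hypothetical inputs made up for illustration.

```python
def has_repetitive_seeds(hits_per_seed, threshold=300):
    """True if total seed hits / seeds that aligned at least once > threshold."""
    aligned = [h for h in hits_per_seed if h > 0]
    if not aligned:
        return False  # no seed aligned, so there is nothing to average
    return sum(aligned) / len(aligned) > threshold


print(has_repetitive_seeds([1000, 0, 200]))  # 1200 hits over 2 aligned seeds
print(has_repetitive_seeds([10, 20, 0]))     # 30 hits over 2 aligned seeds
```

Seeds that never aligned are left out of the denominator, so a read with one highly repetitive seed and several unaligned seeds can still trigger re-seeding.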

 

Paired-end options

-I/--minins <int>

The minimum fragment length for valid paired-end alignments. E.g. if -I 60 is specified and a paired-end alignment consists of two 20-bp alignments in the appropriate orientation with a 20-bp gap between them, that alignment is considered valid (as long as -X is also satisfied). A 19-bp gap would not be valid in that case. If trimming options -3 or -5 are also used, the -I constraint is applied with respect to the untrimmed mates.

The larger the difference between -I and -X, the slower Bowtie 2 will run. This is because larger differences between -I and -X require that Bowtie 2 scan a larger window to determine if a concordant alignment exists. For typical fragment length ranges (200 to 400 nucleotides), Bowtie 2 is very efficient.

Default: 0 (essentially imposing no minimum)

-X/--maxins <int>

The maximum fragment length for valid paired-end alignments. E.g. if -X 100 is specified and a paired-end alignment consists of two 20-bp alignments in the proper orientation with a 60-bp gap between them, that alignment is considered valid (as long as -I is also satisfied). A 61-bp gap would not be valid in that case. If trimming options -3 or -5 are also used, the -X constraint is applied with respect to the untrimmed mates, not the trimmed mates.

The larger the difference between -I and -X, the slower Bowtie 2 will run. This is because larger differences between -I and -X require that Bowtie 2 scan a larger window to determine if a concordant alignment exists. For typical fragment length ranges (200 to 400 nucleotides), Bowtie 2 is very efficient.

Default: 500.
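
The two worked examples for -I and -X reduce to one check on the fragment length, which spans both mates plus the gap between them. A minimal Python sketch, with the hypothetical helper name `fragment_ok`:

```python
def fragment_ok(mate1_len, gap, mate2_len, minins=0, maxins=500):
    """Check the -I/-X constraint on fragment length (mates plus gap)."""
    fragment = mate1_len + gap + mate2_len
    return minins <= fragment <= maxins


print(fragment_ok(20, 20, 20, minins=60))   # 60-bp fragment with -I 60
print(fragment_ok(20, 19, 20, minins=60))   # 59-bp fragment with -I 60
print(fragment_ok(20, 60, 20, maxins=100))  # 100-bp fragment with -X 100
print(fragment_ok(20, 61, 20, maxins=100))  # 101-bp fragment with -X 100
```

The first and third cases pass and the second and fourth fail, matching the examples in the option descriptions. Orientation (--fr/--rf/--ff) is a separate condition that this sketch does not model.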

--fr/--rf/--ff

The upstream/downstream mate orientations for a valid paired-end alignment against the forward reference strand. E.g., if --fr is specified and there is a candidate paired-end alignment where mate 1 appears upstream of the reverse complement of mate 2 and the fragment length constraints (-I and -X) are met, that alignment is valid. Also, if mate 2 appears upstream of the reverse complement of mate 1 and all other constraints are met, that too is valid. --rf likewise requires that an upstream mate1 be reverse-complemented and a downstream mate2 be forward-oriented. --ff requires both an upstream mate 1 and a downstream mate 2 to be forward-oriented. Default: --fr (appropriate for Illumina's Paired-end Sequencing Assay).

--no-mixed

By default, when bowtie2 cannot find a concordant or discordant alignment for a pair, it then tries to find alignments for the individual mates. This option disables that behavior.

--no-discordant

By default, bowtie2 looks for discordant alignments if it cannot find any concordant alignments. A discordant alignment is an alignment where both mates align uniquely, but that does not satisfy the paired-end constraints (--fr/--rf/--ff, -I, -X). This option disables that behavior.

--dovetail

If the mates "dovetail", that is if one mate alignment extends past the beginning of the other such that the wrong mate begins upstream, consider that to be concordant. See also: Mates can overlap, contain or dovetail each other. Default: mates cannot dovetail in a concordant alignment.

--no-contain

If one mate alignment contains the other, consider that to be non-concordant. See also: Mates can overlap, contain or dovetail each other. Default: a mate can contain the other in a concordant alignment.

--no-overlap

If one mate alignment overlaps the other at all, consider that to be non-concordant. See also: Mates can overlap, contain or dovetail each other. Default: mates can overlap in a concordant alignment.

 

Output options

-t/--time

Print the wall-clock time required to load the index files and align the reads. This is printed to the "standard error" ("stderr") filehandle. Default: off.

--un <path> --un-gz <path> --un-bz2 <path>

Write unpaired reads that fail to align to file at <path>. These reads correspond to the SAM records with the FLAGS 0x4 bit set and neither the 0x40 nor 0x80 bits set. If --un-gz is specified, output will be gzip compressed. If --un-bz2 is specified, output will be bzip2 compressed. Reads written in this way will appear exactly as they did in the input file, without any modification (same sequence, same name, same quality string, same quality encoding). Reads will not necessarily appear in the same order as they did in the input.

--al <path> --al-gz <path> --al-bz2 <path>

Write unpaired reads that align at least once to file at <path>. These reads correspond to the SAM records with the FLAGS 0x4, 0x40, and 0x80 bits unset. If --al-gz is specified, output will be gzip compressed. If --al-bz2 is specified, output will be bzip2 compressed. Reads written in this way will appear exactly as they did in the input file, without any modification (same sequence, same name, same quality string, same quality encoding). Reads will not necessarily appear in the same order as they did in the input.

--un-conc <path> --un-conc-gz <path> --un-conc-bz2 <path>

Write paired-end reads that fail to align concordantly to file(s) at <path>. These reads correspond to the SAM records with the FLAGS 0x4 bit set and either the 0x40 or 0x80 bit set (depending on whether it's mate #1 or #2). .1 and .2 strings are added to the filename to distinguish which file contains mate #1 and mate #2. If a percent symbol, %, is used in <path>, the percent symbol is replaced with 1 or 2 to make the per-mate filenames. Otherwise, .1 or .2 are added before the final dot in <path> to make the per-mate filenames. Reads written in this way will appear exactly as they did in the input files, without any modification (same sequence, same name, same quality string, same quality encoding). Reads will not necessarily appear in the same order as they did in the inputs.

--al-conc <path> --al-conc-gz <path> --al-conc-bz2 <path>

Write paired-end reads that align concordantly at least once to file(s) at <path>. These reads correspond to the SAM records with the FLAGS 0x4 bit unset and either the 0x40 or 0x80 bit set (depending on whether it's mate #1 or #2). .1 and .2 strings are added to the filename to distinguish which file contains mate #1 and mate #2. If a percent symbol, %, is used in <path>, the percent symbol is replaced with 1 or 2 to make the per-mate filenames. Otherwise, .1 or .2 are added before the final dot in <path> to make the per-mate filenames. Reads written in this way will appear exactly as they did in the input files, without any modification (same sequence, same name, same quality string, same quality encoding). Reads will not necessarily appear in the same order as they did in the inputs.
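
The per-mate filename rule for --un-conc and --al-conc can be sketched in Python as follows. The helper name `per_mate_paths` is hypothetical. The handling of a path with no dot at all (appending .1 and .2) is an assumption of this sketch, since the text only describes the percent-sign and final-dot cases.

```python
import os.path


def per_mate_paths(path):
    """Derive the mate #1 and mate #2 filenames from a single path."""
    if "%" in path:
        # A percent symbol is replaced with the mate number.
        return path.replace("%", "1"), path.replace("%", "2")
    root, ext = os.path.splitext(path)
    # Otherwise .1 / .2 go before the final dot; with no dot, ext is empty
    # and the suffix lands at the end (assumption, not stated in the text).
    return f"{root}.1{ext}", f"{root}.2{ext}"


print(per_mate_paths("unconc_%.fq"))
print(per_mate_paths("unconc.fq"))
```

Using `os.path.splitext` keeps a dot in a directory name, such as `./out/unconc`, from being mistaken for the extension separator.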

--quiet

Print nothing besides alignments and serious errors.

--met-file <path>

Write bowtie2 metrics to file <path>. Having alignment metrics can be useful for debugging certain problems, especially performance issues. See also: --met. Default: metrics disabled.

--met-stderr

Write bowtie2 metrics to the "standard error" ("stderr") filehandle. This is not mutually exclusive with --met-file. Having alignment metrics can be useful for debugging certain problems, especially performance issues. See also: --met. Default: metrics disabled.

--met <int>

Write a new bowtie2 metrics record every <int> seconds. Only matters if either --met-stderr or --met-file are specified. Default: 1.

 

SAM options

--no-unal

Suppress SAM records for reads that failed to align.

--no-hd

Suppress SAM header lines (starting with @).

--no-sq

Suppress @SQ SAM header lines.

--rg-id <text>

Set the read group ID to <text>. This causes the SAM @RG header line to be printed, with <text> as the value associated with the ID: tag. It also causes the RG:Z: extra field to be attached to each SAM output record, with value set to <text>.

--rg <text>

Add <text> (usually of the form TAG:VAL, e.g. SM:Pool1) as a field on the @RG header line. Note: in order for the @RG line to appear, --rg-id must also be specified. This is because the ID tag is required by the SAM Spec. Specify --rg multiple times to set multiple fields. See the SAM Spec for details about what fields are legal.

--omit-sec-seq

When printing secondary alignments, Bowtie 2 by default will write out the SEQ and QUAL strings. Specifying this option causes Bowtie 2 to print an asterisk in those fields instead.

Performance options

-o/--offrate <int>

Override the offrate of the index with <int>. If <int> is greater than the offrate used to build the index, then some row markings are discarded when the index is read into memory. This reduces the memory footprint of the aligner but requires more time to calculate text offsets. <int> must be greater than the value used to build the index.

-p/--threads NTHREADS

Launch NTHREADS parallel search threads (default: 1). Threads will run on separate processors/cores and synchronize when parsing reads and outputting alignments. Searching for alignments is highly parallel, and speedup is close to linear. Increasing -p increases Bowtie 2's memory footprint. E.g. when aligning to a human genome index, increasing -p from 1 to 8 increases the memory footprint by a few hundred megabytes. This option is only available if bowtie is linked with the pthreads library (i.e. if BOWTIE_PTHREADS=0 is not specified at build time).

--reorder

Guarantees that output SAM records are printed in an order corresponding to the order of the reads in the original input file, even when -p is set greater than 1. Specifying --reorder and setting -p greater than 1 causes Bowtie 2 to run somewhat slower and use somewhat more memory than if --reorder were not specified. Has no effect if -p is set to 1, since output order will naturally correspond to input order in that case.

--mm

Use memory-mapped I/O to load the index, rather than typical file I/O. Memory-mapping allows many concurrent bowtie processes on the same computer to share the same memory image of the index (i.e. you pay the memory overhead just once). This facilitates memory-efficient parallelization of bowtie in situations where using -p is not possible or not preferable.

Other options

--qc-filter

Filter out reads for which the QSEQ filter field is non-zero. Only has an effect when read format is --qseq. Default: off.

--seed <int>

Use <int> as the seed for pseudo-random number generator. Default: 0.

--non-deterministic

Normally, Bowtie 2 re-initializes its pseudo-random generator for each read. It seeds the generator with a number derived from (a) the read name, (b) the nucleotide sequence, (c) the quality sequence, (d) the value of the --seed option. This means that if two reads are identical (same name, same nucleotides, same qualities) Bowtie 2 will find and report the same alignment(s) for both, even if there was ambiguity. When --non-deterministic is specified, Bowtie 2 re-initializes its pseudo-random generator for each read using the current time. This means that Bowtie 2 will not necessarily report the same alignment for two identical reads. This is counter-intuitive for some users, but might be more appropriate in situations where the input consists of many identical reads.

--version

Print version information and quit.

-h/--help

Print usage information and quit.

 
