How to Use BGI's SOAP Software_Krebs

http://blog.sina.com.cn/u/1562304362



(2011-05-12 14:18:45)

Tags: soap, usage instructions, assembly, miscellany

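The widespread use of second-generation sequencing platforms has driven rapid progress in genome sequencing. BGI (Beijing Genomics Institute) has developed SOAP, a software package for analyzing short reads. The program documented below aligns short reads to a reference sequence. Its usage is reproduced here for anyone who needs it.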

Quoted from henry_by's notes on using SOAP:

Usage: soap [options]

-a <str> query a file, *.fq or *.fa format

-d <str> reference sequences file, *.fa format

-o <str> output alignment file

-s <int> seed size, default=10. [read>18: s=8; read>22: s=10; read>26: s=12]

-v <int> maximum number of mismatches allowed on a read, <=5. default=2bp. For pair-end alignment, this version allows either 0 or 2 mismatches.

-g <int> maximum gap size allowed on a read, default=0bp

-w <int> maximum number of equal best hits to count, smaller will be faster, <=10000

-e <int> will not allow gap exist inside n-bp edge of a read, default=5bp

-z <char> initial quality, default=@ [Illumina is using '@', Sanger Institute is using '!']

-c <int> how to trim low-quality bases at the 3' end:

0: don't trim;

1-10: trim n bp at the 3' end of all reads;

11-20: trim the first bp and (n-10) bp at the 3' end of all reads;

21-30: trim (n-20) bp at the 3' end and redo the alignment if the original read has no hit;

31-40: trim the first bp and (n-30) bp at the 3' end and redo the alignment if the original read has no hit;

41-50: iteratively trim (n-40) bp at the 3' end until hits are found;

51-60: if there is no hit, trim the first bp and iteratively trim (n-50) bp at the 3' end until hits are found;

default: 0

-f <int> filter low-quality reads containing >n Ns, default=5

-r [0,1,2] how to report repeat hits, 0=none; 1=random one; 2=all, default=1

-t read ID in output file, [name, order in input file], default: name

-n <int> which reference strand to align to: 0: both; 1: forward only; 2: reverse only. default=0

-p <int> number of processors to use, default=1

Options for pair-end alignment:

-b <str> query b file

-m <int> minimal insert size allowed, default=400

-x <int> maximal insert size allowed, default=600

-2 <str> output file of unpaired alignment hits

-y do not optimize for SV analysis. By default, for unpaired alignments, the hits for a and b with the smallest distance between them are output

Options for mRNA tag alignment:

-T <int> type of tag, 0:DpnII, GATC+16; 1:NlaIII, CATG+17. default=-1[not mRNA tag]

Options for miRNA alignment:

-A <str> 3' adapter sequence, default=[not miRNA]

-S <int> number of mismatches allowed in the adapter, default=0

-M <int> minimum length of a miRNA, default=17

-X <int> maximum length of a miRNA, default=26

-h help
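The read-preprocessing options (-z, -c and -f) are easy to misread. The Python sketch below shows one way to interpret them as the help text describes. It is not SOAP's own code. It assumes that -z names the character encoding quality 0, so '@' gives an offset of 64 and '!' gives 33. The helper names (phred_scores, TrimPlan, decode_trim_code, passes_n_filter) and the mode labels are hypothetical.

```python
from dataclasses import dataclass


def phred_scores(qual: str, offset_char: str = "@") -> list[int]:
    """Convert a quality string to integer scores, assuming -z gives the quality-0 character."""
    base = ord(offset_char)
    scores = [ord(ch) - base for ch in qual]
    if any(s < 0 for s in scores):
        raise ValueError("quality character below the offset; check the -z setting")
    return scores


@dataclass(frozen=True)
class TrimPlan:
    trim_first_bp: bool  # also drop the first base of the read
    trim_3p: int         # bases trimmed at the 3' end (per step when iterative)
    mode: str            # hypothetical labels: "none", "always", "retry_once", "iterative"


# Index (c - 1) // 10 selects the band of -c values described in the help text:
# 1-20 trim every read, 21-40 trim once and realign on no hit, 41-60 trim repeatedly.
_MODES = ("always", "always", "retry_once", "retry_once", "iterative", "iterative")


def decode_trim_code(c: int) -> TrimPlan:
    """Interpret a -c value (0..60) following the bands listed above."""
    if c == 0:
        return TrimPlan(False, 0, "none")
    if not 1 <= c <= 60:
        raise ValueError(f"-c must be between 0 and 60, got {c}")
    band = (c - 1) // 10  # 0..5
    return TrimPlan(
        trim_first_bp=band % 2 == 1,  # bands 11-20, 31-40 and 51-60 also drop the first bp
        trim_3p=c - 10 * band,        # 1..10 bases
        mode=_MODES[band],
    )


def passes_n_filter(seq: str, max_n: int = 5) -> bool:
    """-f: reads containing more than max_n 'N' bases are filtered out."""
    return seq.upper().count("N") <= max_n


if __name__ == "__main__":
    print(phred_scores("h@J"))          # [40, 0, 10] with the default '@' offset
    print(decode_trim_code(35))         # TrimPlan(trim_first_bp=True, trim_3p=5, mode='retry_once')
    print(passes_n_filter("ACNNNNNNT"))  # False: 6 Ns exceed the default limit of 5
```

For Sanger-style qualities, pass `offset_char="!"`, mirroring `-z !`.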

Command lines:

single-end alignment: soap -a query.fa -d ref.fa -o out.sop -s 12

pair-end alignment: soap -a query_1.fa -b query_2.fa -d ref.fa -o out.sop -2 single.sop -m 100 -x 150

batch mode: soap -d ref.fa <parameter file>
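When SOAP is driven from a script, these command lines can be built programmatically rather than pasted by hand. The Python sketch below uses only flags listed in the help text (-a, -b, -d, -o, -2, -m, -x, -s, -p). The function name soap_command and its keyword names are hypothetical. Rejecting -2/-m/-x without -b is a design choice of this sketch, based on those flags being listed under the pair-end options.

```python
import shlex


def soap_command(query_a, ref, out, *, query_b=None, unpaired_out=None,
                 min_insert=None, max_insert=None, seed=None, processors=None):
    """Build an argv list for a single-end or pair-end SOAP run."""
    cmd = ["soap", "-a", query_a, "-d", ref, "-o", out]
    if query_b is None:
        # -2, -m and -x are documented as pair-end options only.
        if unpaired_out is not None or min_insert is not None or max_insert is not None:
            raise ValueError("-2/-m/-x only apply to pair-end alignment (-b)")
    else:
        cmd += ["-b", query_b]
        if unpaired_out is not None:
            cmd += ["-2", unpaired_out]
        if min_insert is not None and max_insert is not None and min_insert > max_insert:
            raise ValueError("minimal insert size (-m) exceeds maximal insert size (-x)")
        if min_insert is not None:
            cmd += ["-m", str(min_insert)]
        if max_insert is not None:
            cmd += ["-x", str(max_insert)]
    if seed is not None:
        cmd += ["-s", str(seed)]
    if processors is not None:
        cmd += ["-p", str(processors)]
    return cmd


if __name__ == "__main__":
    pe = soap_command("query_1.fa", "ref.fa", "out.sop", query_b="query_2.fa",
                      unpaired_out="single.sop", min_insert=100, max_insert=150)
    print(shlex.join(pe))
    # To run it (requires the soap binary on PATH):
    # import subprocess; subprocess.run(pe, check=True)
```

This prints the same pair-end command as the example above, with the flags in a different order.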

SOAP provides a batch mode for aligning multiple query datasets against the same reference, which avoids loading the reference and building the index hash multiple times. The <parameter file> contains the options for each query:

<parameter file>:

-a q1.fa -o out1.sop -s 12

-a q2.fa -o out2.sop -s 12

...

-a qn.fa -o outn.sop -s 10
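When there are many query files, the parameter file can be generated rather than typed. The following is a minimal Python sketch. It assumes one `-a <query> -o <output> -s <seed>` line per query, exactly as in the example above. build_batch_params is a hypothetical helper. The upper seed limit of 12 comes from the seed-size note, while the lower bound of 1 is an assumption.

```python
MAX_SEED = 12  # from the note "seed_size<=12"


def build_batch_params(jobs):
    """jobs: iterable of (query_file, output_file, seed_size); returns the parameter-file text."""
    lines = []
    for query, output, seed in jobs:
        if not 1 <= seed <= MAX_SEED:
            raise ValueError(f"seed size {seed} for {query} is outside 1..{MAX_SEED}")
        lines.append(f"-a {query} -o {output} -s {seed}\n")
    return "".join(lines)


if __name__ == "__main__":
    jobs = [(f"q{i}.fa", f"out{i}.sop", 12) for i in (1, 2)]
    # Save this text as the parameter file, then run: soap -d ref.fa <parameter file>
    print(build_batch_params(jobs), end="")
```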

Note: location coordinates are counted from 1

Setting seed size: seed_size*2+3<=min_read_size, AND seed_size<=12
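The seed-size rule can be turned into a quick check before a run. The Python sketch below computes the largest seed allowed for a given minimum read length and tests whether a chosen -s value is allowed. max_seed_size and seed_ok are hypothetical names. For reads of 19, 23 and 27 bp, it gives 8, 10 and 12, consistent with the recommendations listed under -s.

```python
def max_seed_size(min_read_len: int, cap: int = 12) -> int:
    """Largest seed satisfying seed*2 + 3 <= min_read_len and seed <= cap."""
    seed = min((min_read_len - 3) // 2, cap)
    if seed < 1:
        raise ValueError(f"reads of length {min_read_len} are too short for any seed")
    return seed


def seed_ok(seed: int, min_read_len: int) -> bool:
    """True if the seed size obeys both conditions of the rule."""
    return seed * 2 + 3 <= min_read_len and seed <= 12


if __name__ == "__main__":
    for length in (19, 23, 27, 100):
        print(length, max_seed_size(length))
```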

Output format:

id, seq, qual, #_of_hits, a/b[belonging to query a or b if pair-end], length, +/-, ref, ref_location, type

Types:

0: exact match;

n OffsetAlleleQual: n mismatches with offset, allele and quality, ex: 1 C->10T30, 1-bp mismatch at location+10 on ref, ref allele C, query allele T, query quality 30;

100+n Offset: n-bp insertion on query, ex: 101 15, 1-bp insertion on query, start after 15bp on ref

200+n Offset: n-bp deletion on query, ex: 201 16, 1-bp deletion on query, start after 16bp on ref
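The output format and type codes above can be decoded with a short parser. This Python sketch makes two assumptions that the help text does not confirm. First, it assumes the fields are tab-separated and that everything after the ninth field belongs to the type column, so both "1 C->10T30" and "1<TAB>C->10T30" are accepted. Second, it assumes that several mismatches appear as several separate tokens. parse_line, decode_type and SoapHit are hypothetical names.

```python
import re
from typing import NamedTuple


class SoapHit(NamedTuple):
    id: str
    seq: str
    qual: str
    n_hits: int
    end: str           # "a" or "b": which query file (pair-end runs)
    length: int
    strand: str        # "+" or "-"
    ref: str
    ref_location: int  # counted from 1, per the note above
    type: str          # raw type field, e.g. "0", "1 C->10T30", "101 15"


def parse_line(line: str) -> SoapHit:
    parts = line.rstrip("\n").split("\t")
    if len(parts) < 10:
        raise ValueError(f"expected at least 10 tab-separated fields, got {len(parts)}")
    rid, seq, qual, hits, end, length, strand, ref, loc = parts[:9]
    return SoapHit(rid, seq, qual, int(hits), end, int(length), strand, ref,
                   int(loc), " ".join(parts[9:]))


# e.g. "C->10T30": ref allele C, offset 10, query allele T, query quality 30
MISMATCH = re.compile(r"([A-Za-z])->(\d+)([A-Za-z])(\d+)")


def decode_type(field: str) -> dict:
    tokens = field.split()
    code = int(tokens[0])
    if code == 0:
        return {"kind": "exact"}
    if 1 <= code < 100:  # n mismatches
        found = []
        for tok in tokens[1:]:
            m = MISMATCH.fullmatch(tok)
            if m is None:
                raise ValueError(f"unrecognised mismatch token {tok!r}")
            ref_allele, offset, query_allele, qual = m.groups()
            found.append({"offset": int(offset), "ref": ref_allele,
                          "query": query_allele, "qual": int(qual)})
        return {"kind": "mismatch", "n": code, "mismatches": found}
    if 100 < code < 300 and code % 100 and len(tokens) > 1:  # 100+n insertion, 200+n deletion
        kind = "insertion" if code < 200 else "deletion"
        return {"kind": kind, "size": code % 100, "after_ref_bp": int(tokens[1])}
    raise ValueError(f"unrecognised type field {field!r}")
```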

Source: http://blog.163.com/henry_by/blog/static/5726535820100288482842/
