How to use BGI's SOAP software

(2011-05-12 14:18:45)
Tags: soap, usage guide, assembly, miscellaneous

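The widespread use of second-generation sequencing platforms has greatly accelerated genome sequencing across species. BGI (Beijing Genomics Institute) has recently developed SOAP, a program for working with short reads; the usage text below covers aligning short reads to a reference sequence. It is reproduced here for anyone who needs it.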

Quoted from henry_by's post on SOAP usage:

Usage:  soap [options]

       -a  <str>   query a file, *.fq or *.fa format

       -d  <str>   reference sequences file, *.fa format

       -o  <str>   output alignment file

       -s  <int>   seed size, default=10. [read>18,s=8; read>22,s=10, read>26, s=12]

       -v  <int>   maximum number of mismatches allowed on a read, <=5. default=2bp. For pair-ended alignment, this version will allow either 0 or 2 mismatches.

       -g  <int>   maximum gap size allowed on a read, default=0bp

       -w  <int>   maximum number of equal best hits to count, smaller will be faster, <=10000

       -e  <int>   will not allow gap exist inside n-bp edge of a read, default=5bp

       -z  <char>  initial quality, default=@ [Illumina is using '@', Sanger Institute is using '!']

       -c  <int>   how to trim low-quality at 3-end?

                   0:     don't trim;

                   1-10:  trim n-bps at 3-end for all reads;

                   11-20: trim first bp and (n-10)-bp at 3-end for all reads;

                   21-30: trim (n-20)-bp at 3-end and redo alignment if the original read have no hit;

                   31-40: trim first bp and (n-30)-bp at 3-end and redo alignment if the original read have no hit;

                   41-50: iteratively trim (n-40)-bp at 3-end until getting hits;

                   51-60: if no hit, trim first bp and iteratively trim (n-50)bp at 3-end until getting hits;

                   default: 0

       -f  <int>   filter low-quality reads containing >n Ns, default=5

       -r  [0,1,2] how to report repeat hits, 0=none; 1=random one; 2=all, default=1

       -t          read ID in output file, [name, order in input file], default: name

       -n  <int>   do alignment on which reference chain? 0:both; 1:forward only; 2:reverse only. default=0

       -p  <int>   number of processors to use, default=1

 

  Options for pair-end alignment:

       -b  <str>   query b file

       -m  <int>   minimal insert size allowed, default=400

       -x  <int>   maximal insert size allowed, default=600

       -2  <str>   output file of unpaired alignment hits

       -y          do not optimize for SV analysis, default will output hit a and hit b with smallest distance in unpaired alignment

 

  Options for mRNA tag alignment:

       -T  <int>   type of tag, 0:DpnII, GATC+16; 1:NlaIII, CATG+17. default=-1[not mRNA tag]

 

  Options for miRNA alignment:

       -A  <str>   3-end adapter sequence, default=[not miRNA]

       -S  <int>   number of mismatch allowed in adapter, default=0

       -M  <int>   minimum length of a miRNA, default=17

       -X  <int>   maximum length of a miRNA, default=26

       -h          help
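
The -c values above pack three things into one number: whether the first base is trimmed, how many bases are trimmed at the 3-end, and when trimming happens. As a rough sketch of how to read them, the helper below decodes a -c value according to the ranges in the usage text. The function name and the returned field names are made up for illustration and are not part of SOAP.

```python
def describe_trim_mode(c):
    """Decode a -c value using the ranges listed in the usage text.

    Hypothetical helper: the dictionary keys are illustrative only.
    """
    if c == 0:
        return {"trim_first_bp": False, "trim_3end_bp": 0, "when": "never"}
    if not 1 <= c <= 60:
        raise ValueError("-c is documented for values 0 to 60 only")

    band = (c - 1) // 10   # 0 for 1-10, 1 for 11-20, ..., 5 for 51-60
    n = c - band * 10      # number of bases trimmed at the 3-end (1 to 10)
    when = (
        "always",      # 1-10
        "always",      # 11-20
        "if_no_hit",   # 21-30
        "if_no_hit",   # 31-40
        "iterative",   # 41-50
        "iterative",   # 51-60
    )[band]
    # Odd-numbered bands (11-20, 31-40, 51-60) also trim the first base.
    return {"trim_first_bp": band % 2 == 1, "trim_3end_bp": n, "when": when}


print(describe_trim_mode(13))
print(describe_trim_mode(25))
```

For example, -c 13 means "trim the first base and 3 bases at the 3-end of every read", while -c 25 means "trim 5 bases at the 3-end and realign, but only for reads that had no hit".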

 

Command lines:

   single-end alignment: soap -a query.fa -d ref.fa -o out.sop -s 12

   pair-end alignment:   soap -a query_1.fa -b query_2.fa -d ref.fa -o out.sop -2 single.sop -m 100 -x 150

   batch mode:           soap -d ref.fa <parameter file>

 

      SOAP provides a batch mode for aligning multiple query datasets against the same reference, which avoids loading the reference and building the index hash more than once. The <parameter file> contains the options for each query:

      <parameter file>:

      -a q1.fa -o out1.sop -s 12

      -a q2.fa -o out2.sop -s 12

      ...

      -a qn.fa -o outn.sop -s 10
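
With many query files, the parameter file is easier to generate than to type. The sketch below builds its contents from a list of jobs, one line per query, in the same "-a ... -o ... -s ..." layout as the example above. The function name and the job tuples are assumptions for illustration; only the -a, -o and -s options come from the usage text.

```python
def build_parameter_text(jobs):
    """Return parameter-file text, one line of options per query.

    jobs: iterable of (query_path, output_path, seed_size) tuples.
    Hypothetical helper, not part of SOAP.
    """
    lines = ["-a {} -o {} -s {}".format(query, output, seed)
             for query, output, seed in jobs]
    return "\n".join(lines) + "\n"


jobs = [
    ("q1.fa", "out1.sop", 12),
    ("q2.fa", "out2.sop", 12),
    ("q3.fa", "out3.sop", 10),
]
print(build_parameter_text(jobs), end="")
```

Saving the printed text to a file and passing that file after "soap -d ref.fa" should then match the batch command line shown above.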

 

Note: location coordinates are counted from 1

 

Setting seed size: seed_size*2+3<=min_read_size, AND seed_size<=12
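
The rule above can be turned into a small calculation: the largest seed that fits is (min_read_size - 3) // 2, capped at 12. The sketch below does exactly that; the function name is made up for illustration. Note that the -s hint in the usage text only mentions seeds of 8, 10 and 12, so whether odd values such as 9 or 11 are accepted in practice is not confirmed here.

```python
def max_seed_size(min_read_size, cap=12):
    """Largest seed size s with s*2 + 3 <= min_read_size and s <= cap.

    Hypothetical helper derived from the rule in the usage notes.
    """
    seed = min((min_read_size - 3) // 2, cap)
    if seed < 1:
        raise ValueError("reads are too short for any seed size")
    return seed


for read_len in (19, 23, 27, 35):
    print(read_len, max_seed_size(read_len))
```

This agrees with the hint on the -s option: reads longer than 18 bp allow s=8, longer than 22 bp allow s=10, and longer than 26 bp allow s=12.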

 

Output format:

  id, seq, qual, #_of_hits, a/b[belonging to query a or b if pair-end], length, +/-, ref, ref_location, type

 

  Types:

    0:                  exact match;

    n OffsetAlleleQual: n mismatches with offset, allele and quality, ex: 1 C->10T30, 1-bp mismatch at location+10 on ref, ref allele C, query allele T, query quality 30;

    100+n Offset:       n-bp insertion on query, ex: 101 15, 1-bp insertion on query, start after 15bp on ref

    200+n Offset:       n-bp deletion on query, ex: 201 16, 1-bp deletion on query, start after 16bp on ref
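
To make the type codes concrete, here is a minimal sketch that classifies the numeric code and splits a mismatch descriptor such as "C->10T30" into its parts. Both function names are invented for illustration, and the descriptor pattern is inferred from the single example above, so treat it as an assumption rather than a full specification of the output format.

```python
import re


def classify_hit_type(code):
    """Map a numeric type code to (kind, n) following the list above."""
    if code == 0:
        return ("exact", 0)
    if 0 < code < 100:
        return ("mismatch", code)
    if 100 < code < 200:
        return ("insertion", code - 100)
    if 200 < code < 300:
        return ("deletion", code - 200)
    raise ValueError("unrecognised type code: {}".format(code))


# Assumed layout, based only on the example "C->10T30":
# ref allele, "->", offset, query allele, query quality.
MISMATCH_RE = re.compile(r"^([A-Za-z])->(\d+)([A-Za-z])(\d+)$")


def parse_mismatch(descriptor):
    """Split a mismatch descriptor into its four fields."""
    match = MISMATCH_RE.match(descriptor)
    if match is None:
        raise ValueError("unexpected mismatch descriptor: " + descriptor)
    ref, offset, query, qual = match.groups()
    return {"ref_allele": ref, "offset": int(offset),
            "query_allele": query, "query_quality": int(qual)}


print(classify_hit_type(101))
print(parse_mismatch("C->10T30"))
```

So a type of "101 15" reads as a 1-bp insertion on the query, and "1 C->10T30" reads as one mismatch at offset 10 where the reference has C and the query has T with quality 30.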

Source: http://blog.163.com/henry_by/blog/static/5726535820100288482842/

