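The widespread adoption of second-generation sequencing platforms has driven rapid progress in genome sequencing across species. BGI (Beijing Genomics Institute) recently developed SOAP, a program for aligning short reads to a reference sequence. Its usage is described below for anyone who needs it.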
Usage: soap [options]
  -a <str>     query a file, *.fq or *.fa format
  -d <str>     reference sequences file, *.fa format
  -o <str>     output alignment file
  -s <int>     seed size, default=10. [read>18, s=8; read>22, s=10; read>26, s=12]
  -v <int>     maximum number of mismatches allowed on a read, <=5. default=2bp.
               For pair-ended alignment, this version will allow either 0 or 2 mismatches.
  -g <int>     maximum gap size allowed on a read, default=0bp
  -w <int>     maximum number of equal best hits to count, smaller will be faster, <=10000
  -e <int>     will not allow gap exist inside n-bp edge of a read, default=5bp
  -z <char>    initial quality, default=@ [Illumina is using '@', Sanger Institute is using '!']
  -c <int>     how to trim low-quality at 3-end?
               0:     don't trim;
               1-10:  trim n-bps at 3-end for all reads;
               11-20: trim first bp and (n-10)-bp at 3-end for all reads;
               21-30: trim (n-20)-bp at 3-end and redo alignment if the original read have no hit;
               31-40: trim first bp and (n-30)-bp at 3-end and redo alignment if the original read have no hit;
               41-50: iteratively trim (n-40)-bp at 3-end until getting hits;
               51-60: if no hit, trim first bp and iteratively trim (n-50)bp at 3-end until getting hits;
               default: 0
  -f <int>     filter low-quality reads containing >n Ns, default=5
  -r [0,1,2]   how to report repeat hits, 0=none; 1=random one; 2=all, default=1
  -t           read ID in output file, [name, order in input file], default: name
  -n <int>     do alignment on which reference chain? 0:both; 1:forward only; 2:reverse only. default=0
  -p <int>     number of processors to use, default=1
Options for pair-end alignment:
  -b <str>     query b file
  -m <int>     minimal insert size allowed, default=400
  -x <int>     maximal insert size allowed, default=600
  -2 <str>     output file of unpaired alignment hits
  -y           do not optimize for SV analysis, default will output hit a and hit b
               with smallest distance in unpaired alignment
Options for mRNA tag alignment:
  -T <int>     type of tag, 0:DpnII, GATC+16; 1:NlaIII, CATG+17. default=-1 [not mRNA tag]
Options for miRNA alignment:
  -A <str>     3-end adapter sequence, default=[not miRNA]
  -S <int>     number of mismatch allowed in adapter, default=0
  -M <int>     minimum length of a miRNA, default=17
  -X <int>     maximum length of a miRNA, default=26
  -h           help
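
The ranges listed for -c pack three things into one number: whether the first base is also removed, how many bases are cut from the 3'-end, and when the trimming is applied. The Python sketch below decodes a -c value following only the ranges in the help text above. The names TrimPlan and decode_trim_code and the mode labels are made up for illustration and are not part of SOAP.

```python
from typing import NamedTuple


class TrimPlan(NamedTuple):
    trim_first_bp: bool  # True when the first base of the read is also removed
    bp_at_3_end: int     # bases removed from the 3'-end (per step in iterative mode)
    mode: str            # "none", "all_reads", "retry_once" or "iterative"


# One label per block of ten -c values, in the order given by the help text.
# These labels are illustrative names, not SOAP terminology.
_MODES = ["all_reads", "all_reads", "retry_once", "retry_once",
          "iterative", "iterative"]


def decode_trim_code(c: int) -> TrimPlan:
    """Interpret a -c value using the ranges 0, 1-10, 11-20, ..., 51-60."""
    if c == 0:
        return TrimPlan(False, 0, "none")
    if not 1 <= c <= 60:
        raise ValueError("-c must be between 0 and 60")
    band, n = divmod(c - 1, 10)  # band 0..5 selects the range, n 0..9
    n += 1                       # bases to trim: 1..10 within each range
    # Odd bands (11-20, 31-40, 51-60) additionally trim the first base.
    return TrimPlan(band % 2 == 1, n, _MODES[band])


if __name__ == "__main__":
    for code in (0, 5, 13, 25, 42, 60):
        print(code, decode_trim_code(code))
```

For example, -c 13 decodes to "trim the first base and 3 bases at the 3'-end for all reads", and -c 42 to "iteratively trim 2 bases at the 3'-end until hits are found".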
Command lines:
  single-end alignment:  soap -a query.fa -d ref.fa -o out.sop -s 12
  pair-end alignment:    soap -a query_1.fa -b query_2.fa -d ref.fa -o out.sop -2 single.sop -m 100 -x 150
  batch model:           soap -d ref.fa <parameter file>
SOAP provides a batch model for aligning multiple query datasets onto the same reference, which avoids loading the reference and constructing the index hash multiple times. The <parameter file> contains the options for each query:
<parameter file>:
  -a q1.fa -o out1.sop -s 12
  -a q2.fa -o out2.sop -s 12
  ...
  -a qn.fa -o outn.sop -s 10
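
When there are many query files, the parameter file can be generated rather than typed by hand. The Python sketch below writes one line per query in the same form as the example above and builds the matching batch command. The function names and the file name params.txt are hypothetical. The source does not say how SOAP handles quoting inside the parameter file, so the sketch rejects paths that contain whitespace instead of guessing.

```python
from typing import Iterable, List, Tuple

Job = Tuple[str, str, int]  # (query file, output file, seed size)


def parameter_file_lines(jobs: Iterable[Job]) -> List[str]:
    """Return one '-a <query> -o <output> -s <seed>' line per job."""
    lines = []
    for query, output, seed in jobs:
        for path in (query, output):
            # Quoting rules for the parameter file are not documented in the
            # source, so whitespace in paths is refused (assumption).
            if any(ch.isspace() for ch in path):
                raise ValueError(f"path contains whitespace: {path!r}")
        lines.append(f"-a {query} -o {output} -s {seed}")
    return lines


def write_parameter_file(path: str, jobs: Iterable[Job]) -> None:
    """Write the parameter file, one job per line."""
    with open(path, "w", encoding="ascii") as handle:
        for line in parameter_file_lines(jobs):
            handle.write(line + "\n")


def batch_command(reference: str, parameter_file: str) -> List[str]:
    """Argument list for 'soap -d <reference> <parameter file>'."""
    return ["soap", "-d", reference, parameter_file]


if __name__ == "__main__":
    jobs = [("q1.fa", "out1.sop", 12), ("q2.fa", "out2.sop", 12)]
    print("\n".join(parameter_file_lines(jobs)))
    print(" ".join(batch_command("ref.fa", "params.txt")))
```

The list returned by batch_command can be passed to subprocess.run once the parameter file has been written and soap is on the PATH.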
Note: location coordinates are counted from 1
Setting seed size:
  seed_size*2+3<=min_read_size, AND seed_size<=12
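
The rule above can be turned into a small helper that returns the largest seed size allowed for a given minimum read length. This is a sketch of the arithmetic only. The name max_seed_size is made up, and the help text lists only even values (8, 10, 12), so whether SOAP accepts the odd values this function can return is not confirmed by the source.

```python
def max_seed_size(min_read_size: int, cap: int = 12) -> int:
    """Largest seed with seed*2 + 3 <= min_read_size and seed <= cap."""
    seed = min((min_read_size - 3) // 2, cap)
    if seed < 1:
        raise ValueError("reads are too short for any seed size")
    return seed


if __name__ == "__main__":
    for length in (19, 23, 27, 35):
        print(length, max_seed_size(length))
```

The results agree with the hints given for -s: reads longer than 18 bp allow s=8, longer than 22 bp allow s=10, and longer than 26 bp allow s=12.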
Output format:
  id, seq, qual, #_of_hits, a/b [belonging to query a or b if pair-end], length, +/-, ref, ref_location, type
Types:
  0:                   exact match;
  n OffsetAlleleQual:  n mismatches with offset, allele and quality,
                       ex: 1 C->10T30, 1-bp mismatch at location+10 on ref, ref allele C, query allele T, query quality 30;
  100+n Offset:        n-bp insertion on query, ex: 101 15, 1-bp insertion on query, start after 15bp on ref
  200+n Offset:        n-bp deletion on query, ex: 201 16, 1-bp deletion on query, start after 16bp on ref
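
The column list and type codes above are enough to sketch a reader for the alignment file. The Python example below assumes the columns are tab-separated, which the source does not state, and treats everything from the tenth column onward as the type description. The names Hit, parse_type and parse_line are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

# Matches a mismatch token such as "C->10T30":
# ref allele, offset, query allele, query quality.
MISMATCH = re.compile(r"([A-Za-z])->(\d+)([A-Za-z])(-?\d+)")


def parse_type(tokens: List[str]) -> Tuple[str, int, list]:
    """Classify the type tokens as exact, mismatch, insertion or deletion."""
    code = int(tokens[0])
    if code == 0:
        return "exact", 0, []
    if code > 200:  # 200+n Offset: n-bp deletion on query
        return "deletion", code - 200, [int(tokens[1])]
    if code > 100:  # 100+n Offset: n-bp insertion on query
        return "insertion", code - 100, [int(tokens[1])]
    details = []    # n mismatches, one token per mismatch
    for token in tokens[1:1 + code]:
        match = MISMATCH.fullmatch(token)
        if match is None:
            raise ValueError(f"unrecognised mismatch token: {token!r}")
        ref_allele, offset, query_allele, quality = match.groups()
        details.append((ref_allele, int(offset), query_allele, int(quality)))
    return "mismatch", code, details


@dataclass
class Hit:
    read_id: str
    seq: str
    qual: str
    n_hits: int
    pair: str          # "a" or "b"
    length: int
    strand: str        # "+" or "-"
    ref: str
    ref_location: int  # counted from 1
    kind: str
    size: int          # number of mismatches, or gap size in bp
    details: list = field(default_factory=list)


def parse_line(line: str) -> Hit:
    """Parse one alignment line (assumed tab-separated)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 10:
        raise ValueError("expected at least 10 columns")
    # The type code and its details may be split by tabs or spaces.
    kind, size, details = parse_type(" ".join(cols[9:]).split())
    return Hit(cols[0], cols[1], cols[2], int(cols[3]), cols[4],
               int(cols[5]), cols[6], cols[7], int(cols[8]),
               kind, size, details)
```

Calling parse_type on the examples from the help text, ["1", "C->10T30"], ["101", "15"] and ["201", "16"], yields a mismatch, a 1-bp insertion and a 1-bp deletion respectively.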
Source: http://blog.163.com/henry_by/blog/static/5726535820100288482842/