ARC: assembly software for mitochondrial sequencing data

Category: Software installation
1: Software name:
Assembly by Reduced Complexity (ARC). Website: http://ibest.github.io/ARC/#Contact
2: Reference article:
Assembly by Reduced Complexity (ARC): a hybrid approach for targeted assembly of homologous sequences. (http://biorxiv.org/content/early/2015/02/07/014662)
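3: Principle: the sequencing reads are mapped to a reference mitochondrial genome, the mitochondrial genome is divided into bins, each bin is assembled separately, and the assembled results are finally merged. A similar tool, MITObim, was published in Nucleic Acids Research; see https://github.com/chrishah/MITObim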
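To make the map, bin, and repeat idea concrete, here is a toy sketch of the read-recruitment loop only. It is not ARC's code: the `(read_id, target_id)` input and the `map_reads` callback are hypothetical stand-ins for a real mapper (blat/bowtie2), and the per-bin assembly and merging steps are left out. `max_cycles` plays the role of the `numcycles` setting.

```python
from collections import defaultdict

def bin_reads_by_target(hits):
    """Group read IDs into one bin per reference target; unmapped reads are dropped."""
    bins = defaultdict(set)
    for read_id, target_id in hits:
        if target_id is not None:
            bins[target_id].add(read_id)
    return dict(bins)

def recruit(map_reads, max_cycles=10):
    """Re-map and re-bin until no bin changes or max_cycles is reached.

    map_reads(cycle) is a hypothetical callback returning (read_id, target_id)
    pairs; in a real pipeline it would map against the contigs assembled from
    the previous cycle's bins. Returns (bins, cycles_run).
    """
    bins = {}
    for cycle in range(1, max_cycles + 1):
        new_bins = bin_reads_by_target(map_reads(cycle))
        # per-bin assembly and merging would happen here (omitted)
        if new_bins == bins:  # nothing new recruited: stop early
            return bins, cycle
        bins = new_bins
    return bins, max_cycles

if __name__ == "__main__":
    # Toy mapper: each cycle recruits one more read to "cox1", up to three.
    def toy_mapper(cycle):
        return [(f"r{i}", "cox1") for i in range(min(cycle, 3))] + [("noise", None)]

    bins, cycles = recruit(toy_mapper)
    print(sorted(bins["cox1"]), cycles)  # ['r0', 'r1', 'r2'] 4
```

The early stop is why a small cycle count is often enough in practice: once a cycle recruits no new reads, further iterations change nothing.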
4: Running the software:
/share/nas2/genome/biosoft/Python/2.7.8/bin/ARC -c ARC_config.txt
The configuration file is as follows:
## Name=value pairs:
## reference: contains reference sequences in fasta format
## numcycles: maximum number of times to try remapping
## mapper: the mapper to use (blat/bowtie2)
## assembler: the assembler to use (newbler/spades)
## nprocs: number of cores to use
## format: fasta or fastq, all must be the same
## verbose: control mapping/assembly log generation (True/False)
## urt: For Newbler, enable use read tips mode (True/False)
## map_against_reads: On iteration 1, skip assembly, map against mapped reads (True/False)
## assemblytimeout: kill assemblies and discard targets if they take longer than N minutes
##
## Columns:
## Sample_ID:Sample_ID
## FileName: path for fasta/fastq file
## FileType: PE1, PE2, or SE
## FileFormat: fasta or fastq
# reference=/share/nas29/shim/testing/ARC/data/targets.fa
# numcycles=10
# mapper=bowtie2
# assembler=spades
# nprocs=7
# format=fastq
# verbose=True
# urt=True
# map_against_reads=False
# assemblytimeout=300
# bowtie2_k=3
# rip=True
# cdna=False
# subsample=1
# maskrepeats=True
# sloppymapping=True
Sample_ID	FileName	FileType
Sample1	/share/nas29/shim/testing/ARC/data/reads/Lampyridae_S99-D01-I_good_1.fq	PE1
Sample1	/share/nas29/shim/testing/ARC/data/reads/Lampyridae_S99-D01-I_good_2.fq	PE2
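If you run ARC on many samples, a config in this layout can be generated from a script instead of edited by hand. The sketch below assumes the layout shown above: `##` lines are comments, each setting is a `# name=value` line, and the sample table is tab-separated with the columns Sample_ID, FileName, FileType. The function name `write_arc_config` is hypothetical, not part of ARC.

```python
def write_arc_config(settings, samples):
    """Render ARC_config.txt text in the layout shown above.

    settings: dict name -> value, emitted as '# name=value' lines
    samples:  list of (sample_id, filename, filetype) tuples
    """
    lines = []
    for name, value in settings.items():
        lines.append(f"# {name}={value}")  # Python bools render as True/False
    lines.append("Sample_ID\tFileName\tFileType")  # tab-separated header row
    for sample_id, filename, filetype in samples:
        if filetype not in ("PE1", "PE2", "SE"):
            raise ValueError(f"FileType must be PE1, PE2 or SE, got {filetype!r}")
        lines.append(f"{sample_id}\t{filename}\t{filetype}")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    reads = "/share/nas29/shim/testing/ARC/data/reads/Lampyridae_S99-D01-I_good"
    text = write_arc_config(
        {"reference": "/share/nas29/shim/testing/ARC/data/targets.fa",
         "numcycles": 4, "assembler": "spades",
         "map_against_reads": False, "assemblytimeout": 300},
        [("Sample1", reads + "_1.fq", "PE1"), ("Sample1", reads + "_2.fq", "PE2")],
    )
    print(text, end="")  # redirect to ARC_config.txt
```

Validating FileType at generation time catches a typo before ARC starts, rather than partway through a run.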
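5: Among the settings above, the assembler is chosen with assembler=; here it is spades, which must be available on your PATH. assemblytimeout=300 means an assembly is killed (and its target discarded) if it runs longer than 300 minutes, so the limit here is deliberately generous: the original author's example config uses 1 minute, and the default given on the website is 10. The number of cycles (# numcycles=10) usually does not need to be 10; in my experience 3 to 4 cycles are enough.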
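Before a long run it can help to sanity-check these settings. The sketch below parses a config in the layout above and reports the points just discussed. Assumptions: the executable names `spades.py` (SPAdes) and `runAssembly` (Newbler) are the ones looked up on PATH, and the numcycles warning simply encodes the 3 to 4 cycle rule of thumb from this post; none of this is ARC's own validation.

```python
import shutil

# Assumed executable names for each assembler (not taken from ARC's source).
ASSEMBLER_EXES = {"spades": "spades.py", "newbler": "runAssembly"}

def parse_arc_config(text):
    """Split config text into a settings dict and a list of sample rows."""
    settings, samples = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("##"):
            continue  # blank line or '##' comment
        if line.startswith("#"):
            body = line[1:].strip()
            if "=" in body:
                name, value = body.split("=", 1)
                settings[name.strip()] = value.strip()
            continue  # a lone '#' is an empty comment
        fields = line.split("\t")
        if fields[0] != "Sample_ID":  # skip the header row
            samples.append(tuple(fields))
    return settings, samples

def check_settings(settings):
    """Return human-readable notes about assembler, timeout and cycle count."""
    notes = []
    if "assembler" in settings:
        exe = ASSEMBLER_EXES.get(settings["assembler"])
        if exe is None:
            notes.append("assembler must be spades or newbler")
        elif shutil.which(exe) is None:
            notes.append(f"{exe} not found on PATH")
    if "assemblytimeout" in settings:
        minutes = int(settings["assemblytimeout"])
        notes.append(f"assemblies killed after {minutes} min ({minutes / 60:.1f} h)")
    if "numcycles" in settings and int(settings["numcycles"]) > 4:
        notes.append("numcycles > 4; 3-4 cycles are often enough")
    return notes

if __name__ == "__main__":
    cfg = "# assembler=spades\n# assemblytimeout=300\n# numcycles=10\n"
    for note in check_settings(parse_arc_config(cfg)[0]):
        print(note)
```

Printing the timeout in hours as well makes it obvious that 300 means five hours, not five minutes.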
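If your reference genome is distantly related to the sequences being assembled, set map_against_reads=True. If sequencing depth is high, you can subsample the reads for assembly, e.g. subsample=0.4 (the default is 1).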
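One way to picture subsample=0.4 is keeping roughly 40% of read pairs at random. The sketch below shows that idea for paired-end data, making one random draw per pair so PE1 and PE2 stay in sync. It is an illustration of proportional subsampling under that assumption, not ARC's implementation; `subsample_pairs` is a hypothetical helper.

```python
import random

def subsample_pairs(reads1, reads2, fraction, seed=0):
    """Keep each read pair with probability `fraction`, mates together."""
    if len(reads1) != len(reads2):
        raise ValueError("PE1 and PE2 must contain the same number of reads")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    kept1, kept2 = [], []
    for r1, r2 in zip(reads1, reads2):
        if rng.random() < fraction:  # one draw per pair, so mates stay paired
            kept1.append(r1)
            kept2.append(r2)
    return kept1, kept2

if __name__ == "__main__":
    r1 = [f"read{i}/1" for i in range(1000)]
    r2 = [f"read{i}/2" for i in range(1000)]
    k1, k2 = subsample_pairs(r1, r2, 0.4, seed=1)
    print(len(k1), len(k2))  # about 400 each, always equal
```

Drawing once per pair rather than once per read is the important design choice: sampling the two files independently would leave unmatched mates.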