http://blog.sina.com.cn/u/2214034580


ARC: an assembly tool for mitochondrial sequencing data

(2015-09-01 19:13:20)

Category: Software installation

1: Software name:

Assembly by Reduced Complexity (ARC). Website: http://ibest.github.io/ARC/#Contact

2: Reference article:

Assembly by Reduced Complexity (ARC): a hybrid approach for targeted assembly of homologous sequences (http://biorxiv.org/content/early/2015/02/07/014662)

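3: Principle: sequencing reads are mapped to a reference mitochondrial genome, the reference is divided into bins, each bin is assembled separately, and the per-bin assemblies are merged at the end. A similar tool, MITObim, was published in Nucleic Acids Research; see https://github.com/chrishah/MITObim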
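The binning step can be illustrated with a toy Python sketch. This is not ARC's actual code: the `(read_id, target_name)` input format, the function name `bin_reads_by_target`, and the target names `COX1` and `ND4` are all assumptions made for illustration.

```python
from collections import defaultdict

def bin_reads_by_target(alignments):
    """Group read IDs by the reference target (bin) they mapped to.

    `alignments` is an iterable of (read_id, target_name) pairs, a
    hypothetical simplification of real mapper output.
    """
    bins = defaultdict(set)
    for read_id, target in alignments:
        bins[target].add(read_id)
    # Sort read IDs so the result is deterministic.
    return {target: sorted(reads) for target, reads in bins.items()}

# Toy mapping results: three reads hit two hypothetical targets.
alignments = [("r1", "COX1"), ("r2", "COX1"), ("r3", "ND4"), ("r1", "ND4")]
bins = bin_reads_by_target(alignments)
for target, reads in sorted(bins.items()):
    # In a real pipeline each bin would be passed to the assembler separately.
    print(target, reads)
```

A read that maps to more than one target lands in several bins, which is why the sketch collects reads per target rather than assigning one target per read.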

4: Running the software:

/share/nas2/genome/biosoft/Python/2.7.8/bin/ARC -c ARC_config.txt

The configuration file looks like this:

## Name=value pairs:

## reference: contains reference sequences in fasta format

## numcycles: maximum number of times to try remapping

## mapper: the mapper to use (blat/bowtie2)

## assembler: the assembler to use (newbler/spades)

## nprocs: number of cores to use

## format: fasta or fastq, all must be the same

## verbose: control mapping/assembly log generation (True/False)

## urt: For Newbler, enable use read tips mode (True/False)

## map_against_reads: On iteration 1, skip assembly, map against mapped reads (True/False)

## assemblytimeout: kill assemblies and discard targets if they take longer than N minutes

## Columns:

## Sample_ID:Sample_ID

## FileName: path for fasta/fastq file

## FileType: PE1, PE2, or SE

## FileFormat: fasta or fastq

# reference=/share/nas29/shim/testing/ARC/data/targets.fa

# numcycles=10

# mapper=bowtie2

# assembler=spades

# nprocs=7

# format=fastq

# verbose=True

# urt=True

# map_against_reads=False

# assemblytimeout=300

# bowtie2_k=3

# rip=True

# cdna=False

# subsample=1

# maskrepeats=True

# sloppymapping=True

Sample_ID FileName FileType

Sample1 /share/nas29/shim/testing/ARC/data/reads/Lampyridae_S99-D01-I_good_1.fq PE1

Sample1 /share/nas29/shim/testing/ARC/data/reads/Lampyridae_S99-D01-I_good_2.fq PE2
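To make the layout of this file concrete, here is a minimal Python sketch of a reader for it. It assumes, as the listing above suggests, that lines starting with `##` are comments, lines starting with a single `#` hold `name=value` settings, and the remaining lines are a whitespace-separated sample table with a header row. The function `parse_arc_config` is hypothetical and is not part of ARC; the shortened file names in the example are placeholders.

```python
def parse_arc_config(text):
    """Split ARC-style config text into (settings, samples)."""
    settings = {}
    samples = []
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("##"):
            continue  # blank line or comment
        if line.startswith("#"):
            # "# name=value" setting line
            name, sep, value = line.lstrip("#").strip().partition("=")
            if sep:
                settings[name.strip()] = value.strip()
            continue
        fields = line.split()
        if header is None:
            header = fields  # e.g. Sample_ID FileName FileType
        else:
            samples.append(dict(zip(header, fields)))
    return settings, samples

example = """\
## numcycles: maximum number of times to try remapping
# numcycles=10
# assembler=spades
Sample_ID FileName FileType
Sample1 reads_1.fq PE1
Sample1 reads_2.fq PE2
"""
settings, samples = parse_arc_config(example)
print(settings)
print(samples)
```

All values are kept as strings, and the sketch would mis-split file paths that contain spaces.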

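5: Notes on the configuration file: the assembler selected here is spades, which of course must be on your PATH. assemblytimeout=300 sets how long an assembly may run before it is killed. The config comment gives the unit as minutes, so the value here is deliberately long (300 minutes, i.e. 5 hours). The configuration file supplied by the original authors uses 1 minute, and the website gives a default of 10. The number of cycles usually does not need to be 10 either; in my experience 3-4 cycles are enough (# numcycles=10).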
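A quick way to confirm that the chosen tools are reachable before launching ARC is to look them up on PATH from Python. This is only a sketch: the executable names `bowtie2` and `spades.py` are assumptions about how the tools are installed on your system, and `missing_tools` is a hypothetical helper.

```python
import shutil

def missing_tools(names):
    """Return the executables from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]

# Assumed executable names for the mapper and assembler chosen above.
required = ["bowtie2", "spades.py"]
missing = missing_tools(required)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required tools found.")
```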

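If your reference genome is only distantly related to the genome being assembled, set map_against_reads to True. If the sequencing depth is high, you can assemble from a subsample of the reads, for example subsample=0.4 (the default is 1).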
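The idea behind a fractional subsample can be sketched in Python as follows. This illustrates random subsampling of read pairs in general, not ARC's internal implementation; the function `subsample_pairs`, the seed, and the synthetic read names are assumptions made for the example.

```python
import random

def subsample_pairs(pairs, fraction, seed=0):
    """Keep each read pair with probability `fraction`.

    `pairs` is an iterable of (read1, read2) items; both mates are kept
    or dropped together so the two files stay in sync.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    return [pair for pair in pairs if rng.random() < fraction]

# Synthetic read pairs standing in for records from the PE1/PE2 files.
pairs = [("read%d/1" % i, "read%d/2" % i) for i in range(1000)]
kept = subsample_pairs(pairs, 0.4)
print(len(kept), "of", len(pairs), "pairs kept")
```

Because each pair is kept independently, the number retained is only approximately 40% of the input; with fraction 1 every pair is kept.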
