OTU classification with mothur_Ochi

http://blog.sina.com.cn/u/2153197405


(2013-03-30 21:00:44)

Tags: mothur, otu, sequence, dna, classification

Category: Resources

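mothur is a tool for processing sequences from high-throughput sequencing runs or clone libraries.

Here, OTU classification means collapsing the duplicate sequences in a set of clone sequencing results and then grouping the sequences at 97% similarity,

i.e. cutoff=0.03 (a distance of 0.03 corresponds to 97% similarity).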
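To make the link between 97% similarity and cutoff=0.03 concrete, here is a minimal Python sketch of a pairwise distance between two aligned sequences. It only approximates the idea behind the calc=onegap and countends=F options used later in this post (a run of gaps counts as one difference, and overhanging ends are ignored). It is not mothur's own code; the function name onegap_distance is made up for illustration.

```python
GAPS = "-."  # characters treated as alignment gaps (an assumption of this sketch)


def onegap_distance(a, b):
    """Approximate distance between two aligned sequences of equal length.

    A run of consecutive gap columns counts as a single difference, and
    columns before/after the region where both sequences have bases are
    ignored. This is an illustration, not mothur's implementation.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")

    def base_span(seq):
        # Index of the first and last non-gap character, or None if all gaps.
        idx = [i for i, c in enumerate(seq) if c not in GAPS]
        return (idx[0], idx[-1]) if idx else None

    span_a, span_b = base_span(a), base_span(b)
    if span_a is None or span_b is None:
        return 1.0
    start = max(span_a[0], span_b[0])
    end = min(span_a[1], span_b[1])

    diffs = 0
    length = 0
    in_gap_run = False
    for x, y in zip(a[start:end + 1], b[start:end + 1]):
        x_gap, y_gap = x in GAPS, y in GAPS
        if x_gap and y_gap:
            continue  # a column that is a gap in both sequences says nothing
        if x_gap or y_gap:
            if not in_gap_run:
                diffs += 1   # the whole run counts once
                length += 1
            in_gap_run = True
        else:
            in_gap_run = False
            length += 1
            if x.upper() != y.upper():
                diffs += 1
    return diffs / length if length else 1.0


if __name__ == "__main__":
    d = onegap_distance("ATGCATGCATGC", "ACGC---CATCC")
    print(d)           # 2 mismatches + 1 gap run over 10 compared positions: 0.3
    print(d <= 0.03)   # False: this pair would not fall within the 0.03 cutoff
```

Two sequences end up within the 0.03 cutoff only when at most 3 in 100 compared positions differ, which is what "97% similarity" means here.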

mothur v.1.25.1
Last updated: 5/14/2012

by
Patrick D. Schloss

Department of Microbiology & Immunology
University of Michigan
pschloss@umich.edu
http://www.mothur.org

When using, please cite:
Schloss, P.D., et al., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41.

Distributed under the GNU General Public License

Type 'help()' for information on the commands that are available

Type 'quit()' to exit program

mothur > unique.seqs(fasta=*.fasta)

5** 2**

Output File Names:
*.unique.fasta
*.names

mothur > align.seqs(candidate=*.unique.fasta,template=tmp.fasta,flip=T,processors=2)

Reading in the tmp.fasta template sequences... DONE.
It took 0 to read 1 sequences.
Aligning sequences from *.unique.fasta ...

Reading in the tmp.fasta template sequences... DONE.
It took 0 to read 1 sequences.
***
***

Some of your sequences generated alignments that eliminated too many bases, a list is provided in *.unique.flip.accnos. If the reverse compliment proved to be better it was reported.
It took 1 secs to align *** sequences.

Output File Names:
*.unique.align
*.unique.align.report
*.unique.flip.accnos

mothur > filter.seqs(fasta=*.unique.align)

Length of filtered alignment: ***

Number of columns removed: 0

Length of the original alignment: ***

Number of sequences used to construct filter: ***

Output File Names:
*.filter
*.unique.filter.fasta

mothur > dist.seqs(fasta=*.unique.filter.fasta,calc=onegap,countends=F,cutoff=0.03,output=lt)

Output File Name:
g5gta.unique.filter.phylip.dist

It took 0 to calculate the distances for *** sequences.

mothur > cluster(phylip=*.unique.filter.phylip.dist,method=furthest,cutoff=0.03)

********************#****#****#****#****#****#****#****#****#****#****#
   Reading matrix:     ||||||||||||||||||||||||||||||||||||||||||||||||||||
    ***********************************************************************
unique 2       124     2
0.01    3       89      15      3
0.02    5       69      7       10      0       3
0.03    8       60      7       7       3       1       0       0       2

Output File Names:
*.unique.filter.phylip.fn.sabund
*.unique.filter.phylip.fn.rabund
*.unique.filter.phylip.fn.list

It took 0 seconds to cluster

mothur > bin.seqs(fasta=*.fasta,name=*.names)

Using *.unique.filter.phylip.fn.list as input file for the list parameter.
unique
0.01
0.02
0.03

Output File Names:
*.unique.filter.phylip.fn.unique.fasta
*.unique.filter.phylip.fn.0.01.fasta
*.unique.filter.phylip.fn.0.02.fasta
*.unique.filter.phylip.fn.0.03.fasta

mothur > get.oturep(phylip=*.unique.filter.phylip.dist,fasta=*.fasta,list=*.unique.filter.phylip.fn.list,label=0.03)

Output File Names:
*.unique.filter.phylip.fn.0.03.rep.names
*.unique.filter.phylip.fn.0.03.rep.fasta

mothur >

mothur download page:
http://www.mothur.org/wiki/Download_mothur

Mac, Windows, and Linux are supported.

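First start mothur: open cmd to get a command prompt, use cd xxx to change into the directory that contains mothur.exe, then type mothur.exe and press Enter.
Suppose the fasta file in that directory is called Great.fasta.

Then process it with the following commands: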

1. mothur > unique.seqs(fasta=Great.fasta)
2. mothur > dist.seqs(fasta=Great.unique.fasta,calc=onegap,countends=F,cutoff=0.03,output=lt)
3. mothur > cluster(phylip=Great.unique.phylip.dist,method=furthest,cutoff=0.03)
4. mothur > bin.seqs(fasta=Great.fasta,name=Great.names)
5. mothur > get.oturep(phylip=Great.unique.phylip.dist,fasta=Great.unique.fasta,list=Great.unique.phylip.fn.list,label=0.03)
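If you run the same five commands on many files, typing them by hand gets tedious. The sketch below writes them into a text file that can be handed to mothur in batch mode (for example mothur.exe otu.batch). The helper names build_batch and write_batch and the file name otu.batch are made up for illustration. The commands are the ones listed above, with only the file prefix substituted.

```python
from pathlib import Path


def build_batch(prefix, cutoff=0.03):
    """Return the five commands from this post for <prefix>.fasta."""
    unique = f"{prefix}.unique"
    return [
        f"unique.seqs(fasta={prefix}.fasta)",
        f"dist.seqs(fasta={unique}.fasta,calc=onegap,countends=F,"
        f"cutoff={cutoff},output=lt)",
        f"cluster(phylip={unique}.phylip.dist,method=furthest,cutoff={cutoff})",
        f"bin.seqs(fasta={prefix}.fasta,name={prefix}.names)",
        f"get.oturep(phylip={unique}.phylip.dist,fasta={unique}.fasta,"
        f"list={unique}.phylip.fn.list,label={cutoff})",
    ]


def write_batch(prefix, path="otu.batch"):
    """Write one command per line; the file name is only an example."""
    target = Path(path)
    target.write_text("\n".join(build_batch(prefix)) + "\n")
    return target


if __name__ == "__main__":
    for command in build_batch("Great"):
        print(command)
```

Because all commands run in one mothur session, bin.seqs can still pick up the list file produced by cluster, as it does in the interactive transcript above.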

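Finally, open Great.unique.phylip.fn.0.03.rep.fasta to see the size of each OTU, which is the number at the end of each sequence name.
The same file also shows how many OTUs there are.
Great.unique.phylip.fn.0.03.rep.names lists the sequences that belong to each OTU; it can be opened with Notepad.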
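Instead of reading the .rep.names file by eye, the counts can be tallied with a few lines of Python. This sketch assumes the file uses the usual two-column names layout (representative name, then a comma-separated list of member names). The function name otu_sizes and the sequence names in the example are hypothetical.

```python
def otu_sizes(names_text):
    """Map each representative sequence to the number of sequences in its OTU.

    Assumes two whitespace-separated columns per line:
    representative<TAB>member1,member2,...
    """
    sizes = {}
    for line in names_text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        representative, members = parts[0], parts[1]
        sizes[representative] = len([m for m in members.split(",") if m])
    return sizes


if __name__ == "__main__":
    example = "seqA\tseqA,seqB,seqC\nseqD\tseqD\n"  # made-up sequence names
    sizes = otu_sizes(example)
    print(len(sizes), "OTUs")
    for rep, n in sizes.items():
        print(rep, n)
```

To use it on a real file, pass the file's text, for example open("Great.unique.phylip.fn.0.03.rep.names").read().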

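Great.unique.phylip.fn.list shows which OTUs exist at each of the levels unique, 0.01, 0.02, and 0.03. The OTUs on a line are separated by spaces or tabs, and the sequences within one OTU are separated by commas (",").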
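The .list file can be read the same way. The sketch below assumes each line has the form label, number of OTUs, then one whitespace-separated field per OTU, which appears to match files written by mothur versions of this era. Lines whose second field is not a number (such as a header row) are skipped. The function name parse_list is made up for illustration.

```python
def parse_list(text):
    """Return {label: [[seq, ...], ...]} from the text of a .list file."""
    levels = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[1].isdigit():
            continue  # blank line, header row, or unexpected layout
        label = fields[0]
        # Every remaining field is one OTU; its members are comma-separated.
        levels[label] = [field.split(",") for field in fields[2:]]
    return levels


if __name__ == "__main__":
    example = "unique\t3\ts1\ts2\ts3\n0.03\t2\ts1,s2\ts3\n"  # made-up names
    levels = parse_list(example)
    for label, otus in levels.items():
        print(label, len(otus), "OTUs")
```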

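Note that the cluster command can read either a phylip-format matrix or a column-format matrix; if you give it a column matrix, you must also supply a names file.
The unique.seqs command is run in order to generate that names file.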
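To show what the names file records, here is a small Python sketch of the dereplication idea: identical sequences are collapsed to one representative, and each line of the names output lists the representative followed by all of its duplicates. It illustrates the concept only and makes no claim to reproduce unique.seqs exactly. The function names and sequence names are hypothetical.

```python
def read_fasta(text):
    """Parse fasta text into a list of (name, sequence) pairs."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Keep only the part of the header before the first whitespace.
            name = (line[1:].split() or [""])[0]
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def dereplicate(records):
    """Collapse identical sequences.

    Returns (unique_records, names_lines); each names line is
    'representative<TAB>name1,name2,...'.
    """
    groups = {}  # sequence -> names, in order of first appearance
    for name, seq in records:
        groups.setdefault(seq, []).append(name)
    unique_records = [(names[0], seq) for seq, names in groups.items()]
    names_lines = [f"{names[0]}\t{','.join(names)}" for names in groups.values()]
    return unique_records, names_lines


if __name__ == "__main__":
    fasta = ">s1\nACGT\n>s2\nACGT\n>s3\nAAGT\n"  # made-up sequences
    unique_records, names_lines = dereplicate(read_fasta(fasta))
    print(unique_records)
    print(names_lines)
```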

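If, while running the steps above, mothur reports [ERROR]: your sequences are not the same length, aborting. (dist.seqs expects aligned sequences of equal length),

then run the following commands first:

Start by preparing a template fasta file: from the fasta file above, pick one sequence whose length is the most common one, save it in a new fasta file, and name that file temp.fasta.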
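Picking the template by hand is easy for a small file. For a larger one, something like the following Python sketch could choose the first sequence that has the most common length. The records are (name, sequence) pairs, however you load them. pick_template and format_fasta are hypothetical helper names, and taking the first match is just one reasonable choice.

```python
from collections import Counter


def pick_template(records):
    """Return the first (name, sequence) pair with the most common length."""
    if not records:
        raise ValueError("no sequences given")
    length_counts = Counter(len(seq) for _, seq in records)
    common_length, _ = length_counts.most_common(1)[0]
    for name, seq in records:
        if len(seq) == common_length:
            return name, seq


def format_fasta(name, seq):
    """Format one record as fasta text, ready to save as temp.fasta."""
    return f">{name}\n{seq}\n"


if __name__ == "__main__":
    records = [("a", "ACGT"), ("b", "ACG"), ("c", "TTGA")]  # made-up data
    name, seq = pick_template(records)
    print(format_fasta(name, seq), end="")
```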

a. mothur > unique.seqs(fasta=Great.fasta)

b. mothur > align.seqs(candidate=Great.unique.fasta,template=temp.fasta,flip=T,processors=2)

c. mothur > filter.seqs(fasta=Great.unique.align)

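Then rename the Great.unique.filter.fasta file produced here to Great.fasta and run the five steps above on it. (Preferably do this in a fresh mothur directory and a new mothur window, to avoid file name clashes.)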
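One way to avoid the file name clashes mentioned above is to copy the filtered file into a separate working directory under its new name rather than renaming it in place. A minimal sketch follows; the function name stage_filtered and the directory name run2 are made up for illustration.

```python
import shutil
from pathlib import Path


def stage_filtered(filtered, workdir, new_name="Great.fasta"):
    """Copy the filtered alignment into workdir under new_name.

    Refuses to overwrite an existing file, so an earlier run is not lost.
    """
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    target = workdir / new_name
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    shutil.copyfile(filtered, target)
    return target


if __name__ == "__main__":
    source = Path("Great.unique.filter.fasta")
    if source.exists():
        print(stage_filtered(source, "run2"))  # "run2" is only an example name
    else:
        print("Great.unique.filter.fasta not found in the current directory")
```

mothur (or a copy of it) still has to be started from, or pointed at, that new directory before the five steps are repeated.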
