基于16s rRNA序列来预测功能（PICRUSt使用说明)_fanyucai

http://blog.sina.com.cn/u/2214034580

首页博文目录关于我

个人资料

微博

加好友发纸条

写留言加关注

博客等级：
博客积分：

博客访问：
关注人气：
获赠金笔：0支
赠出金笔：0支
荣誉徽章：

正文字体大小：大中小

基于16s rRNA序列来预测功能（PICRUSt使用说明)

(2015-12-04 22:12:47)

分类：微生物

1:目前基于16s rRNA 序列可以预测功能的软件有:

Tax4Fun(http://tax4fun.gobics.de)与PICRUSt（https://picrust.github.io/picrust/index.html）

2015-Tax4Fun predicting functional profiles from metagenomic 16S rRNA data

2013-Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences

2:PICRUSt软件的使用说明如下：

2-1:输入来自qiime软件的biom格式的OTU_table,其使用的流程如下：

https://picrust.github.io/picrust/tutorials/otu_picking.html#otu-picking-tutorial

https://picrust.github.io/picrust/tutorials/metagenome_prediction.html#metagenome-prediction-tutorial

但是需要注意的是你用于分类的参考数据库必须是greengene,但是目前greengene数据库更新速度没有SILVA快，因此如果选择SILVA作为pick_otu时候的数据库，则无法使用PICRUSt软件来预测功能。

3:下面是一个可以处理SILVA的脚本，需要输入两个重要文件：

a:使用qiime输出的OTU表格，注意在用qiime的标准流程跑完以后生成的biom文件，需要进一步对OTU进行过滤，过滤的标准可以参考文献（Quality-filtering vastly improves diversity estimates from Illumina amplicon sequencing），对OTU过滤的标准就是一个OTU出现的次数最少为，总的测序数据的：c = 0.005%。过滤的到的OTU（biom）格式的文件，转化成txt文本格式（http://biom-format.org/documentation/biom_conversion.html）

b:OTU 代表序列

脚本如下：

#!/usr/bin/perl -w

use strict;

use warnings;

use Cwd;

use Getopt::Long;

my $outdir ||=getcwd;

my ($table,$blast,$fasta);

my $greengene ||="/share/nas21/fanyc/gg_data/gg";#this database from makeblastdb greengene

my $blastcmd="/share/nas2/genome/bin/blastn";

my $picust="/share/nas2/genome/biosoft/picrust/1.0.0/build/scripts-2.7";

my $biom="/share/nas2/genome/biosoft/Python/2.7.8/bin/biom";

my $identify ||=97;

GetOptions(

"o:s"=>\$outdir,

"otu:s"=>\$table,

"rep:s"=>\$fasta,

"blast:s"=>\$blastcmd,

"gg:s"=>\$greengene,

"p:s"=>\$identify

);

sub usage{

print qq{

This script was used to convert denovoID ( OTU_table) to greengene ID and use picust to predit function using 16s rRNA.

The KEGG and COG result named:predictions_cog.tab && predictions_kegg.tab.

The analysis was present using tab and biom.

perl $0 -otu otu_table_no_singletons.txt -rep qiime_rep_set.fasta -o $outdir

options:

-otu the otu_table from qiime (otu_table_no_singletons.txt)(force)

-rep the representative sequence every otu(force)

-gg the prefix of database from makeblastdb greengene(default:/share/nas21/fanyc/gg_data/gg)

-blast the directory of blastn(default:/share/nas2/genome/bin/blastn)

-identify the identify threshold between denovo OTU ID and greengene

-o the out directory

Email:fanyucai1\@126.com

2015.12.4

};

exit;

}

if (!$table || !$fasta )

{

&usage();

}

open(IN1,$fasta);

my %hash;

my $seqname;

while ()

{

chomp;

if ($_=~/\>/)

{

my @array=split;

$seqname=substr($array[0],1);

}

else

{

$hash{$seqname}.=$_;

}

close IN1;

open(IN2,$table) ;

open (OUT2,">$outdir/new_rep.fasta");

while () {

chomp;

if ($_ !~/#/)

{

chomp;

my @array=split("\t",$_);

if ($hash{$array[0]})

{

print OUT2 ">",$array[0],"\n",$hash{$array[0]},"\n";

}

close IN2;

system "$blastcmd -num_alignments 1 -query $outdir/new_rep.fasta -db $greengene -out $outdir/blast.out -outfmt 6";

open(IN3,"$outdir/blast.out");

my %map;

my %map2;

while ()

{

chomp;

my @array=split;

#array[0] is OTU_ID

#array[1] is greengene ID

#array[2] is identify

if ($array[2]>=$identify)

{

if (exists $map2{$array[1]})

{

if ($array[2]>$map2{$array[1]})

{

$map{$array[1]}=$array[0];

$map2{$array[1]}=$array[2];

}

else

{

$map{$array[1]}=$array[0];

$map2{$array[1]}=$array[2];

}

close IN3;

open(IN4,$table);

open(OUT,">$outdir/new_otu_table_no_singletons.txt");

open(MAP,">$outdir/OTU2greengee.txt");

TT:while ()

{

chomp;

if ($_ !~/#/)

{

my @array=split("\t",$_);

foreach my $key (keys %map)

{

if ($map{$key} eq $array[0])

{

print OUT $key,"\t";

print MAP $key,"\t",$map{$key},"\n";

for(my $k=1;$k<=$#array-1;$k++)

{

print OUT $array[$k],"\t";

}

print OUT $array[$#array],"\n";

next TT;

}

else

{

print OUT $_,"\n";

}

system

"$biom convert -i $outdir/new_otu_table_no_singletons.txt -o $outdir/new_otu_table.biom --table-type=\"otu table\" --process-obs-metadata taxonomy"

;

system "$picust/normalize_by_copy_number.py -i $outdir/new_otu_table.biom -o $outdir/normalized_otus.biom";

system "$picust/predict_metagenomes.py -f -i $outdir/normalized_otus.biom -o $outdir/predictions_kegg.tab";

system "$picust/predict_metagenomes.py -i $outdir/normalized_otus.biom -o $outdir/predictions_kegg.biom";

system

"$picust/predict_metagenomes.py -f --type_of_prediction cog -i $outdir/normalized_otus.biom -o $outdir/predictions_cog.tab"

;

system

"$picust/predict_metagenomes.py --type_of_prediction cog -i $outdir/normalized_otus.biom -o $outdir/predictions_cog.biom"

;

阅读┊ 收藏 ┊ 喜欢 ▼ ┊打印┊举报/Report

前一篇：发现宏基因组测序中（低丰度的微生物）

后一篇：平均核苷酸一致性average nucleotide identity (ANI)

新浪BLOG意见反馈留言板　欢迎批评指正