基于16s rRNA序列来预测功能(PICRUSt使用说明)
(2015-12-04 22:12:47)| 分类: 微生物 |
1:目前基于16s rRNA 序列可以预测功能的软件有:
Tax4Fun(http://tax4fun.gobics.de)与PICRUSt(https://picrust.github.io/picrust/index.html)
2015-Tax4Fun predicting functional profiles from metagenomic 16S rRNA data
2013-Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences
2:PICRUSt软件的使用说明如下:
2-1:输入来自qiime软件的biom格式的OTU_table,其使用的流程如下:
https://picrust.github.io/picrust/tutorials/otu_picking.html#otu-picking-tutorial
https://picrust.github.io/picrust/tutorials/metagenome_prediction.html#metagenome-prediction-tutorial
但是需要注意的是你用于分类的参考数据库必须是greengene,但是目前greengene数据库更新速度没有SILVA快,因此如果选择SILVA作为pick_otu时候的数据库,则无法使用PICRUSt软件来预测功能。
3:下面是一个可以处理SILVA的脚本,需要输入两个重要文件:
a:使用qiime输出的OTU表格,注意在用qiime的标准流程跑完以后生成的biom文件,需要进一步对OTU进行过滤,过滤的标准可以参考文献(Quality-filtering
vastly improves diversity estimates from Illumina amplicon
sequencing),对OTU过滤的标准就是一个OTU出现的次数最少为,总的测序数据的:c
b:OTU 代表序列
脚本如下:
#!/usr/bin/perl use strict;use warnings;use Cwd;use Getopt::Long;my $outdir ||=getcwd;my ($table,$blast,$fasta);my $greengene ||="/share/nas21/fanyc/gg_data/gg";#this my $blastcmd="/share/nas2/genome/bin/blastn";my $picust="/share/nas2/genome/biosoft/picrust/1.0.0/build/scripts-2.7";my $biom="/share/nas2/genome/biosoft/Python/2.7.8/bin/biom";my $identify ||=97;GetOptions( "o:s"=>\$outdir, "otu:s"=>\$table, "rep:s"=>\$fasta, "blast:s"=>\$blastcmd, "gg:s"=>\$greengene, "p:s"=>\$identify );sub usage{print qq{This use picust The The perl $0 -otu $outdir options:-otu -rep -gg -blast -identify -o Email:fanyucai1\@126.com2015.12.4}; exit; }if (!$table || $fasta ){ &usage();}open(IN1,$fasta);my %hash;my $seqname;while (){ chomp; if ($_=~/\>/) { my @array=split; $seqname=substr($array[0],1); } else { $hash{$seqname}.=$_; } }close IN1;open(IN2,$table) open (OUT2,">$outdir/new_rep.fasta");while () chomp; if ($_ !~/#/) { chomp; my @array=split("\t",$_); if ($hash{$array[0]}) { print OUT2 ">",$array[0],"\n",$hash{$array[0]},"\n"; } } }close IN2;system "$blastcmd ;open(IN3,"$outdir/blast.out");my %map;my %map2;while (){ chomp; my @array=split; #array[0] #array[1] #array[2] if ($array[2]>=$identify) { if (exists $map2{$array[1]}) { if ($array[2]>$map2{$array[1]}) { $map{$array[1]}=$array[0]; $map2{$array[1]}=$array[2]; } } else { $map{$array[1]}=$array[0]; $map2{$array[1]}=$array[2]; } } }close IN3;open(IN4,$table);open(OUT,">$outdir/new_otu_table_no_singletons.txt");open(MAP,">$outdir/OTU2greengee.txt");TT:while (){ chomp; if ($_ !~/#/) { my @array=split("\t",$_); foreach my $key (keys %map) { if ($map{$key} $array[0]) { print OUT $key,"\t"; print MAP $key,"\t",$map{$key},"\n"; for(my $k=1;$k<=$#array-1;$k++) { print OUT $array[$k],"\t"; } print OUT $array[$#array],"\n"; next TT; } } } else { print OUT $_,"\n"; }}system "$biom ;system "$picust/normalize_by_copy_number.py ;system "$picust/predict_metagenomes.py ;system "$picust/predict_metagenomes.py ;system "$picust/predict_metagenomes.py ;system "$picust/predict_metagenomes.py ; |

加载中…