基于16s rRNA序列来预测功能(PICRUSt使用说明)
(2015-12-04 22:12:47)分类: 微生物 |
1:目前基于16s rRNA 序列可以预测功能的软件有:
Tax4Fun(http://tax4fun.gobics.de)与PICRUSt(https://picrust.github.io/picrust/index.html)
2015-Tax4Fun predicting functional profiles from metagenomic 16S rRNA data
2013-Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences
2:PICRUSt软件的使用说明如下:
2-1:输入来自qiime软件的biom格式的OTU_table,其使用的流程如下:
https://picrust.github.io/picrust/tutorials/otu_picking.html#otu-picking-tutorial
https://picrust.github.io/picrust/tutorials/metagenome_prediction.html#metagenome-prediction-tutorial
但是需要注意的是你用于分类的参考数据库必须是greengene,但是目前greengene数据库更新速度没有SILVA快,因此如果选择SILVA作为pick_otu时候的数据库,则无法使用PICRUSt软件来预测功能。
3:下面是一个可以处理SILVA的脚本,需要输入两个重要文件:
a:使用qiime输出的OTU表格,注意在用qiime的标准流程跑完以后生成的biom文件,需要进一步对OTU进行过滤,过滤的标准可以参考文献(Quality-filtering
vastly improves diversity estimates from Illumina amplicon
sequencing),对OTU过滤的标准就是一个OTU出现的次数最少为,总的测序数据的:c
b:OTU 代表序列
脚本如下:
#!/usr/bin/perl use strict; use warnings; use Cwd; use Getopt::Long; my $outdir ||=getcwd; my ( $table , $blast , $fasta ); my $greengene ||= "/share/nas21/fanyc/gg_data/gg" ; #this my $blastcmd = "/share/nas2/genome/bin/blastn" ; my $picust = "/share/nas2/genome/biosoft/picrust/1.0.0/build/scripts-2.7" ; my $biom = "/share/nas2/genome/biosoft/Python/2.7.8/bin/biom" ; my $identify ||=97; GetOptions( "o:s" =>\ $outdir , "otu:s" =>\ $table , "rep:s" =>\ $fasta , "blast:s" =>\ $blastcmd , "gg:s" =>\ $greengene , "p:s" =>\ $identify ); sub usage{ print qq{ This use picust The The perl $0 -otu $outdir options: -otu -rep -gg -blast -identify -o Email:fanyucai1\ @126 .com 2015.12.4 }; exit ; } if (! $table || $fasta ) { &usage(); } open (IN1, $fasta ); my %hash ; my $seqname ; while () { chomp ; if ( $_ =~/\>/) { my @array = split ; $seqname = substr ( $array [0],1); } else { $hash { $seqname }.= $_ ; } } close IN1; open (IN2, $table ) open (OUT2, ">$outdir/new_rep.fasta" ); while () chomp ; if ( $_ !~/ #/) { chomp ; my @array = split ( "\t" , $_ ); if ( $hash { $array [0]}) { print OUT2 ">" , $array [0], "\n" , $hash { $array [0]}, "\n" ; } } } close IN2; system "$blastcmd ; open (IN3, "$outdir/blast.out" ); my %map ; my %map2 ; while () { chomp ; my @array = split ; #array[0] #array[1] #array[2] if ( $array [2]>= $identify ) { if ( exists $map2 { $array [1]}) { if ( $array [2]> $map2 { $array [1]}) { $map { $array [1]}= $array [0]; $map2 { $array [1]}= $array [2]; } } else { $map { $array [1]}= $array [0]; $map2 { $array [1]}= $array [2]; } } } close IN3; open (IN4, $table ); open (OUT, ">$outdir/new_otu_table_no_singletons.txt" ); open ( MAP , ">$outdir/OTU2greengee.txt" ); TT: while () { chomp ; if ( $_ !~/ #/) { my @array = split ( "\t" , $_ ); foreach my $key ( keys %map ) { if ( $map { $key } $array [0]) { print OUT $key , "\t" ; print MAP $key , "\t" , $map { $key }, "\n" ; for ( my $k =1; $k <=$ #array-1;$k++) { print OUT $array [ $k ], "\t" ; } print OUT $array [$ #array],"\n"; next TT; } } } else { print OUT $_ , "\n" ; } } system "$biom ; system "$picust/normalize_by_copy_number.py ; system "$picust/predict_metagenomes.py ; system "$picust/predict_metagenomes.py ; system "$picust/predict_metagenomes.py ; system "$picust/predict_metagenomes.py ; |