加载中…
个人资料
玉琪星兆
玉琪星兆
  • 博客等级:
  • 博客积分:0
  • 博客访问:4,428
  • 关注人气:255
  • 获赠金笔:0支
  • 赠出金笔:0支
  • 荣誉徽章:
正文 字体大小:

matlab-生物信息工具箱bioinformatics tool学习之二

(2009-03-07 00:48:22)
标签:

matlab

生物信息工具箱

教育

分类: matlab及perl学习

系统发生分析 -开始太傻了!!

1、首先建立malab结构,将要分析的各物种的信息输入:

>>data = {'German_Neanderthal'      'AF011222';
        'Russian_Neanderthal'     'AF254446';
        'European_Human'          'X90314'  ;
        'Mountain_Gorilla_Rwanda' 'AF089820';
        'Chimp_Troglodytes'       'AF176766';
       };%建立结构

2、开始准备序列,以备下面分析:

>>for ind = 1:5 %设置循环,因为上面结构一共五个物种
    seqs(ind).Header   = data{ind,1};%一样设置为结构
    seqs(ind).Sequence = getgenbank(data{ind,2},...
                                    'sequenceonly', true);
end

3、这儿就要讲一下,如何进行序列比对了:

D = seqpdist(Seqs)%可以是细胞数组,或者是包含序列结构向量,或者是矩阵,每一行一条序列;

D = seqpdist(Seqs, ...'Method', MethodValue, ...)% p距离或者(Default is Jukes-Cantor)或者序列比对得分。

D = seqpdist(Seqs, ...'Indels', IndelsValue, ...)% 插入缺失突变是否算分
D = seqpdist(Seqs, ...'Optargs', OptargsValue, ...)
D = seqpdist(Seqs, ...'PairwiseAlignment', PairwiseAlignmentValue, ...)%全局序列比对,长度不一则为TRUE,否则为FALSE
D = seqpdist(Seqs, ...'JobManager', JobManagerValue, ...)%需要进行Parallel Computing Toolbox分析
D = seqpdist(Seqs, ...'WaitInQueue', WaitInQueueValue, ...)
D = seqpdist(Seqs, ...'SquareForm', SquareFormValue, ...)%是否把结果输出方阵
D = seqpdist(Seqs, ...'Alphabet', AlphabetValue, ...)%'NT'标示序列为核酸 or 蛋白'AA'
D = seqpdist(Seqs, ...'ScoringMatrix', ScoringMatrixValue, ...)

  算分矩阵

    *

      'PAM40'
    *

      'PAM250'
    *

      'DAYHOFF'
    *

      'GONNET'
    *

      'BLOSUM30' increasing by 5 up to 'BLOSUM90'
    *

      'BLOSUM62'
    *

      'BLOSUM100'

默认Default is:

    *

      'NUC44' (when AlphabetValue equals 'NT')
    *

      'BLOSUM50' (when AlphabetValue equals 'AA')
D = seqpdist(Seqs, ...'Scale', ScaleValue, ...)
D = seqpdist(Seqs, ...'GapOpen', GapOpenValue, ...)%空位罚分,默认为8
D = seqpdist(Seqs, ...'ExtendGap', ExtendGapValue, ...)

3、对序列比对的距离进行构树,

>>tree = seqlinkage(distances,'UPGMA',seqs) 

%Tree = seqlinkage(Dist, Method, Names)

方法如下:

Names来描述分支点。

%'single'  

Nearest distance (single linkage method)
%'complete' 

Furthest distance (complete linkage method)
%'average' (default) 

Unweighted Pair Group Method Average (UPGMA, group average).
%'weighted' 

Weighted Pair Group Method Average (WPGMA)
%'centroid' 

Unweighted Pair Group Method Centroid (UPGMC)
%'median' 

Weighted Pair Group Method Centroid (WPGMC)  

4、画出系统发育树

>>h = plot(tree,'orient','bottom');
>>ylabel('Evolutionary distance')
>>set(h.terminalNodeLabels,'Rotation',-45)

5、下面加上7条序列

>>data2 = {'Puti_Orangutan'          'AF451972';
         'Jari_Orangutan'          'AF451964';
         'Western_Lowland_Gorilla' 'AY079510';
         'Eastern_Lowland_Gorilla' 'AF050738';
         'Chimp_Schweinfurthii'    'AF176722';
         'Chimp_Vellerosus'        'AF315498';
         'Chimp_Verus'             'AF176731';
       };

6、获取序列

>>for ind = 1:7
    seqs(ind+5).Header   = data2{ind,1};
    seqs(ind+5).Sequence = getgenbank(data2{ind,2},...
                                      'sequenceonly', true);
end

7、同上操作

>>distances = seqpdist(seqs,'Method','Jukes-Cantor','Alpha','DNA');
>>tree = seqlinkage(distances,'UPGMA',seqs);

8、同上画图

>>h = plot(tree,'orient','bottom');
>>ylabel('Evolutionary distance')
>>set(h.terminalNodeLabels,'Rotation',-45)

9、罗列系统发育树的名称

>>names = get(tree,'LeafNames')

10、选择输出

>>[h_all,h_leaves] = select(tree,'reference',3,...
                          'criteria','distance',...
                          'threshold',0.6);

11、去除无用的枝杈

>>leaves_to_prune = ~h_leaves;
pruned_tree = prune(tree,leaves_to_prune)
h = plot(pruned_tree,'orient','bottom');
ylabel('Evolutionary distance')
set(h.terminalNodeLabels,'Rotation',-30)

12、显示最终结果

phytreetool(pruned_tree)  

0

阅读 收藏 喜欢 打印举报/Report
  

新浪BLOG意见反馈留言板 欢迎批评指正

新浪简介 | About Sina | 广告服务 | 联系我们 | 招聘信息 | 网站律师 | SINA English | 产品答疑

新浪公司 版权所有