| 1. Collect sequences whose biological functions are already well characterized, and group them into families according to function;
DR0651
GTFPLTLGGDHSVSMGTVTGNGLRGRPQRTGVIWVDAHTDYNTPESSPSGNI
BH1985
WQPIVLGGDHSISFPSIKAFASAKGTIGVIQFDAHHDLRNLEDGGPC
PA3175
HRVVGLGGGHEIAYASFAGLARHLSRHERLPRIGILNFDAHFDLRHAERAS
VC1204
VLGGGHEIAWATFQGLAQHFLATGVKQPRIGIINFDAHFDLRTFESELAPVRPS
BS_ywhG
KFPMGMGGEHLVSWPVIKAMYKKYPDLAIIHFDAHTDLRVDYEGEPLS
NMB0469
KRCLSLGGDHFITLPLLRAHARYFGKLALIHFDAHTDTYDNGSEYD
PA0288
TLPLSVGGDHLVTLPIFRALGRERPLGMVHFDAHSDTNDRYFGDNPYT
MJ0309
KKIIVFGGEHSITYPIIKAVKDIYDDFIVIQFDAHCDLRDEYLGNKLS
APE1637
KVPVVLGGEHLVTLGALRGLAAAGVKPCVVVLDAHFDLRNDYLGERFS
Ta1058
KIPIMLGGEHSITVGAVRNFPEDVHMVIVDAHSDFRDSYMGNKLN
MTH868
LKPLIIGGEHTVTLPVIENLPEHDSLTVVHLDAHMDLADTYAGERYS
APE0316
RLFIFLGGDHSITYATLRALRSFYRGRLGLVYLDAHPDLYDEYEGDRYS
YPL111w
NRFPLTLGGDHSIAIGTVSAVLDKYPDAGLLWIDAHADINTIESTPSGNL
sll1077
AFPIILGGDHSIGFPTVRGICRHLGDKKVGIIHFDRHVDTQETDLDERM
VNG1767G
ATPLLVGGEHTVTVAAVRALNPDVFVALDAHLDLRAALDGDPLS
sll0228
KFVVAIGGEHAITTGVVRAMQRGTSEPFTVVQIDAHGDMRDKFEGSCHN
DRA0149
VPVFLGGDHSVSYPLLRAFADVPDLHVVQLDAHLDFTDTRNDTKWS
2. Perform a multiple sequence alignment on the sequences that belong to the same family;
DR0651
GTFPLTLGGDHSVSMGTVTGNGLRGRP-----QRTGVIWVDAHTDYNT-PESSPS--GNI
BH1985
-WQPIVLGGDHSISFPSIKAFASA-------KGTIGVIQFDAHHDLRN-LEDG----GPC
PA3175
-HRVVGLGGGHEIAYASFAGLARH-LSRHERLPRIGILNFDAHFDLRH-AE------RAS
VC1204
-----VLGGGHEIAWATFQGLAQHFLATGVKQPRIGIINFDAHFDLRT-FESELAPVRPS
BS_ywhG
-KFPMGMGGEHLVSWPVIKAMYKKY-------PDLAIIHFDAHTDLRV-DYEGEP---LS
NMB0469
-KRCLSLGGDHFITLPLLRAHARYF-------GKLALIHFDAHTDT---YDNGSE---YD
PA0288
-TLPLSVGGDHLVTLPIFRALGRER--------PLGMVHFDAHSDTND-RYFGDNP--YT
MJ0309
-KKIIVFGGEHSITYPIIKAVKDIY-------DDFIVIQFDAHCDLRD-EYLGNK---LS
APE1637
-KVPVVLGGEHLVTLGALRGLAAAG-------VKPCVVVLDAHFDLRN-DYLGER---FS
Ta1058
-KIPIMLGGEHSITVGAVRNFPED----------VHMVIVDAHSDFRD-SYMGNK---LN
MTH868
-LKPLIIGGEHTVTLPVIENLPEHD--------SLTVVHLDAHMDLAD-TYAGER---YS
APE0316
-RLFIFLGGDHSITYATLRALRSFYR------GRLGLVYLDAHPDLYD-EYEGDR---YS
YPL111w
NRFPLTLGGDHSIAIGTVSAVLDKY-------PDAGLLWIDAHADINT-IESTPS--GNL
sll1077
-AFPIILGGDHSIGFPTVRGICRHLGD-----KKVGIIHFDRHVDTQETDLDER-----M
VNG1767G
-ATPLLVGGEHTVTVAAVRALNPD-----------VFVALDAHLDLRA-ALDGDP---LS
sll0228
-KFVVAIGGEHAITTGVVRAMQRGTS------EPFTVVQIDAHGDMRD-KFEGSC---HN
DRA0149
--VPVFLGGDHSVSYPLLRAFADVP--------DLHVVQLDAHLDFTD-TRNDTK---WS
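An alignment like the one above can be sanity-checked in two ways: every row should have the same width, and removing the gap characters from a row should give back the original, unaligned sequence. The sketch below is only an illustration of that check. The helper names `parse_records` and `check_alignment` are made up for this example, and only the first two entries are included to keep it short.

```python
def parse_records(text):
    """Parse alternating name / sequence lines into a {name: sequence} dict."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) % 2 != 0:
        raise ValueError("expected alternating name and sequence lines")
    return dict(zip(lines[0::2], lines[1::2]))


def check_alignment(raw, aligned):
    """Return the alignment width after checking it against the raw sequences."""
    widths = {len(seq) for seq in aligned.values()}
    if len(widths) != 1:
        raise ValueError(f"rows differ in width: {sorted(widths)}")
    for name, seq in aligned.items():
        # An alignment may only insert '-' characters; residues must be unchanged.
        if seq.replace("-", "") != raw[name]:
            raise ValueError(f"{name}: aligned row does not match the raw sequence")
    return widths.pop()


# First two entries from step 1 (unaligned) and step 2 (aligned).
RAW = """
DR0651
GTFPLTLGGDHSVSMGTVTGNGLRGRPQRTGVIWVDAHTDYNTPESSPSGNI
BH1985
WQPIVLGGDHSISFPSIKAFASAKGTIGVIQFDAHHDLRNLEDGGPC
"""

ALIGNED = """
DR0651
GTFPLTLGGDHSVSMGTVTGNGLRGRP-----QRTGVIWVDAHTDYNT-PESSPS--GNI
BH1985
-WQPIVLGGDHSISFPSIKAFASA-------KGTIGVIQFDAHHDLRN-LEDG----GPC
"""

raw = parse_records(RAW)
aligned = parse_records(ALIGNED)
width = check_alignment(raw, aligned)
print(width)  # number of columns in the alignment
```

The same two functions should work on the full set of entries, provided the names in both blocks agree.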
3. Where possible, refine the alignment further by consulting three-dimensional structure information;
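4. Identify the conserved sites (written as the residue's uppercase one-letter code), the variable sites (written as lowercase x), and the gap sites (written as lowercase g). This yields a consensus sequence that describes the family, for example: xxxVKxxxgxxxDxxx……;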
DR0651
GTFPLTLGGDHSVSMGTVTGNGLRGRP-----QRTGVIWVDAHTDYNT-PESSPS--GNI
BH1985
-WQPIVLGGDHSISFPSIKAFASA-------KGTIGVIQFDAHHDLRN-LEDG----GPC
PA3175
-HRVVGLGGGHEIAYASFAGLARH-LSRHERLPRIGILNFDAHFDLRH-AE------RAS
VC1204
-----VLGGGHEIAWATFQGLAQHFLATGVKQPRIGIINFDAHFDLRT-FESELAPVRPS
BS_ywhG
-KFPMGMGGEHLVSWPVIKAMYKKY-------PDLAIIHFDAHTDLRV-DYEGEP---LS
NMB0469
-KRCLSLGGDHFITLPLLRAHARYF-------GKLALIHFDAHTDT---YDNGSE---YD
PA0288
-TLPLSVGGDHLVTLPIFRALGRER--------PLGMVHFDAHSDTND-RYFGDNP--YT
MJ0309
-KKIIVFGGEHSITYPIIKAVKDIY-------DDFIVIQFDAHCDLRD-EYLGNK---LS
APE1637
-KVPVVLGGEHLVTLGALRGLAAAG-------VKPCVVVLDAHFDLRN-DYLGER---FS
Ta1058
-KIPIMLGGEHSITVGAVRNFPED----------VHMVIVDAHSDFRD-SYMGNK---LN
MTH868
-LKPLIIGGEHTVTLPVIENLPEHD--------SLTVVHLDAHMDLAD-TYAGER---YS
APE0316
-RLFIFLGGDHSITYATLRALRSFYR------GRLGLVYLDAHPDLYD-EYEGDR---YS
YPL111w
NRFPLTLGGDHSIAIGTVSAVLDKY-------PDAGLLWIDAHADINT-IESTPS--GNL
sll1077
-AFPIILGGDHSIGFPTVRGICRHLGD-----KKVGIIHFDRHVDTQETDLDER-----M
VNG1767G
-ATPLLVGGEHTVTVAAVRALNPD-----------VFVALDAHLDLRA-ALDGDP---LS
sll0228
-KFVVAIGGEHAITTGVVRAMQRGTS------EPFTVVQIDAHGDMRD-KFEGSC---HN
DRA0149
--VPVFLGGDHSVSYPLLRAFADVP--------DLHVVQLDAHLDFTD-TRNDTK---WS
Consensus
gxxxxxxGGDHxxxxxxxxxxxxxxxggggggxxxxxxxxDAHxDxxxxxxxxxxgggxx
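The labelling rule in step 4 can be sketched column by column. The post does not say how conserved a column must be to get an uppercase letter, or how many gaps make a column a gap site, so the two thresholds below (`conserved_frac` and `gap_frac`) are assumptions chosen for illustration. This sketch is not guaranteed to reproduce the Consensus line above exactly.

```python
from collections import Counter


def consensus(aligned, conserved_frac=0.8, gap_frac=0.5):
    """Label each alignment column as a residue letter, 'x' (variable) or 'g' (gap).

    conserved_frac and gap_frac are assumed thresholds, not values from the post.
    """
    width = len(aligned[0])
    if any(len(row) != width for row in aligned):
        raise ValueError("all aligned rows must have the same length")
    labels = []
    for column in zip(*aligned):
        total = len(column)
        gaps = column.count("-")
        # Columns that are mostly (or entirely) gaps become gap sites.
        if gaps == total or gaps / total >= gap_frac:
            labels.append("g")
            continue
        residue, count = Counter(c for c in column if c != "-").most_common(1)[0]
        labels.append(residue if count / total >= conserved_frac else "x")
    return "".join(labels)


# A toy alignment (hypothetical, not taken from the family above).
toy = ["GGDH-A", "GGEH-A", "GGDHKA", "GGDH-C"]
strict = consensus(toy)                        # 'GGxHgx'
loose = consensus(toy, conserved_frac=0.75)    # 'GGDHgA'
print(strict, loose)
```

Lowering `conserved_frac` turns majority columns such as D/E/D/D into an uppercase letter, which is how a position that is not strictly invariant can still appear as a conserved site.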
5. Apply the same procedure to every family to derive its consensus sequence; together these make up the consensus sequence database.
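One simple way to picture such a database is a mapping from family name to consensus string, searched by turning each consensus into a regular expression. The sketch below is an illustration only. It assumes that an uppercase letter must match exactly, that x matches any one residue, and that a run of g sites may be present or absent. The family names and the second consensus string are hypothetical.

```python
import re
from itertools import groupby


def consensus_to_regex(consensus):
    """Translate a consensus string into a compiled regular expression."""
    parts = []
    for symbol, run in groupby(consensus):
        n = len(list(run))
        if symbol == "x":
            parts.append(".{%d}" % n)      # exactly n arbitrary residues
        elif symbol == "g":
            parts.append(".{0,%d}" % n)    # up to n optional residues (assumption)
        else:
            parts.append(re.escape(symbol) * n)  # conserved residue(s)
    return re.compile("".join(parts))


def matching_families(database, sequence):
    """Return the names of all families whose consensus matches the sequence."""
    return [name for name, cons in database.items()
            if consensus_to_regex(cons).search(sequence)]


database = {
    # Consensus line from step 4; the family name is a placeholder.
    "example_family": "gxxxxxxGGDHxxxxxxxxxxxxxxxggggggxxxxxxxxDAHxDxxxxxxxxxxgggxx",
    # Entirely hypothetical second entry, for contrast.
    "other_family": "xxCxxCxxxgH",
}

# Unaligned DR0651 sequence from step 1.
query = "GTFPLTLGGDHSVSMGTVTGNGLRGRPQRTGVIWVDAHTDYNTPESSPSGNI"
hits = matching_families(database, query)
print(hits)
```

Exact matching of conserved letters is strict: a family member that carries a different residue at a site written in uppercase would not be found by this sketch.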
When reposting, please credit the source: 生物信息学论坛 (Bioinformatics Forum), http://www.bioxxx.cn
|