Computer Engineering and Applications, 2007, Vol. 43, Issue (7): 38-40.

• Academic Discussion

Encoding Based on Grouped Weight for Protein Sequence and Its Application to Structural Class Prediction

ZhenHui Zhang (张振慧), ZhengHua Wang (王正华), YongXian Wang (王勇献)

  • Affiliations: College of Science, National University of Defense Technology; College of Engineering and Technology, Hunan Agricultural University
  • Received: 2006-07-03  Online: 2007-03-01  Published: 2007-03-01
  • Corresponding author: ZhenHui Zhang

Abstract: Starting from the physicochemical properties of amino acids and borrowing the idea of coarse-grained description from physics, we propose an encoding method for protein sequences based on grouped weight (Encoding Based on Grouped Weight, abbreviated EBGW) and combine it with the component-coupled algorithm to predict protein structural classes. On the standard dataset T359 of 359 proteins, the prediction accuracy reaches 99.72% in the resubstitution test and 91.09% in the jackknife test. With the same training dataset and the same prediction algorithm, the overall jackknife accuracy is about 7% higher than that of encoding by amino-acid composition alone, and the accuracy for the α+β class is 15% higher. These results show that EBGW encoding effectively extracts the structural information implicit in the letter sequence of a protein.

Key words: protein (amino acid) sequence, characteristic sequence, component-coupled algorithm, structural class
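The abstract does not spell out how EBGW turns a sequence into numbers. The following is a minimal sketch of one plausible reading, and every detail in it is an assumption: the four physicochemical groups (neutral non-polar, neutral polar, acidic, basic), the three ways of splitting them into two halves to get 0/1 "characteristic sequences", and the rule that each weight is the fraction of 1s in a growing prefix of the sequence. The names `characteristic_sequences`, `ebgw`, `PARTITIONS` and the default `L=10` are hypothetical.

```python
# Hypothetical 4-way grouping of the 20 standard amino acids (an assumption;
# the abstract does not list the groups).
C1 = set("GAVLIMPFW")  # neutral, non-polar
C2 = set("QNSTYC")     # neutral, polar
C3 = set("DE")         # acidic
C4 = set("HKR")        # basic

# Three ways of splitting the four groups into two halves; residues in the
# first half map to 1, all others (including non-standard letters) map to 0.
PARTITIONS = [C1 | C2, C1 | C3, C1 | C4]


def characteristic_sequences(seq):
    """Map a protein sequence to three 0/1 characteristic sequences."""
    seq = seq.upper()
    return [[1 if aa in part else 0 for aa in seq] for part in PARTITIONS]


def ebgw(seq, L=10):
    """Return a 3*L vector of grouped weights (coarse-grained features).

    For each characteristic sequence of length n and k = 1..L, the weight is
    the fraction of 1s among its first floor(k*n/L) positions.
    """
    n = len(seq)
    if n < L:
        raise ValueError("sequence must have at least L residues")
    features = []
    for h in characteristic_sequences(seq):
        for k in range(1, L + 1):
            d = (k * n) // L  # length of the k-th prefix
            features.append(sum(h[:d]) / d)
    return features


print(ebgw("GGDD", L=2))  # [1.0, 0.5, 1.0, 1.0, 1.0, 0.5]
```

Using growing prefixes rather than disjoint windows is a design choice of this sketch: the last weight of each partition equals the group fraction over the whole protein, while the earlier ones keep coarse information about where along the chain each group occurs.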
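The baseline that the abstract compares against encodes a protein by its amino-acid composition: the relative frequency of each of the 20 standard residues, which discards residue order entirely. A minimal sketch (the names `AMINO_ACIDS` and `aa_composition` are hypothetical, and ignoring non-standard letters is an assumption):

```python
# Standard one-letter codes in alphabetical order; this fixes the vector layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """20-dimensional amino-acid composition: relative frequency of each
    standard residue. Non-standard letters are ignored (an assumption)."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("no standard residues in sequence")
    return [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]


comp = aa_composition("AACW")
print(comp[0], comp[1], comp[-2])  # 0.5 0.25 0.25
```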
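The component-coupled algorithm is commonly described as a covariance-aware distance classifier. For each structural class ξ, it estimates a mean vector X̄_ξ and a covariance matrix C_ξ from the training set, then assigns a query vector X to the class that minimizes (X - X̄_ξ)ᵀ C_ξ⁻¹ (X - X̄_ξ) + ln det C_ξ. The pure-Python sketch below assumes that form. The small ridge added to each covariance diagonal is an assumption used to keep it invertible, and `fit`, `predict`, `covariance` and `inverse_and_logdet` are hypothetical names.

```python
import math

# Minimal component-coupled (Mahalanobis-distance) classifier in pure Python.
# The ridge term is an assumption used to keep each covariance invertible.


def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance(rows, mu, ridge=1e-6):
    """Sample covariance (divisor n - 1) with `ridge` added to the diagonal."""
    d = len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [x - m for x, m in zip(r, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j]
    denom = max(len(rows) - 1, 1)
    for i in range(d):
        for j in range(d):
            cov[i][j] /= denom
        cov[i][i] += ridge
    return cov


def inverse_and_logdet(a):
    """Invert a positive-definite matrix by Gauss-Jordan elimination with
    partial pivoting; also return ln(det a) as the sum of ln|pivot|."""
    d = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(d)]
         for i, row in enumerate(a)]
    logdet = 0.0
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(m[r][col]))
        if m[piv][col] == 0.0:
            raise ValueError("covariance matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        logdet += math.log(abs(p))
        m[col] = [v / p for v in m[col]]
        for r in range(d):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[d:] for row in m], logdet


def fit(X, y):
    """Store (mean, inverse covariance, log-determinant) for every class."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        mu = mean_vector(rows)
        inv, logdet = inverse_and_logdet(covariance(rows, mu))
        model[label] = (mu, inv, logdet)
    return model


def predict(model, x):
    """Assign x to the class with the smallest Mahalanobis distance + ln det C."""
    def score(params):
        mu, inv, logdet = params
        diff = [a - b for a, b in zip(x, mu)]
        maha = sum(diff[i] * inv[i][j] * diff[j]
                   for i in range(len(diff)) for j in range(len(diff)))
        return maha + logdet
    return min(model, key=lambda label: score(model[label]))
```

In the setting of the abstract, `X` would hold one feature vector per protein and `y` its structural class label (for example α+β). Each class needs more training proteins than feature dimensions for its covariance to be well estimated. A feature with zero variance within a class makes that class's covariance singular; the ridge sidesteps this, but its size (1e-6 here) is an arbitrary choice rather than something the abstract specifies.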
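The gap between the resubstitution (99.72%) and jackknife (91.09%) figures comes from the test protocol. Resubstitution scores a model on the same proteins it was trained on. The jackknife removes each protein in turn, retrains on the remaining ones, and predicts the held-out protein, so it never tests on training data. The sketch below shows both protocols with pluggable `fit(X, y)` and `predict(model, x)` functions. The nearest-centroid classifier is a hypothetical stand-in used only to make the example runnable; it is not the component-coupled algorithm.

```python
# Resubstitution vs. jackknife (leave-one-out) evaluation with pluggable
# fit(X, y) -> model and predict(model, x) -> label functions.
# X and y must be Python lists so that slicing and concatenation work.


def resubstitution_accuracy(X, y, fit, predict):
    """Train on all samples, then score the model on those same samples."""
    model = fit(X, y)
    return sum(predict(model, x) == t for x, t in zip(X, y)) / len(X)


def jackknife_accuracy(X, y, fit, predict):
    """Hold out each sample once, retrain on the other n - 1, predict it."""
    hits = 0
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += predict(model, X[i]) == y[i]
    return hits / len(X)


# Hypothetical stand-in classifier (nearest class mean, squared Euclidean
# distance) so the sketch runs on its own.
def fit_centroids(X, y):
    groups = {}
    for x, t in zip(X, y):
        groups.setdefault(t, []).append(x)
    return {t: [sum(c) / len(rows) for c in zip(*rows)]
            for t, rows in groups.items()}


def predict_centroid(model, x):
    return min(model, key=lambda t: sum((a - b) ** 2 for a, b in zip(x, model[t])))


X = [[0.0], [1.0], [5.6], [10.0], [11.0]]
y = ["a", "a", "a", "b", "b"]
print(resubstitution_accuracy(X, y, fit_centroids, predict_centroid))  # 1.0
print(jackknife_accuracy(X, y, fit_centroids, predict_centroid))       # 0.8
```

On this toy data the borderline sample at 5.6 is classified correctly only when it helps define its own class mean. Once it is held out, it falls closer to class "b", so the jackknife estimate drops below the resubstitution one.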