Computer Engineering and Applications, 2011, Vol. 47, Issue 18: 163-165.

• Database, Signal and Information Processing •

A new encoding method for protein secondary structure prediction

LI Wanggen, YE Xiaojiao, HUANG Yaoying

  1. College of Mathematics and Computer Science, Anhui Normal University, Wuhu, Anhui 241003
  • Publication date: 2011-06-21 Release date: 2011-06-21

Novel coding scheme for protein secondary structure prediction

LI Wanggen, YE Xiaojiao, HUANG Yaoying

  1. College of Mathematics and Computer Science, Anhui Normal University, Wuhu, Anhui 241003, China
  • Online: 2011-06-21 Published: 2011-06-21

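Abstract: The encoding scheme is one of the important factors affecting the accuracy of protein secondary structure prediction. For single-sequence protein secondary structure prediction, a new comprehensive encoding method is proposed. It classifies amino acids according to their propensity factors for each type of secondary structure and their hydrophobicity values, and represents each class of amino acids in binary form. Under identical experimental conditions, the CB513 data set is first encoded with the different encoding schemes, and a support vector machine is then trained to build the prediction model. The experimental results show that the prediction accuracy of the proposed encoding is 1.48% and 10.68% higher than that of the 20-bit orthogonal encoding and the 5-bit encoding, respectively. The encoding is therefore well suited to structure prediction for non-homologous or low-homology proteins.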

Keywords: encoding scheme, protein secondary structure prediction, support vector machine

Abstract: The coding scheme plays an important role in determining the accuracy of protein secondary structure prediction. A new comprehensive coding scheme is proposed for single-sequence protein secondary structure prediction. The method takes into account both the propensity of each amino acid to appear in each type of secondary structure and the amino acid's hydrophobicity value, and it represents each class of amino acids in binary form. The different coding schemes are used to encode the CB513 data set, and a Support Vector Machine (SVM) is then applied to protein secondary structure prediction. The results show that the prediction accuracy of the new coding scheme is about 1.48% and 10.68% higher than that of the classical 20-bit orthogonal coding and the 5-bit coding, respectively. This indicates that the coding is well suited to structure prediction for non-homologous or low-homology proteins.

Key words: coding scheme, protein secondary structure prediction, Support Vector Machine (SVM)
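
The abstract contrasts a 20-bit orthogonal code with a compact code that assigns a binary pattern to each class of amino acids. The Python sketch below illustrates that general idea only. The abstract does not list the actual classes, the bit width, or the window size, so the grouping in `HYPOTHETICAL_CLASSES`, the reserved all-zero padding pattern, and the default window of 13 residues are all assumptions made for illustration, not the authors' scheme.

```python
from math import ceil, log2

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# HYPOTHETICAL grouping, for illustration only. The paper derives its classes
# from secondary-structure propensity factors and hydrophobicity values, which
# the abstract does not list.
HYPOTHETICAL_CLASSES = ["AVLIMC", "FWY", "GP", "STNQ", "DE", "KRH"]


def orthogonal_code(residue):
    """20-bit orthogonal (one-hot) code; unknown or padding symbols are all zeros."""
    return [1 if residue == aa else 0 for aa in AMINO_ACIDS]


def class_binary_code(residue, classes=HYPOTHETICAL_CLASSES):
    """Binary code of the residue's class; all zeros is reserved for padding."""
    n_bits = max(1, ceil(log2(len(classes) + 1)))
    for index, members in enumerate(classes, start=1):
        if residue in members:
            # Most significant bit first.
            return [(index >> shift) & 1 for shift in reversed(range(n_bits))]
    return [0] * n_bits


def encode_windows(sequence, encoder, window=13):
    """Build one feature vector per residue from a sliding window centred on it.

    The window size is an assumption and is expected to be odd.
    """
    half = window // 2
    padded = "-" * half + sequence + "-" * half
    features = []
    for center in range(len(sequence)):
        row = []
        for residue in padded[center:center + window]:
            row.extend(encoder(residue))
        features.append(row)
    return features
```

With these assumptions, a window of 13 residues yields 260 features per position under the orthogonal code and 39 under the class code. Each row could then be passed, together with its secondary-structure label, to an SVM classifier such as scikit-learn's `SVC`.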