Computer Engineering and Applications, 2016, Vol. 52, Issue (13): 55-59.

Sequential protein-GDP binding residues prediction

SHI Dahong, HE Xue   

  1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
  • Online: 2016-07-01  Published: 2016-07-15

Abstract: Accurately identifying protein-GDP (guanosine diphosphate) binding sites is of great importance for both protein function analysis and drug design. Protein-GDP binding residue prediction is a typical imbalanced learning problem: applying traditional machine learning methods directly is unsuitable, because the learned model becomes severely biased towards the majority class. To circumvent this problem, a weighted under-sampling method is proposed to balance the samples, built on sparse-representation-based position-specific scoring matrix (PSSM) features, and a support vector machine (SVM) is then used for prediction. Experimental results show that the proposed method achieves higher prediction performance.

Key words: protein-GDP binding prediction, position-specific scoring matrix, sparse representation, weighted under-sampling, support vector machine
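The abstract does not say how the under-sampling weights are computed, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that label 1 marks a GDP-binding residue (minority class), label 0 a non-binding residue (majority class), and that every residue already carries a positive weight from some unspecified scoring step. The hypothetical function `weighted_undersample` keeps all binding residues and draws an equal number of non-binding residues without replacement, with selection probability increasing in weight, using the Efraimidis-Spirakis key trick.

```python
import random
from typing import List, Sequence


def weighted_undersample(
    labels: Sequence[int],
    weights: Sequence[float],
    seed: int = 0,
) -> List[int]:
    """Return sorted indices of a class-balanced subset (hypothetical helper).

    Assumption: label 1 = binding residue (minority), label 0 = non-binding
    residue (majority); weights are positive and their origin is unspecified.
    """
    if len(labels) != len(weights):
        raise ValueError("labels and weights must have the same length")
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    k = min(len(minority), len(majority))

    # Efraimidis-Spirakis weighted sampling without replacement:
    # give each majority sample the key u ** (1 / w), u ~ U[0, 1),
    # and keep the k samples with the largest keys.
    keyed = []
    for i in majority:
        w = weights[i]
        if w <= 0:
            raise ValueError("weights must be positive")
        keyed.append((rng.random() ** (1.0 / w), i))
    keyed.sort(reverse=True)
    chosen = [i for _, i in keyed[:k]]

    # Keep every minority sample plus the sampled majority samples.
    return sorted(minority + chosen)


# Toy usage: 2 binding residues vs. 8 non-binding residues, uniform weights.
labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
weights = [1.0] * len(labels)
idx = weighted_undersample(labels, weights)
print(idx, [labels[i] for i in idx])
```

The balanced indices could then be used to train a classifier such as scikit-learn's `sklearn.svm.SVC` on the corresponding PSSM feature vectors; the feature construction and SVM settings used in the paper are not described in the abstract.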