Computer Engineering and Applications, 2016, Vol. 52, Issue (15): 83-87.

• Big Data and Cloud Computing •

Predicting protein subnuclear location using ACC transformation and RFE algorithm

LI Xiaowei1, LIU Taigang2, TAO Peiying1, WANG Chunhua2

  1. College of Food Science & Technology, Shanghai Ocean University, Shanghai 201306, China
  2. College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
  • Online: 2016-08-01  Published: 2016-08-12

Abstract: Knowing where proteins are located within the nucleus of eukaryotic cells is important for annotating their biological functions. Because predicting protein location at the finer subnuclear level is particularly challenging for computational methods, this paper proposes a method that combines Auto Cross Covariance (ACC) transformation with Recursive Feature Elimination (RFE). First, the ACC transformation is applied to each protein's Position Specific Scoring Matrix (PSSM) to build a fixed-length feature vector. Next, RFE selects an optimal subset of these features. Finally, the reduced feature vectors are fed to a Support Vector Machine (SVM) for prediction. Jackknife tests on two widely used benchmark datasets, SC714 and LD504, show that the proposed method achieves higher prediction accuracy than most existing methods.

Key words: protein subnuclear location, position specific scoring matrix, auto cross covariance transformation, recursive feature elimination
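The abstract describes a three-stage pipeline. First, each protein's PSSM undergoes ACC transformation. Next, RFE selects features. Finally, an SVM makes the predictions, which are evaluated by the jackknife test. The Python sketch below (NumPy and scikit-learn) illustrates those stages under stated assumptions. It uses the common auto-covariance (same PSSM column) and cross-covariance (two different columns) definitions at lags 1 to G, which give 400·G values per protein. The lag G = 2, the 50 retained features, the linear SVM used for RFE ranking and the RBF SVM used for prediction are hypothetical choices, not settings reported in the abstract. The helper names `acc_features`, `toy_pssm` and `MAX_LAG` are also hypothetical. Random matrices stand in for real PSI-BLAST PSSMs, so the printed accuracy says nothing about SC714 or LD504. The jackknife test is implemented as leave-one-out cross-validation.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

MAX_LAG = 2  # hypothetical lag G; the paper's value is not given in the abstract


def acc_features(pssm: np.ndarray, max_lag: int) -> np.ndarray:
    """ACC transform of an L x 20 PSSM into a vector of 400 * max_lag values.

    AC(j, g)      = 1/(L-g) * sum_i (S[i, j]  - m_j)  * (S[i+g, j]  - m_j)
    CC(j1, j2, g) = 1/(L-g) * sum_i (S[i, j1] - m_j1) * (S[i+g, j2] - m_j2), j1 != j2
    where m_j is the mean of column j and g = 1..max_lag.
    """
    length, n_cols = pssm.shape
    if length <= max_lag:
        raise ValueError("sequence length must exceed max_lag")
    centered = pssm - pssm.mean(axis=0)  # subtract each column mean m_j
    off_diag = ~np.eye(n_cols, dtype=bool)
    features = []
    for g in range(1, max_lag + 1):
        # cov[j1, j2] = average over i of centered[i, j1] * centered[i + g, j2]
        cov = centered[:-g].T @ centered[g:] / (length - g)
        features.append(np.diag(cov))   # 20 AC terms (j1 == j2)
        features.append(cov[off_diag])  # 380 CC terms (j1 != j2)
    return np.concatenate(features)


def toy_pssm(label: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical stand-in for a PSI-BLAST PSSM of random length."""
    length = int(rng.integers(30, 60))
    s = rng.normal(size=(length, 20))
    if label == 1:
        # give class 1 extra lag-1 autocorrelation in the first five columns
        s[1:, :5] += 0.8 * s[:-1, :5]
    return s


rng = np.random.default_rng(0)
labels = np.array([0, 1] * 20)
X = np.vstack([acc_features(toy_pssm(int(y), rng), MAX_LAG) for y in labels])

# RFE ranks features with a linear SVM, removing 10% of the initial feature
# count per round until 50 remain; an RBF SVM then classifies the result.
model = Pipeline([
    ("scale", StandardScaler()),
    ("rfe", RFE(SVC(kernel="linear"), n_features_to_select=50, step=0.1)),
    ("svm", SVC(kernel="rbf", C=1.0, gamma="scale")),
])

# Jackknife test = leave-one-out cross-validation: each protein is predicted
# by a model (including the RFE step) trained on all the other proteins.
scores = cross_val_score(model, X, labels, cv=LeaveOneOut())
print(f"{X.shape[1]} ACC features, jackknife accuracy = {scores.mean():.3f}")
```

RFE sits inside the Pipeline so that feature selection is repeated in every jackknife fold. Selecting features once on the full dataset before the jackknife would let each held-out protein influence its own feature subset. The abstract does not say which variant the authors used. scikit-learn's SVC and RFE also accept multi-class labels, so the same pipeline applies to datasets with more than two subnuclear locations.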