Computer Engineering and Applications, 2017, Vol. 53, Issue (10): 155-159. DOI: 10.3778/j.issn.1002-8331.1512-0276

• Pattern Recognition and Artificial Intelligence •

Predicting apoptosis protein subcellular location based on the SVM-RFE algorithm

LIU Taigang, WANG Chunhua

  1. College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
  • Online: 2017-05-15  Published: 2017-05-31

Abstract: Information on the subcellular location of apoptosis proteins is of great importance for revealing the mechanism of programmed cell death and for annotating protein function. Because determining subcellular location by wet-bench experiments alone is time-consuming, labor-intensive, and costly, developing fast and effective computational prediction methods has become an important research topic in bioinformatics. In this study, each protein sequence is first represented by features extracted from its position-specific scoring matrix (PSSM): amino acid composition, dipeptide composition, and auto-covariance variables. Recursive feature elimination (RFE) is then used to select an optimal feature subset. Finally, the selected features are input to a support vector machine (SVM) classifier, which is evaluated by jackknife tests on two widely used datasets. The results show that the proposed method outperforms most previously reported prediction methods, demonstrating its effectiveness.

Key words: position-specific scoring matrix, auto-covariance transformation, support vector machine, recursive feature elimination, jackknife test
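The abstract describes the sequence encoding only at a high level. The Python sketch below shows one common way to derive the three feature groups from a PSSM, i.e. an L × 20 matrix of per-residue scores (typically produced by PSI-BLAST). It is a minimal sketch under stated assumptions, not the authors' code: the logistic normalization, the exact averaging used for the dipeptide and auto-covariance terms, and the default maximum lag `max_lag=2` are all assumptions, and the function names (`normalize`, `pssm_aac`, `pssm_dpc`, `pssm_ac`, `encode`) are hypothetical.

```python
import math
from typing import List

# A PSSM is treated as a list of L rows (one per residue), each holding 20 scores
# (one per standard amino acid, in a fixed column order).
Matrix = List[List[float]]


def normalize(pssm: Matrix) -> Matrix:
    """Map raw log-odds scores into (0, 1) with the logistic function (assumed)."""
    return [[1.0 / (1.0 + math.exp(-x)) for x in row] for row in pssm]


def pssm_aac(p: Matrix) -> List[float]:
    """PSSM-based amino acid composition: mean of each column (20 values)."""
    n = len(p)
    return [sum(row[j] for row in p) / n for j in range(20)]


def pssm_dpc(p: Matrix) -> List[float]:
    """PSSM-based dipeptide composition: mean product of column i at residue k
    and column j at residue k + 1 (400 values)."""
    n = len(p)
    return [
        sum(p[k][i] * p[k + 1][j] for k in range(n - 1)) / (n - 1)
        for i in range(20)
        for j in range(20)
    ]


def pssm_ac(p: Matrix, max_lag: int) -> List[float]:
    """Auto-covariance of each column at lags 1..max_lag (20 * max_lag values)."""
    n = len(p)
    means = pssm_aac(p)
    feats = []
    for lag in range(1, max_lag + 1):
        for j in range(20):
            cov = sum(
                (p[k][j] - means[j]) * (p[k + lag][j] - means[j])
                for k in range(n - lag)
            )
            feats.append(cov / (n - lag))
    return feats


def encode(pssm: Matrix, max_lag: int = 2) -> List[float]:
    """Concatenate AAC, DPC and AC features into one fixed-length vector."""
    if max_lag < 1 or len(pssm) <= max_lag:
        raise ValueError("need max_lag >= 1 and more residues than max_lag")
    p = normalize(pssm)
    return pssm_aac(p) + pssm_dpc(p) + pssm_ac(p, max_lag)


if __name__ == "__main__":
    toy = [[0.0] * 20 for _ in range(5)]  # hypothetical all-zero PSSM, L = 5
    print(len(encode(toy)))  # → 460 (20 + 400 + 20 * 2)
```

For a protein of any length L, the vector has 20 + 400 + 20 × `max_lag` entries, which is what allows sequences of different lengths to be fed to a single classifier.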
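The feature selection and evaluation steps named in the abstract can be sketched in the same spirit. SVM-RFE repeatedly trains a linear SVM and discards the feature whose weight has the smallest squared magnitude; the jackknife test is leave-one-out cross-validation. To stay dependency-free, this sketch swaps in a simple Pegasos-style subgradient solver for a linear SVM without a bias term, handles only two classes (labels +1/-1), and removes one feature per round. The authors' solver, kernel, parameters, and multi-class scheme are not given in the abstract, and all function names here (`train_linear_svm`, `svm_rfe`, `jackknife_accuracy`) are hypothetical.

```python
from typing import List

Vector = List[float]


def train_linear_svm(X: List[Vector], y: List[int],
                     lam: float = 0.1, epochs: int = 20) -> Vector:
    """Pegasos-style subgradient training of a linear SVM with no bias term.

    Labels must be +1 or -1. Samples are visited in a fixed order, so the
    result is deterministic.
    """
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            # L2-regularization step: shrink the weights toward zero.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss subgradient step, only if the margin is violated.
            if margin < 1.0:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def svm_rfe(X: List[Vector], y: List[int], n_keep: int, **svm_kw) -> List[int]:
    """Return indices of the n_keep features that survive recursive elimination."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(sub, y, **svm_kw)
        # Ranking criterion: the squared weight of each remaining feature.
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        del remaining[worst]
    return remaining


def jackknife_accuracy(X: List[Vector], y: List[int], **svm_kw) -> float:
    """Leave-one-out test: train on n - 1 samples, predict the held-out one."""
    correct = 0
    for i in range(len(X)):
        w = train_linear_svm(X[:i] + X[i + 1:], y[:i] + y[i + 1:], **svm_kw)
        score = sum(wj * xj for wj, xj in zip(w, X[i]))
        correct += int((1 if score >= 0 else -1) == y[i])
    return correct / len(X)


if __name__ == "__main__":
    # Hypothetical toy data: feature 0 carries the label, features 1-2 are noise.
    X = [[1.0, 0.1, -0.1], [1.0, -0.1, 0.1], [1.0, 0.1, 0.1], [1.0, -0.1, -0.1],
         [-1.0, 0.1, -0.1], [-1.0, -0.1, 0.1], [-1.0, 0.1, 0.1], [-1.0, -0.1, -0.1]]
    y = [1, 1, 1, 1, -1, -1, -1, -1]
    kept = svm_rfe(X, y, n_keep=1)
    reduced = [[row[j] for j in kept] for row in X]
    print(kept, jackknife_accuracy(reduced, y))  # → [0] 1.0
```

Here RFE is run once on the full dataset before the jackknife test, following the step order given in the abstract; running RFE inside each leave-one-out fold would give a less optimistic accuracy estimate at a much higher computational cost. Subcellular location prediction involves more than two location classes, so a full implementation would also need a multi-class extension such as one-vs-rest.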