Computer Engineering and Applications, 2012, Vol. 48, Issue (6): 126-128.

• Database, Signal and Information Processing •

Novel approach to prediction of protein subcellular localization

CHENG Xien, WU Zhicheng   

  1. Information Engineering School, Jingdezhen Ceramic Institute, Jingdezhen, Jiangxi 333403, China
  • Online: 2012-02-21  Published: 2012-02-21

Abstract: Identifying the subcellular locations of a protein is one of the basic problems of proteomics. The problem becomes more complicated because some proteins may exist simultaneously in two or more subcellular locations. Gene Ontology and pseudo amino acid composition are each used to represent a protein as a real-valued vector. The Ranking idea from the multi-label learning community is adopted to compute a score vector V, each component of which indicates the probability that the protein belongs to the corresponding subcellular location. The nearest neighbor algorithm is then used to predict the number n of subcellular locations of a human protein. Finally, the subcellular locations corresponding to the n highest-scoring components of V are assigned to the query protein.
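The final two steps (score every location, predict how many locations n the protein has, then keep the top n scores) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: proteins are assumed to be already encoded as real-valued vectors (the Gene Ontology and pseudo amino acid composition encodings are not reproduced), and the scoring rule for V used here, the fraction of the k nearest training proteins annotated with each location, is a swapped-in stand-in for the paper's Ranking function. Here n is taken from the label count of the single nearest training protein. The toy data, location indices, and function names (`score_vector`, `predict_count`, `predict_locations`) are hypothetical.

```python
import math
from typing import List, Sequence, Set

# Hypothetical toy data: 2-D vectors stand in for the GO / pseudo amino acid
# composition encodings; each label set holds hypothetical location indices 0..3.
TRAIN_X: List[List[float]] = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 0.9], [0.0, 1.0]]
TRAIN_Y: List[Set[int]] = [{0, 1}, {0}, {2}, {2, 3}, {1}]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def score_vector(query: Sequence[float], train_x: Sequence[Sequence[float]],
                 train_y: Sequence[Set[int]], num_locations: int, k: int = 3) -> List[float]:
    """Stand-in score (hypothetical, not the paper's Ranking rule): V[j] is the
    fraction of the k nearest training proteins annotated with location j."""
    order = sorted(range(len(train_x)), key=lambda i: euclidean(query, train_x[i]))
    neighbors = order[:k]
    v = [0.0] * num_locations
    for i in neighbors:
        for loc in train_y[i]:
            v[loc] += 1.0 / len(neighbors)
    return v


def predict_count(query: Sequence[float], train_x: Sequence[Sequence[float]],
                  train_y: Sequence[Set[int]]) -> int:
    """Predict n as the number of locations of the single nearest training protein (1-NN)."""
    nearest = min(range(len(train_x)), key=lambda i: euclidean(query, train_x[i]))
    return len(train_y[nearest])


def predict_locations(query: Sequence[float], train_x: Sequence[Sequence[float]],
                      train_y: Sequence[Set[int]], num_locations: int, k: int = 3) -> List[int]:
    """Return the n highest-scoring locations in V, sorted by index."""
    v = score_vector(query, train_x, train_y, num_locations, k)
    n = predict_count(query, train_x, train_y)
    # Rank locations by descending score; ties go to the lower index.
    ranked = sorted(range(num_locations), key=lambda j: (-v[j], j))
    return sorted(ranked[:n])


if __name__ == "__main__":
    print(predict_locations([0.02, 0.0], TRAIN_X, TRAIN_Y, num_locations=4))  # [0, 1]
    print(predict_locations([1.0, 0.98], TRAIN_X, TRAIN_Y, num_locations=4))  # [2]
```

For the first query the three nearest training proteins give V = [2/3, 2/3, 0, 0], and the nearest one carries two labels, so the two top-scoring locations 0 and 1 are returned. Keeping the count prediction separate from the score ranking mirrors the division of labor in the abstract; a real evaluation would need the actual feature encodings, a realistic training set, and a tuned k.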

Key words: protein subcellular localization, multi-label learning, Gene Ontology, k-nearest neighbors algorithm