Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (2): 42-54. DOI: 10.3778/j.issn.1002-8331.1901-0284

• Theory, Research and Development •

基于模块度函数的加权蛋白质复合物识别算法

毛伊敏,刘银萍   

  1. 江西理工大学 信息工程学院,江西 赣州 341000
  • Publication date: 2020-01-15 Release date: 2020-01-14

Algorithm for Identifying Weighted Protein Complexes Based on Modularity Function

MAO Yimin, LIU Yinping   

  1. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China
  • Online: 2020-01-15 Published: 2020-01-14

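Abstract: This paper studies the low accuracy, low recall, and poor running time of weighted modularity function clustering algorithms when they are used to identify complexes in protein-protein interaction networks, and proposes IWPC-MF (Algorithm for Identifying Weighted Protein Complexes based on Modularity Function), a weighted protein complex identification algorithm based on a modularity function. The edge clustering coefficient is improved by incorporating the point clustering coefficient, and the resulting edge-point clustering coefficient is combined with the Pearson correlation coefficient of gene co-expression to construct a weighted protein network. Seed nodes are selected by node weight, the neighbors of each seed are traversed, and a similarity measure between nodes and a protein attachment degree are designed to obtain the initial clustering modules. A tightness-based modularity function for protein complexes is designed to merge the initial modules and complete the identification of complexes, overcoming the inability of the traditional modularity function to identify overlapping and small complexes. IWPC-MF is applied to DIP data to identify complexes, and the experimental results show that it achieves relatively high accuracy and recall and identifies protein complexes fairly accurately.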

Keywords: protein-protein interaction network, modularity function, initial module, protein complex

Abstract: To address the low accuracy, low recall, and poor running efficiency of protein complex identification algorithms based on weighted modularity function clustering, a protein complex identification algorithm named IWPC-MF (Algorithm for Identifying Weighted Protein Complexes based on Modularity Function) is proposed. First, the edge clustering coefficient is improved using the point clustering coefficient, and a weighted protein network is constructed by combining the resulting edge-point clustering coefficient with the Pearson correlation coefficient. Second, seed nodes are selected according to node weight, and a similarity measure between nodes and a protein attachment degree are designed to obtain the initial clustering modules by traversing the neighbors of each seed. Finally, a tightness-based modularity function is designed to merge the initial modules and complete protein complex detection, overcoming the inability of the traditional modularity function to identify overlapping and small complexes. IWPC-MF is applied to DIP data to identify protein complexes. The experimental results show that IWPC-MF achieves relatively high accuracy and recall and identifies protein complexes fairly accurately.

Key words: protein-protein interaction network, modularity function, initial module, protein complex
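
The network-weighting step named in the abstract can be illustrated with a minimal sketch. It is not the authors' formulation, since the abstract gives no formulas. It uses a common triangle-count form of the edge clustering coefficient and the standard Pearson correlation, and it omits the point-clustering-coefficient refinement because the abstract does not define it. The function names, the rescaling of the correlation to [0, 1], and the equal-weight average in `edge_weight` are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Standard Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / sqrt(sxx * syy)


def edge_clustering_coefficient(adj, u, v):
    """Triangle-count variant: shared neighbours of u and v divided by the
    largest number of triangles the edge (u, v) could take part in."""
    common = len(adj[u] & adj[v])
    possible = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return common / possible if possible > 0 else 0.0


def edge_weight(adj, expr, u, v):
    """Hypothetical combination rule (an assumption, not taken from the paper):
    average the topological score with the correlation rescaled to [0, 1]."""
    ecc = edge_clustering_coefficient(adj, u, v)
    pcc = pearson(expr[u], expr[v])
    return 0.5 * ecc + 0.5 * (pcc + 1) / 2


# Toy interaction network: a triangle a-b-c with a pendant protein d on c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
# Toy expression profiles, invented purely for this illustration.
expr = {"a": [1, 2, 3], "b": [2, 4, 6], "c": [1, 2, 3], "d": [3, 2, 1]}
weights = {(u, v): edge_weight(adj, expr, u, v)
           for u in adj for v in adj[u] if u < v}
```

Under this sketch, an edge inside a triangle whose endpoints are co-expressed receives a high weight, while a pendant edge between anti-correlated proteins receives a low one. Seed selection and module merging would then operate on such weights.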