Computer Engineering and Applications ›› 2019, Vol. 55 ›› Issue (23): 150-162. DOI: 10.3778/j.issn.1002-8331.1809-0147

• Pattern Recognition and Artificial Intelligence •

Identification of Essential Proteins Based on Temporal Weighted PPI Networks

HU Jian, ZHU Haiwan, MAO Yimin

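  1. College of Applied Science, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China
  2. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China
  • Online: 2019-12-01 Published: 2019-12-11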

Identifying Essential Proteins Based on Temporal Weighted PPI Networks with Dynamic and Conserved Proteins

HU Jian, ZHU Haiwan, MAO Yimin   

  1. College of Applied Science, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China
  2. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China
  • Online: 2019-12-01 Published: 2019-12-11

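Abstract: Essential proteins are the indispensable material basis of all life activities in an organism. Identifying them not only helps explain the mechanisms of life activities in theory but also provides an important basis for drug development and disease treatment in practice. Most existing algorithms for identifying essential proteins operate on static PPI networks and ignore the dynamics and conservation of proteins, consider only network topology while ignoring the biological properties of proteins, and fail to fully resolve the false positives and false negatives in PPI networks. To address these problems, a temporal weighted PPI network mixing dynamic and conserved proteins is constructed, and an essential protein identification algorithm named JTBC (Joint Topological properties, Biological properties and Complexes information) is proposed. Gene expression data are used to extract the activity information of dynamic and conserved proteins, which dynamically adjusts the static PPI network to build a temporal PPI network and effectively reduces false negatives. A node and edge cohesion coefficient, DEcc, which fuses dual topological properties, is designed to measure the topological properties of proteins in the PPI network; combined with protein domain information, which carries biological properties, and the Pearson correlation coefficient, it is used to weight the temporal PPI network so that protein interactions are described accurately and the influence of false positives is reduced. Based on the aggregation and co-expression properties of essential proteins, a co-expressed complex centrality method is designed to evaluate the importance of proteins locally. Finally, the weight information and the protein complex information are integrated to measure protein essentiality comprehensively. Experimental results show that the algorithm identifies essential proteins fairly accurately in terms of both global and local properties.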

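Key words: essential proteins, conserved proteins, temporal weighted network with dynamic and conserved proteins, protein domain, co-expressed complex centrality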

Abstract: Essential proteins are indispensable to the life processes of living organisms, so research on essential proteins not only helps to explain the mechanisms of life activities in theory but also lays a foundation for drug design and disease treatment in practice. At present, most existing computational methods for identifying essential proteins are based on static PPI networks and neglect the inherent dynamics and conservation of proteins; they generally consider only the topology of the network and ignore the biological properties of proteins; and they do not fully solve the problem of false positives and false negatives in PPI data. To address these problems, a temporal weighted PPI network with dynamic and conserved proteins is constructed, and a novel method named JTBC (Joint Topological properties, Biological properties and Complexes information) is proposed on this network to predict essential proteins. First, activity information of dynamic and conserved proteins is extracted from gene expression data and used to adjust the static PPI network into a temporal PPI network, which effectively reduces false negatives. Second, to reduce the effect of false positives, a node and edge cohesion coefficient (DEcc) that integrates dual topological characteristics is designed to measure the topological importance of proteins in the PPI network; it is combined with protein domain information and the Pearson correlation coefficient, both of which carry biological properties, to weight the interactions in the temporal network so that protein interactions are described accurately. Finally, based on the aggregation and co-expression properties of essential proteins, a co-expressed complex centrality is designed to evaluate the importance of proteins locally. The method then integrates the weight information and the protein complex information to assess protein essentiality comprehensively. Experimental results show that the algorithm identifies essential proteins relatively accurately from both global and local perspectives.

Key words: essential proteins, conserved proteins, temporal weighted network with dynamic and conserved proteins, protein domain, co-expressed complex centrality
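
Two steps named in the abstract, building a temporal PPI network from gene expression data and weighting interactions with the Pearson correlation coefficient, can be illustrated with a minimal Python sketch. This is not the JTBC implementation: the abstract does not give the activity rule, so the per-protein `threshold` is a hypothetical stand-in, and the weight here uses only the Pearson coefficient, whereas the paper also combines DEcc and protein domain information. The function names and the toy data are likewise assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    sxx = sum(d * d for d in dx)
    syy = sum(d * d for d in dy)
    if sxx == 0 or syy == 0:
        # A constant profile has no defined correlation; returning 0.0 is an
        # assumption of this sketch, not a rule taken from the paper.
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(sxx * syy)


def temporal_weighted_edges(edges, expression, threshold):
    """Return one {edge: weight} snapshot per time point.

    An edge (u, v) of the static network is kept at time t only if both
    proteins are 'active' at t, i.e. their expression reaches a per-protein
    threshold (hypothetical activity rule). Kept edges are weighted by the
    Pearson correlation of the two full expression profiles.
    """
    n_times = len(next(iter(expression.values())))
    snapshots = []
    for t in range(n_times):
        snapshot = {}
        for u, v in edges:
            u_active = expression[u][t] >= threshold[u]
            v_active = expression[v][t] >= threshold[v]
            if u_active and v_active:
                snapshot[(u, v)] = pearson(expression[u], expression[v])
        snapshots.append(snapshot)
    return snapshots


# Toy data: three proteins measured at three time points.
expression = {"A": [1, 2, 3], "B": [2, 4, 6], "C": [3, 2, 1]}
static_edges = [("A", "B"), ("A", "C")]
threshold = {"A": 2, "B": 4, "C": 2}
snapshots = temporal_weighted_edges(static_edges, expression, threshold)
```

In this toy run the first snapshot is empty because protein A is below its threshold, the second keeps both edges, and the third drops the A-C edge because C is inactive. How negatively correlated pairs should be treated is left open here, since the abstract does not say.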