Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (24): 234-240. DOI: 10.3778/j.issn.1002-8331.1708-0299

• Engineering and Applications •

Application of L1-BC algorithm based on Spark in key proteins detection

HU Deqi, SUN Yongqi, QIN Chao

  1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
  • Online: 2018-12-15 Published: 2018-12-14

Abstract: Spark, a distributed processing framework for big data, is increasingly used across many fields. For key protein prediction, Betweenness Centrality (BC), an index based on the topology of Protein-Protein Interaction (PPI) networks, gives good prediction results. This paper proposes a new index, L1-BC, which not only distinguishes some proteins that share the same BC value but also reflects the local properties of proteins by computing on subgraphs. Experimental results show that L1-BC improves the accuracy of key protein prediction. In addition, a parallel algorithm for computing L1-BC is implemented on the Spark platform, in which accumulators and broadcast variables are used to greatly optimize memory usage. Experiments on the YDIP dataset show that the Spark-based L1-BC algorithm reaches an acceleration ratio of 94.31%.

Key words: Spark, distributed computing, key protein prediction, betweenness centrality
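
For readers unfamiliar with the BC index, the following is a minimal single-machine sketch of betweenness centrality (Brandes' algorithm) on an unweighted, undirected graph. The abstract does not give the exact definition of L1-BC, so the `ego_betweenness` function is only an illustrative assumption of what "computing on subgraphs" might look like: it evaluates BC on the subgraph induced by a node and its direct neighbours. The function names and the toy graph are hypothetical and are not taken from the paper, and the sketch does not use Spark.

```python
from collections import deque


def betweenness_centrality(adj):
    """Unnormalised BC via Brandes' algorithm.

    adj maps each node to the set of its neighbours (undirected, unweighted).
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


def ego_betweenness(adj, v):
    """ASSUMPTION, not the paper's L1-BC: BC of v inside the subgraph
    induced by v and its direct neighbours."""
    nodes = adj[v] | {v}
    sub = {u: adj[u] & nodes for u in nodes}
    return betweenness_centrality(sub)[v]


# Toy path graph a - b - c - d (hypothetical data).
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
global_bc = betweenness_centrality(path)
local_bc = {v: ego_betweenness(path, v) for v in path}
```

In this toy graph the global and neighbourhood-restricted scores differ for the inner nodes, which is the kind of local information a subgraph-based index could add. A distributed version would plausibly broadcast the adjacency structure and process source nodes in parallel, but that design is not specified in the abstract.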