Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (15): 109-117.DOI: 10.3778/j.issn.1002-8331.2006-0332


All-to-All Comparison Computing Data Distribution Strategy Based on Particle Swarm Optimization

LI Leixiao, DENG Dan, LI Jie, WANG Yongsheng   

  1. College of Data Science and Application, Inner Mongolia University of Technology, Hohhot 010080, China
  2. Inner Mongolia Autonomous Region Software Service Engineering Technology Research Center Based on Big Data, Hohhot 010080, China
  • Online:2021-08-01 Published:2021-07-26

基于粒子群优化的全比较计算数据分发策略

李雷孝,邓丹,李杰,王永生   

  1. 内蒙古工业大学 数据科学与应用学院,呼和浩特 010080
  2. 内蒙古自治区基于大数据的软件服务工程技术研究中心,呼和浩特 010080

Abstract:

The data distribution strategy is key to improving the overall computing performance of a distributed cluster system on all-to-all comparison problems. Existing data distribution strategies suffer from unbalanced computing load, incomplete data localization, wasted storage space, and slow computing speed. To address these disadvantages, a Data Distribution model Based on Particle Swarm Optimization (DDBPSO) is proposed that takes load balancing and optimal storage as its optimization objectives while satisfying the requirement of complete data localization. The DDBPSO model optimizes the particle evolution rules through task disturbance and task exchange, which effectively keeps the algorithm from becoming trapped in local optima. Numerical experiments on computing load, storage occupation, and data localization show that, compared with the data distribution strategy in Hadoop, the DDBPSO model offers balanced computing load, complete data localization, a smaller storage footprint, and faster computing speed.

Key words: all-to-all comparison, data distribution, Data Distribution Based on Particle Swarm Optimization (DDBPSO), particle swarm optimization, Hadoop
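
The two objectives named in the abstract can be made concrete with a small sketch. The code below is not the DDBPSO algorithm; it is a plain local search, written under stated assumptions, that only illustrates what "load balancing" and "storage" mean when every pair of files must be compared on a node that holds both files locally. The function names (`all_pairs`, `evaluate`, `local_search`), the round-robin starting point, the lexicographic cost, and the two moves labelled "disturbance" and "exchange" are all hypothetical choices made for illustration, loosely inspired by the terms used in the abstract.

```python
import random
from itertools import combinations


def all_pairs(n_files):
    """Every unordered pair of file indices: one comparison task per pair."""
    return list(combinations(range(n_files), 2))


def evaluate(assignment, tasks, n_nodes):
    """Return (load imbalance, largest number of files stored on one node)."""
    loads = [0] * n_nodes
    stored = [set() for _ in range(n_nodes)]
    for (a, b), node in zip(tasks, assignment):
        loads[node] += 1
        # Both inputs of a task are placed on its node, so data is fully local.
        stored[node].update((a, b))
    return max(loads) - min(loads), max(len(s) for s in stored)


def local_search(tasks, n_nodes, iters=2000, seed=0):
    """Hypothetical local search (an assumption, not the paper's PSO)."""
    rng = random.Random(seed)
    best = [i % n_nodes for i in range(len(tasks))]  # round-robin start
    best_cost = evaluate(best, tasks, n_nodes)
    for _ in range(iters):
        cand = best[:]
        i = rng.randrange(len(tasks))
        if rng.random() < 0.5:
            # "Disturbance": move one task to a randomly chosen node.
            cand[i] = rng.randrange(n_nodes)
        else:
            # "Exchange": swap the nodes of two tasks (loads stay unchanged).
            j = rng.randrange(len(tasks))
            cand[i], cand[j] = cand[j], cand[i]
        cost = evaluate(cand, tasks, n_nodes)
        # Tuples compare lexicographically: balance first, then storage.
        if cost <= best_cost:
            best, best_cost = cand, cost
    return best, best_cost
```

Calling `local_search(all_pairs(6), 3)` distributes the 15 comparison tasks of six files over three nodes and returns the assignment together with its `(imbalance, max files per node)` cost. Comparing balance before storage is one possible design choice; a weighted sum of the two terms would be an equally reasonable assumption.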

摘要:

全比较计算数据分发策略是提高分布式集群系统整体计算性能的关键。针对现有数据分发策略存在的计算负载不均衡、数据不能完全本地化、存储空间浪费和计算速度慢等弊端,在满足数据完全本地化的前提下以负载均衡、最优化存储作为优化目标,结合优化的粒子群算法提出了数据分发模型(Data Distribution Based on Particle Swarm Optimization,DDBPSO)。DDBPSO模型分别以任务扰动、交换任务的方式对粒子进化规则进行了优化,有效避免了算法陷入局部最优。通过计算负载、存储占用和数据本地化等实验,结果表明,与开源框架Hadoop的数据分发策略相比,提出的DDBPSO模型与算法具有计算负载均衡、完全的数据本地化、存储空间占用小、计算速度快等优势。

关键词: 全比较计算, 数据分发, DDBPSO模型, 粒子群优化, Hadoop