Computer Engineering and Applications, 2020, Vol. 56, Issue (23): 53-60. DOI: 10.3778/j.issn.1002-8331.2001-0238
GAO Weijun, ZHANG Chunxia, YANG Jie, SHI Yang (高玮军, 张春霞, 杨杰, 师阳)
Abstract:
During scientific workflow execution, a clustered job composed of multiple tasks has a higher risk of failure than a single task. Fault-tolerant clustering algorithms face a load-imbalance problem while recovering from faults. To address this, a Balanced Re-clustering (BR) algorithm is proposed. BR improves Selective Re-clustering (SR) by combining it with Horizontal Runtime Balancing (HRB): after a failure occurs, the failed tasks are re-run, and the longest-running task is assigned to the cluster with the shortest runtime. Experimental results show that, compared with two existing task re-clustering methods, BR achieves performance gains of up to 84% and 18.75%, respectively, significantly reducing workflow execution cost and improving system efficiency.
Key words: task clustering, scientific workflow, system overhead, fault-tolerant algorithm, balanced clustering
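The assignment rule described in the abstract (give the longest-running task to the cluster with the shortest accumulated runtime) can be sketched as a greedy, heap-based procedure. This is a minimal illustration under stated assumptions, not the authors' BR implementation: the function name `balanced_recluster`, the task-ID-to-runtime dictionary, and the fixed number of target clusters are hypothetical, task runtimes are assumed to be known estimates, and workflow dependencies and the SR selection step are not modeled.

```python
import heapq
from typing import Dict, List, Tuple


def balanced_recluster(failed: Dict[str, float], num_clusters: int) -> List[List[str]]:
    """Greedily regroup failed tasks: longest task goes to the shortest cluster.

    Hypothetical sketch: `failed` maps task IDs to estimated runtimes and
    `num_clusters` is the number of new clustered jobs to form.
    """
    if num_clusters < 1:
        raise ValueError("num_clusters must be >= 1")
    clusters: List[List[str]] = [[] for _ in range(num_clusters)]
    # Min-heap of (accumulated runtime, cluster index); the top is always
    # the cluster with the shortest total runtime so far.
    heap: List[Tuple[float, int]] = [(0.0, i) for i in range(num_clusters)]
    heapq.heapify(heap)
    # Visit tasks from longest to shortest runtime; ties are broken by
    # task ID so the result is deterministic.
    for task, runtime in sorted(failed.items(), key=lambda kv: (-kv[1], kv[0])):
        total, idx = heapq.heappop(heap)
        clusters[idx].append(task)
        heapq.heappush(heap, (total + runtime, idx))
    return clusters


if __name__ == "__main__":
    failed_tasks = {"t1": 10, "t2": 7, "t3": 5, "t4": 4, "t5": 2}
    print(balanced_recluster(failed_tasks, 2))
    # [['t1', 't4'], ['t2', 't3', 't5']]
```

With five failed tasks of runtimes 10, 7, 5, 4 and 2 regrouped into two clusters, both clusters end up with a total runtime of 14. Keeping the clusters in a min-heap makes each assignment O(log k) for k clusters.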
GAO Weijun, ZHANG Chunxia, YANG Jie, SHI Yang. Research on Fault Tolerant Clustering Algorithm of Scientific Workflow Considering Load Balancing[J]. Computer Engineering and Applications, 2020, 56(23): 53-60.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2001-0238
http://cea.ceaj.org/EN/Y2020/V56/I23/53