Computer Engineering and Applications, 2013, Vol. 49, Issue (9): 150-155.

• Database, Data Mining, Machine Learning •

粒子群算法在分布式ETL任务调度中的应用

王春阳,赵书良,王长宾   

  1. 河北师范大学 数学与信息科学学院 河北省计算数学与应用重点实验室,石家庄 050024
  • 出版日期:2013-05-01 发布日期:2016-03-28

Application of particle swarm algorithm in distributed ETL job scheduling

WANG Chunyang, ZHAO Shuliang, WANG Changbin   

  1. Hebei Province Key Laboratory of Computational Mathematics and Applications, College of Mathematics and Information Science, Hebei Normal University, Shijiazhuang 050024, China
  • Online:2013-05-01 Published:2016-03-28

摘要: 随着分布式数据环境越来越复杂,ETL工具要面临数据源多、分布地域广和海量数据等因素带来的挑战。原有的集中式ETL工作流优化理论不能满足现在复杂数据环境的要求。介绍了如何将基于置换的离散型粒子群算法应用到分布式ETL任务优化调度问题上,主要工作围绕ETL工作调度模型、算法编码设计、目标函数选择等内容来展开,给出了分布式ETL工作调度策略的实现过程和伪代码。理论分析和实验证明了实际应用的有效可行性。

关键词: 分布式抽取-转换-加载(ETL), 任务调度, 基于置换的离散型粒子群算法

Abstract: As distributed data environments grow more complex, ETL tools must cope with numerous data sources, wide geographic distribution, and massive data volumes. Existing workflow optimization theory for centralized ETL cannot meet the demands of such environments. This paper describes how a permutation-based discrete particle swarm optimization algorithm can be applied to the optimal scheduling of distributed ETL tasks. The work covers the ETL task scheduling model, the design of the algorithm's encoding, and the choice of objective function, and it gives the implementation process and pseudocode of the distributed ETL scheduling strategy. Theoretical analysis and experiments show that the approach is effective and feasible in practice.

Key words: distributed Extraction-Transformation-Loading (ETL), task scheduling, permutation-based discrete particle swarm optimization
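
The abstract names the technique but this page does not reproduce the paper's model, encoding, or objective function. The following is therefore only a minimal illustrative sketch of what a permutation-based discrete particle swarm search over task orderings can look like, not the authors' method. Everything in it is an assumption made for illustration: the makespan objective with greedy assignment to the least-loaded node, the swap-sequence form of "velocity", the extra random swap, and all names and parameters (`makespan`, `swaps_toward`, `apply_swaps`, `pso_schedule`, `c1`, `c2`).

```python
import random


def makespan(order, durations, n_nodes):
    """Assumed objective: assign tasks in `order` to the least-loaded node
    and return the finishing time of the busiest node."""
    loads = [0.0] * n_nodes
    for task in order:
        i = min(range(n_nodes), key=loads.__getitem__)
        loads[i] += durations[task]
    return max(loads)


def swaps_toward(src, dst):
    """Return a swap sequence that turns permutation `src` into `dst`.
    This plays the role of the position difference in continuous PSO."""
    work = list(src)
    pos = {task: i for i, task in enumerate(work)}
    swaps = []
    for i, target in enumerate(dst):
        if work[i] != target:
            j = pos[target]
            swaps.append((i, j))
            pos[work[i]], pos[target] = j, i
            work[i], work[j] = work[j], work[i]
    return swaps


def apply_swaps(perm, swaps, keep_prob, rng):
    """Apply each swap with probability `keep_prob`; the result is always
    a valid permutation, so no repair step is needed."""
    out = list(perm)
    for i, j in swaps:
        if rng.random() < keep_prob:
            out[i], out[j] = out[j], out[i]
    return out


def pso_schedule(durations, n_nodes, n_particles=20, iters=100,
                 c1=0.5, c2=0.5, seed=0):
    """Search task orderings; c1 and c2 are hypothetical pull strengths
    toward the personal best and the global best."""
    rng = random.Random(seed)
    n = len(durations)
    swarm = [rng.sample(range(n), n) for _ in range(n_particles)]
    pbest = [list(p) for p in swarm]
    pbest_cost = [makespan(p, durations, n_nodes) for p in swarm]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = list(pbest[g]), pbest_cost[g]

    for _ in range(iters):
        for k in range(n_particles):
            x = swarm[k]
            x = apply_swaps(x, swaps_toward(x, pbest[k]), c1, rng)
            x = apply_swaps(x, swaps_toward(x, gbest), c2, rng)
            # One random swap as a simple diversity mechanism (assumption).
            i, j = rng.randrange(n), rng.randrange(n)
            x[i], x[j] = x[j], x[i]
            swarm[k] = x
            cost = makespan(x, durations, n_nodes)
            if cost < pbest_cost[k]:
                pbest[k], pbest_cost[k] = list(x), cost
                if cost < gbest_cost:
                    gbest, gbest_cost = list(x), cost
    return gbest, gbest_cost


if __name__ == "__main__":
    # Hypothetical task durations, scheduled on two nodes.
    order, cost = pso_schedule([4.0, 2.0, 7.0, 1.0, 3.0, 5.0], n_nodes=2)
    print(order, cost)
```

Encoding a particle as a permutation keeps every candidate a valid ordering of all tasks, which is the main reason a swap-based update is attractive for scheduling problems. A real distributed ETL setting would likely also need task dependencies and data-transfer costs in the objective, which this sketch omits.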