Computer Engineering and Applications, 2018, Vol. 54, Issue (13): 84-87. DOI: 10.3778/j.issn.1002-8331.1706-0116

• Big Data and Cloud Computing •

Research on scan scheduling in MPP databases on distributed file systems

GUO Kai¹, GONG Caixin¹, GONG Yili¹, LEI Yingchun²

  1. Computer School, Wuhan University, Wuhan 434000, China
  2. Beijing Daowoo Technology Co., Ltd., Beijing 100000, China
  • Online: 2018-07-01 Published: 2018-07-17

Abstract: MPP (Massively Parallel Processing) databases built on distributed file systems are a current research focus. To improve how execution units are scheduled to read data blocks before a query scan operation runs, this paper proposes NLS, a scheduling strategy based on node load. NLS combines data locality with node workload. A local-read allocation phase ensures that the schedule has good data locality, and a reallocation phase then adjusts the intermediate schedule according to each node's real-time workload, with the goal of reducing the completion time (makespan) of the scan. Experimental results show that, compared with the continuity scheduling strategy FCS, NLS maintains data locality above 90% while reducing makespan by up to 32%, and by 25% on average across the nine tested cases.

Key words: distributed file system, database, query scheduling, workload optimization
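
The two phases named in the abstract (local-read allocation, then load-based reallocation) can be illustrated with a minimal Python sketch. This is not the authors' NLS algorithm, whose details the abstract does not give. The unit-cost model for block reads, the tie-breaking rules, the `min_locality` floor, and all names (`Node`, `locality`, `schedule`) are assumptions made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    load: float                                  # assumed: queued work, in block-read units
    blocks: list = field(default_factory=list)   # block ids assigned to this node

    def finish_time(self) -> float:
        # Assumed cost model: one unit per block, queued behind the existing load.
        return self.load + len(self.blocks)


def locality(assignment: dict, replicas: dict) -> float:
    """Fraction of blocks assigned to a node that stores a replica of them."""
    total = sum(len(blocks) for blocks in assignment.values())
    local = sum(1 for node, blocks in assignment.items()
                for b in blocks if node in replicas[b])
    return local / total if total else 1.0


def schedule(replicas: dict, loads: dict, min_locality: float = 0.9) -> dict:
    """replicas: block id -> node names holding a copy; loads: node name -> load.
    Returns node name -> assigned block ids. Hypothetical two-phase sketch."""
    nodes = {name: Node(name, load) for name, load in loads.items()}

    # Phase 1 (local-read allocation): each block goes to one of its replica
    # holders, choosing the holder with the fewest blocks assigned so far.
    for block, holders in replicas.items():
        target = min((nodes[h] for h in holders), key=lambda n: len(n.blocks))
        target.blocks.append(block)

    # Phase 2 (load-based reallocation): move blocks from the slowest node to
    # the fastest one while that shrinks the gap and locality stays above the floor.
    while True:
        slow = max(nodes.values(), key=Node.finish_time)
        fast = min(nodes.values(), key=Node.finish_time)
        # Moving a unit-cost block only helps if the gap exceeds one unit.
        if not slow.blocks or slow.finish_time() - fast.finish_time() <= 1:
            break
        # Prefer a block the fast node also stores, so the read stays local.
        block = next((b for b in slow.blocks if fast.name in replicas[b]),
                     slow.blocks[-1])
        slow.blocks.remove(block)
        fast.blocks.append(block)
        current = {n.name: n.blocks for n in nodes.values()}
        if locality(current, replicas) < min_locality:
            fast.blocks.remove(block)   # undo the move and stop
            slow.blocks.append(block)
            break

    return {n.name: n.blocks for n in nodes.values()}
```

In this sketch, a node that is already busy (high `load`) gives up locally assigned blocks to idle replica holders, which mirrors the stated intent of trading a small amount of locality for a shorter makespan. A real scheduler would need a measured cost model for local versus remote reads rather than the unit cost assumed here.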