Computer Engineering and Applications ›› 2015, Vol. 51 ›› Issue (22): 124-129.

• Database, Data Mining, Machine Learning •

Parallel computation method for shell fragments cube based on Map/Reduce

唐珊珊,朱跃龙,朱  凯   

  1. College of Computer and Information, Hohai University, Nanjing 210098
  • Publication date: 2015-11-15 Release date: 2015-11-16

Map/Reduce-based parallel computation of shell fragments cube

TANG Shanshan, ZHU Yuelong, ZHU Kai   

  1. College of Computer and Information, Hohai University, Nanjing 210098, China
  • Online:2015-11-15 Published:2015-11-16

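Abstract (translated from Chinese): For large, high-dimensional data sets with hierarchical dimensions, a parallel shell fragments cube construction algorithm based on the Map/Reduce framework is proposed. The algorithm uses the Map/Reduce framework to build and query the shell fragments cube in parallel. In the Map phase, the construction algorithm computes all possible data cells or hierarchical-dimension encoding prefixes for each data partition; in the Reduce phase, these are aggregated into the final shell fragments and the measure index table. Experiments show that the parallel shell fragments cube algorithm combines the parallelism and high scalability of the Map/Reduce framework with the compression strategy and inverted-index mechanism of the shell fragments cube, effectively avoids the explosive growth in data volume when materializing high-dimensional data, and supports fast construction and querying.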

Keywords (translated from Chinese): On-Line Analytical Processing (OLAP), shell fragments cube, Map/Reduce technique, parallel computation

Abstract: To materialize high-dimensional big data sets with hierarchical dimensions, this paper proposes a parallel shell fragments cube construction algorithm based on the Map/Reduce framework. The algorithm builds and queries the shell fragments cube in parallel. For each data partition, the Map phase of the construction algorithm computes all possible data cells or hierarchical-dimension encoding prefixes; the Reduce phase aggregates them into the final shell fragments and the measure index table. Experiments show that the parallel shell fragments cube algorithm combines the parallelism and scalability of the Map/Reduce framework with the compression strategy and inverted index structure of the shell fragments cube. It effectively avoids the explosion in data volume when materializing high-dimensional data and supports fast construction and querying.

Key words: On-Line Analytical Processing (OLAP), shell fragments cube, Map/Reduce, parallel computation
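
The Map and Reduce phases described in the abstract can be illustrated with a minimal single-process sketch. This is not the authors' implementation: the toy rows, the fragment split `FRAGMENTS`, and the function names `map_partition`, `reduce_all`, `build`, and `query_sum` are all hypothetical, the Map/Reduce framework is only simulated with plain Python, and the hierarchical-dimension prefix encoding is left out. It shows only the general idea of emitting per-fragment cells in Map, merging them into inverted indexes plus a measure index table in Reduce, and answering a query by intersecting tuple-ID sets.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: (tuple id, dimension values, measure).
ROWS = [
    (1, {"A": "a1", "B": "b1", "C": "c1", "D": "d1"}, 10),
    (2, {"A": "a1", "B": "b2", "C": "c1", "D": "d2"}, 20),
    (3, {"A": "a2", "B": "b1", "C": "c2", "D": "d1"}, 30),
    (4, {"A": "a1", "B": "b1", "C": "c1", "D": "d2"}, 40),
]
# Assumed split of the dimensions into shell fragments.
FRAGMENTS = [("A", "B"), ("C", "D")]


def map_partition(rows):
    """Map step (sketch): for one data partition, emit every cell of every
    cuboid inside each fragment, plus one measure record per tuple."""
    for tid, dims, measure in rows:
        yield ("measure", tid), measure
        for fragment in FRAGMENTS:
            for k in range(1, len(fragment) + 1):
                for subset in combinations(fragment, k):
                    cell = tuple((d, dims[d]) for d in subset)
                    yield ("cell", cell), tid


def reduce_all(pairs):
    """Reduce step (sketch): merge tuple ids per cell into an inverted index
    and collect the measure index table."""
    inverted = defaultdict(set)
    measures = {}
    for (kind, key), value in pairs:
        if kind == "cell":
            inverted[key].add(value)
        else:
            measures[key] = value
    return dict(inverted), measures


def build(partitions):
    """Run the simulated Map phase on each partition, then one Reduce."""
    pairs = [p for part in partitions for p in map_partition(part)]
    return reduce_all(pairs)


def query_sum(inverted, measures, conditions):
    """Sum the measure over tuples matching `conditions` (dimension -> value)
    by intersecting the tuple-id sets of the matching cell in each fragment."""
    tids = None
    for fragment in FRAGMENTS:
        cell = tuple((d, conditions[d]) for d in fragment if d in conditions)
        if not cell:
            continue  # this fragment is unconstrained
        found = inverted.get(cell, set())
        tids = found if tids is None else tids & found
    if tids is None:
        tids = set(measures)  # no constraints: every tuple matches
    return sum(measures[t] for t in tids)


if __name__ == "__main__":
    inverted, measures = build([ROWS[:2], ROWS[2:]])
    print(query_sum(inverted, measures, {"A": "a1", "C": "c1"}))
```

Only cuboids within a fragment are materialized, so storage grows with the fragment size rather than with the full number of dimensions; queries that span fragments are resolved at query time through set intersection.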