Research on Vector Grouping Aggregation Technology

doi:10.3778/j.issn.1002-8331.2004-0309

Abstract

Grouping and aggregation is one of the most important OLAP operators, and it is a data-intensive workload. In main-memory database and GPU database scenarios, it is necessary to study not only its performance optimization but also how to choose the execution platform for grouping and aggregation so as to minimize the data transfer overhead between CPU and GPU. This paper presents a vector grouping aggregation method. Following an “early grouping, late aggregating” strategy, the grouping and aggregation operation, which traditionally sits at the end of the query pipeline, is decomposed and pushed down, so that the data-intensive grouping and aggregation work is separated from the pipeline. Each operation is then matched to the computing characteristics of the processor that suits it, which achieves an optimal workload distribution on the heterogeneous computing platform. Replacing traditional hash-based grouping aggregation with vector grouping aggregation significantly improves performance. Experimental results show that vector grouping aggregation achieves a maximum speedup of 5 to 8 times over the leading main-memory database Hyper and the GPU database MapD. The approach not only improves the performance of OLAP aggregation but also separates the data-intensive workload from the query plan, so that a heterogeneous computing platform can configure its computing resources according to hardware characteristics and improve overall OLAP performance with hybrid processors.

Key words: CPU-GPU heterogeneous computing platform, vector grouping aggregation, group vector index, data-intensive workload
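
The “early grouping, late aggregating” idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no algorithmic detail, so the sketch assumes a star-schema setting in which group IDs are assigned once on a small dimension table (early grouping) and the fact table refers to dimension rows by position, so that the aggregation pass (late aggregating) needs only array indexing and no per-row hash probes. All names (`build_group_vector`, `vector_group_sum`, `dim_region`, `fact_fk`, `fact_revenue`) are hypothetical.

```python
def build_group_vector(dim_group_keys):
    """Early grouping (assumed form): give every dimension row a dense group ID.

    Hashing happens only here, on the small dimension table.
    Returns the group vector (one ID per dimension row) and the
    group keys listed in ID order.
    """
    ids = {}  # group key -> dense group ID
    group_vector = [ids.setdefault(key, len(ids)) for key in dim_group_keys]
    return group_vector, list(ids)


def vector_group_sum(fact_fk, fact_measure, group_vector, num_groups):
    """Late aggregating: accumulate measures into an array addressed by group ID."""
    acc = [0] * num_groups
    for fk, measure in zip(fact_fk, fact_measure):
        # fk is the position of the referenced dimension row (an assumption).
        acc[group_vector[fk]] += measure
    return acc


# Hypothetical data: 4 dimension rows, 6 fact rows.
dim_region = ["ASIA", "EUROPE", "ASIA", "AMERICA"]
fact_fk = [0, 1, 2, 3, 0, 2]
fact_revenue = [10, 20, 30, 40, 50, 60]

group_vector, group_keys = build_group_vector(dim_region)
totals = vector_group_sum(fact_fk, fact_revenue, group_vector, len(group_keys))
result = dict(zip(group_keys, totals))
```

In this sketch the scan over the fact table touches only flat arrays, which is the kind of regular, data-intensive access pattern that could be assigned to whichever processor handles it best.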

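Abstract (translated from the Chinese):

Grouping aggregation is one of the important OLAP operators, and it is a data-intensive workload. In main-memory database and GPU database scenarios, it is necessary to study not only its performance optimization techniques but also how to assign the execution site of grouping aggregation so as to minimize the data transfer cost between CPU and GPU. Targeting the hardware characteristics of heterogeneous computing platforms, this paper proposes a vector aggregation technique. The grouping aggregation computation located at the end of the traditional pipeline is decomposed and pushed down according to an “early grouping, late aggregating” strategy, which separates the data-intensive grouping aggregation computation from the pipeline, matches operations to the computing characteristics of the processors, and achieves an optimal workload distribution on the heterogeneous computing platform. Converting traditional hash-based grouping aggregation into vector grouping aggregation significantly improves grouping aggregation performance. Experimental results show that the vector grouping aggregation technique achieves a maximum performance improvement of 5 to 8 times over the representative high-performance main-memory database Hyper and the GPU database MapD. Vector aggregation not only improves OLAP aggregation performance but also achieves the goal of separating the data-intensive workload from the query plan, allowing a heterogeneous computing platform to configure its computing resources according to the hardware characteristics of its processors and to improve its overall OLAP performance.

Key words (translated from the Chinese): CPU-GPU heterogeneous computing platform, vector grouping aggregation, group vector index, data-intensive workload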

ZHANG Yu, ZHANG Yansong. Research on Vector Grouping Aggregation Technology[J]. Computer Engineering and Applications, 2021, 57(11): 84-94.

