Computer Engineering and Applications ›› 2015, Vol. 51 ›› Issue (18): 24-31.


Vector computing oriented Array OLAP query processing technique

ZHANG Yu1,2, ZHANG Yansong1,2,3, CHEN Hong1,2, WANG Shan1,2   

  1. Key Laboratory of Data Engineering and Knowledge Engineering in Renmin University of China, Beijing 100872, China
  2. School of Information, Renmin University of China, Beijing 100872, China
  3. National Survey Research Center at Renmin University of China, Beijing 100872, China
  • Online: 2015-09-15 Published: 2015-10-13

向量计算Array OLAP查询处理技术

张  宇1,2,张延松1,2,3,陈  红1,2,王  珊1,2   

  1. 中国人民大学 数据工程与知识工程教育部重点实验室,北京 100872
  2. 中国人民大学 信息学院,北京 100872
  3. 中国人民大学 中国调查与数据中心,北京 100872

Abstract: Multi-core and many-core processors have become the mainstream configuration of new computing platforms, which combine powerful parallel computing with large main memory. Multi-core processors commonly rely on cache-centric optimizations that are aware of the last-level cache (LLC) size, while many-core processors such as Phi and GPU co-processors are designed with smaller caches but more hardware threads to hide main-memory access latency. As the number of cores increases, a computing framework benefits from a code-efficient and scalable design for massive numbers of processing cores. This paper presents Array OLAP, an in-memory analytical computing framework that uses array storage and vector processing to simplify both the storage model and the query processing model. In Array OLAP, dimension tables are normalized into vector-based dimension filters, and the fact table is normalized into measure attributes with a multidimensional index. With multidimensional index computing, a multidimensional query is simplified to a vector index scan on the fact table followed by aggregation of the measure expressions. The normalized vector lookup and vector index scan are efficient in code execution, and the staged processing model adapts to different computing platforms by assigning each processing stage to the most suitable platform. Moreover, Array OLAP is designed around the characteristics of data warehouse schemas: the vector processing model is simple yet efficient for dimension tables that are small and grow slowly. The paper describes the Array OLAP framework on various platforms and evaluates its benchmark performance against state-of-the-art in-memory analytical databases. The experimental results show that Array OLAP outperforms mainstream in-memory analytical databases and can be smoothly migrated to new hardware platforms.

Key words: array On-line Analytical Processing (OLAP), array store, vector processing, in-memory On-line Analytical Processing (OLAP)
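
The processing model summarized in the abstract (dimension filter vectors, a fact table addressed by a multidimensional index, and a vector index scan with aggregation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `build_dimension_filter` and `vector_index_scan`, the toy customer and date tables, and the row-major addressing of aggregate cells are all assumptions made for illustration.

```python
NO_MATCH = -1  # assumed marker for dimension rows rejected by the predicate


def build_dimension_filter(rows, predicate, group_key):
    """Turn a dimension table into a filter vector (hypothetical helper).

    vector[i] holds the group id of dimension row i, or NO_MATCH if the
    row fails the predicate. The array position i plays the role of the
    dimension key, so a lookup is a plain array access.
    """
    group_ids = {}
    vector = []
    for row in rows:
        if predicate(row):
            key = group_key(row)
            vector.append(group_ids.setdefault(key, len(group_ids)))
        else:
            vector.append(NO_MATCH)
    return vector, list(group_ids)


def vector_index_scan(fact_index, measures, dim_vectors, group_counts):
    """Scan the fact table once and aggregate into a flat result array."""
    total_cells = 1
    for n in group_counts:
        total_cells *= n
    cube = [0] * total_cells
    for coords, measure in zip(fact_index, measures):
        cell = 0
        for coord, vector, n in zip(coords, dim_vectors, group_counts):
            gid = vector[coord]      # positional lookup, no hash join
            if gid == NO_MATCH:
                break                # fact row filtered out by this dimension
            cell = cell * n + gid    # row-major address of the aggregate cell
        else:
            cube[cell] += measure    # all dimensions matched
    return cube


# Toy star schema (made-up data): array positions act as dimension keys.
customers = [{"region": "ASIA"}, {"region": "EUROPE"}, {"region": "ASIA"}]
dates = [{"year": 1996}, {"year": 1997}, {"year": 1998}]

cust_vec, cust_groups = build_dimension_filter(
    customers, lambda r: r["region"] == "ASIA", lambda r: r["region"])
date_vec, date_groups = build_dimension_filter(
    dates, lambda r: r["year"] >= 1997, lambda r: r["year"])

# Each fact row stores (customer position, date position) plus a measure.
fact_index = [(0, 1), (1, 1), (2, 2), (0, 0), (2, 1)]
revenue = [10, 20, 30, 40, 50]

cube = vector_index_scan(
    fact_index, revenue,
    [cust_vec, date_vec], [len(cust_groups), len(date_groups)])
```

In this sketch the dimension stage and the fact scan stage are separate functions, which mirrors the staged model in the abstract: the small filter vectors could be built on one platform and the scan run on another. How the stages are actually assigned to hardware is not specified here.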

摘要: 多核和众核处理器成为新的具有强大并行处理能力的大内存计算平台的主流配置。多核处理器遵循以LLC(Last Level Cache,最后一级cache)大小为中心的优化技术,而众核处理器,如Phi、GPU协处理器,则采用较小的cache并以更多的硬件级线程来掩盖内存访问延迟的设计。随着处理核心数量的增长,计算框架更倾向于面向大规模处理核心的、代码执行效率高并且扩展性强的设计思想。提出了一种基于数组存储和向量处理的内存分析处理框架Array OLAP,简化OLAP的存储模型和查询处理模型。在Array OLAP计算框架中,维表规范化为基于向量的维过滤器,事实表规范化为带有多维索引的度量属性。通过多维索引计算,一个多维查询被简化为事实表上的向量索引扫描并根据度量表达式进行聚集计算。规范化的向量查找和向量索引扫描具有较好的代码执行效率,并且阶段化的处理模型更好地适应不同的计算平台,将计算阶段分配给最适合的计算平台。同时,Array OLAP是一种面向数据仓库模式特点的设计,向量处理模型设计简单,对于数据仓库维表较小且增长缓慢的特点具有较好的效率。描述了在不同平台上的Array OLAP计算框架并且通过基准测试评估Array OLAP的性能,通过与当前的内存分析型数据库的性能对比,Array OLAP性能超过主流的内存分析型数据库并且可以平滑地迁移到新的硬件平台。

关键词: 数组联机分析处理, 数组存储, 向量处理, 内存联机分析处理