Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (13): 259-265. DOI: 10.3778/j.issn.1002-8331.2208-0300

• Big Data and Cloud Computing •

Execution Plan Selection for Parallel Queries Using Graph Neural Networks

TAO Wenxia, NIU Baoning, LIU Haonan   

  1. College of Information and Computer, Taiyuan University of Technology, Jinzhong, Shanxi 030600, China
  • Online: 2023-07-01 Published: 2023-07-01


Abstract: Queries account for the largest share of the workload of database systems (DBS), and their efficiency largely determines DBS performance, so selecting a good execution plan for each query is key to improving it. The execution of a query is affected by the other queries running in parallel with it. This effect, called query interaction (QI), is the main reason query optimizers struggle to select good execution plans for parallel queries. An encoding scheme called features of plans based on operator (FPO) is proposed, which represents an execution plan operator by operator and reflects QI through the data-sharing and resource-competition relations between operators. On this basis, the plan selection model based on graph (PSG) is proposed. PSG takes operators as nodes, operator features as node features, and relations between operators as edges, generating a heterogeneous graph as the model input. Because there are several kinds of relations between operators, each with a different effect, a relational graph convolutional network (RGCN) is used to aggregate information, obtain a graph representation of the query mix, and extract its QI. Fully connected (FC) layers then select an execution plan for a query. In experiments on PostgreSQL, the average accuracy of PSG is 47.3 percentage points higher than that of the query optimizer.

Key words: query optimization, query interaction, execution plan selection, graph neural network
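
The pipeline the abstract describes (operators as nodes, typed relations as edges, relation-aware aggregation, then a fully connected layer that scores plans) can be illustrated with a small sketch. This is not the authors' PSG implementation: the feature vectors, the random weights, the relation names in `RELATIONS`, the use of mean pooling for the graph representation, and the single output layer are all assumptions made for illustration, and the code is plain Python so it runs without a deep learning library.

```python
import random

# Hypothetical relation types between operators. The abstract names data
# sharing and resource competition but does not say how they are encoded.
RELATIONS = ("data_sharing", "resource_competition")


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rgcn_layer(features, edges, w_self, w_rel):
    """One relational graph convolution step with per-relation mean aggregation.

    features: list of node feature vectors (one per operator)
    edges: dict relation -> list of (source, target) node index pairs
    w_self: weight matrix applied to a node's own features
    w_rel: dict relation -> weight matrix for messages over that relation
    """
    out = [matvec(w_self, h) for h in features]
    for rel, pairs in edges.items():
        # Group the incoming neighbours of each target node for this relation.
        incoming = {}
        for src, dst in pairs:
            incoming.setdefault(dst, []).append(src)
        for dst, sources in incoming.items():
            for src in sources:
                msg = matvec(w_rel[rel], features[src])
                for k, value in enumerate(msg):
                    out[dst][k] += value / len(sources)
    return [[max(0.0, v) for v in row] for row in out]  # ReLU


def mean_pool(node_states):
    """Average node states into one graph-level vector (an assumed readout)."""
    n = len(node_states)
    return [sum(col) / n for col in zip(*node_states)]


def select_plan(features, edges, params):
    """Return the index of the highest-scoring candidate plan and all scores."""
    hidden = rgcn_layer(features, edges, params["w_self"], params["w_rel"])
    graph_vec = mean_pool(hidden)
    scores = matvec(params["w_out"], graph_vec)  # one score per candidate plan
    return max(range(len(scores)), key=scores.__getitem__), scores


def random_matrix(rows, cols, rng):
    """Build an untrained weight matrix; a real model would learn these."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
IN_DIM, HIDDEN_DIM, NUM_PLANS = 3, 4, 2
params = {
    "w_self": random_matrix(HIDDEN_DIM, IN_DIM, rng),
    "w_rel": {rel: random_matrix(HIDDEN_DIM, IN_DIM, rng) for rel in RELATIONS},
    "w_out": random_matrix(NUM_PLANS, HIDDEN_DIM, rng),
}

# Four operators from two concurrent queries, with made-up feature vectors.
features = [[1.0, 0.2, 0.0], [0.0, 0.5, 1.0], [1.0, 0.1, 0.0], [0.0, 0.9, 1.0]]
edges = {
    "data_sharing": [(0, 2), (2, 0)],          # e.g. two scans of one table
    "resource_competition": [(1, 3), (3, 1)],  # e.g. two memory-hungry joins
}

plan_index, scores = select_plan(features, edges, params)
print(plan_index, scores)
```

Keeping a separate weight matrix per relation is what distinguishes this step from an ordinary graph convolution: a message passed along a data-sharing edge is transformed differently from one passed along a resource-competition edge, which matches the abstract's point that the relations between operators have different effects.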
