Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (17): 243-251. DOI: 10.3778/j.issn.1002-8331.2308-0347

• Big Data and Cloud Computing •

Parallel Query Execution Plan Selection for Fused Relational Graph Attention Networks

GUO Mengtao (郭梦涛), NIU Baoning (牛保宁), YANG Rong (杨茸)

  1. College of Computer Science and Technology (College of Big Data), Taiyuan University of Technology, Jinzhong, Shanxi 030600, China
  • Online: 2024-09-01  Published: 2024-08-30

Abstract: Querying is one of the most important functions of a database system, and query execution efficiency directly determines system performance. In parallel scenarios, query interaction (QI) is essentially interaction between operations, and measuring it accurately is the key to selecting a query execution plan. Existing models that measure QI at operation granularity do not describe the dynamics of these interactions and extract only operation features to reflect QI, so they struggle to provide accurate QI measures for execution plan selection in parallel scenarios. To address this, three components are proposed. For QI representation, a query-mix heterogeneous graph is proposed in which each operation is a node and each interaction between two operations is an edge labeled with its interaction type, giving a dynamic, operation-granularity, multi-interaction-type representation of QI. For QI feature extraction, the multi-edge type weight calculation (MTWC) model is proposed to compute edge weights, which serve as relationship features that reflect the strength of each interaction. For execution plan selection, the query-mix heterogeneous graph classification (QHGC) model, based on the relational graph attention network (R-GAT), is proposed to select an execution plan for parallel queries. Experiments on PostgreSQL show that QHGC selects execution plans with an accuracy of 90.4%; its average accuracy is 48.2 percentage points higher than that of the query optimizer and 6.9 percentage points higher than that of PSG, the existing state-of-the-art model.

Key words: query interaction, operation level, multi-edge type weight calculation (MTWC), execution plan, relational graph attention network (R-GAT)
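
The query-mix heterogeneous graph described in the abstract can be pictured with a small data structure. The following is only a minimal sketch, not the authors' implementation: the class names `Operation` and `QueryMixGraph`, the edge-type labels (`intra_query_dataflow`, `io_contention`, `memory_contention`), and the numeric weights are all hypothetical placeholders chosen for illustration. In the paper the edge weights would come from the MTWC model, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Operation:
    """One plan operation of one query in the mix (a graph node)."""
    query_id: str
    op_id: int
    op_type: str  # illustrative operator label, e.g. "SeqScan" or "HashJoin"


@dataclass
class QueryMixGraph:
    """Heterogeneous graph: nodes are operations, edges are typed and weighted."""
    nodes: list = field(default_factory=list)
    # edges[edge_type] -> list of (source_index, target_index, weight)
    edges: dict = field(default_factory=dict)

    def add_node(self, op: Operation) -> int:
        """Append an operation and return its node index."""
        self.nodes.append(op)
        return len(self.nodes) - 1

    def add_edge(self, src: int, dst: int, edge_type: str, weight: float) -> None:
        """Add a directed edge of the given interaction type."""
        n = len(self.nodes)
        if not (0 <= src < n and 0 <= dst < n):
            raise IndexError("edge endpoint is not a node in the graph")
        self.edges.setdefault(edge_type, []).append((src, dst, weight))

    def neighbors(self, node: int, edge_type: str) -> list:
        """Return (neighbor_index, weight) pairs reachable via one edge type."""
        return [(d, w) for s, d, w in self.edges.get(edge_type, []) if s == node]


def build_example() -> QueryMixGraph:
    """Two concurrent queries, q1 and q2, with hypothetical interactions."""
    g = QueryMixGraph()
    scan_a = g.add_node(Operation("q1", 0, "SeqScan"))
    join_a = g.add_node(Operation("q1", 1, "HashJoin"))
    scan_b = g.add_node(Operation("q2", 0, "SeqScan"))
    # Edge types and weights below are assumptions, not values from the paper.
    g.add_edge(scan_a, join_a, "intra_query_dataflow", 1.0)
    g.add_edge(scan_a, scan_b, "io_contention", 0.7)
    g.add_edge(scan_b, scan_a, "io_contention", 0.7)
    g.add_edge(join_a, scan_b, "memory_contention", 0.2)
    return g


if __name__ == "__main__":
    graph = build_example()
    for edge_type, edge_list in graph.edges.items():
        print(edge_type, edge_list)
```

Keeping a separate edge list per interaction type mirrors how relation-aware graph networks such as R-GAT typically consume input, with one set of parameters per relation; a graph classifier such as QHGC would then map the whole graph to a plan label.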
