Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (7): 176-183. DOI: 10.3778/j.issn.1002-8331.1901-0399

• Pattern Recognition and Artificial Intelligence •

Program Classification Based on a Gated Graph Attention Neural Network

谭丁武,张坤芳,刘燕,郑一基,鲁鸣鸣   

  1. College of Information Science and Engineering, Central South University, Changsha 410083
  • Publication date: 2020-04-01 Release date: 2020-03-28

Program Classification Using Gated Graph Attention Neural Network

TAN Dingwu, ZHANG Kunfang, LIU Yan, ZHENG Yiji, LU Mingming   

  1. College of Information Science and Engineering, Central South University, Changsha 410083, China
  • Online: 2020-04-01 Published: 2020-03-28

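Abstract:

In the field of source code mining, program classification is the foundational task for enabling machines to understand source code on their own. Although natural language processing models and a family of models based on abstract syntax trees have been widely applied to classifying program source code, this work does not consider data information in source code such as data flow and control flow. This paper proposes a method for building EAST, a code graph that contains both data information and syntactic structure, and combines it with an attention-based gated graph neural network model (GGANN) to classify programs. The attention mechanism of GGANN accounts for differences in the topological properties of nodes and thereby improves the model's information propagation process. Experiments show that the improved GGANN model reaches an accuracy as high as 98% on the program classification task.

Key words: attention mechanism, graph neural network, code understanding, program classification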

Abstract:

In source code mining, program classification is a foundational task for enabling machines to understand source code. Although models based on natural language processing (NLP) and models based on abstract syntax trees have been widely applied to classifying programs, they do not take into account data flow, control flow, and other data information in source code. To address this problem, this paper proposes a method for constructing a code graph, EAST, that contains both data information and syntactic structure, and performs program classification by combining this graph with an attention-based gated graph neural network, the Gated Graph Attention Neural Network (GGANN). The attention mechanism of GGANN takes differences in the topological properties of nodes into account, thereby improving the model's information propagation process. Experiments show that the improved GGANN model achieves over 98% accuracy on the program classification task.

Key words: attention mechanism, graph neural network, code understanding, program classification
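
The abstract describes a code graph that carries both syntactic structure and data information, but it does not give the construction of EAST. The following is only a minimal sketch of the general idea, not the authors' method: it uses Python's standard `ast` module as a stand-in parser, adds `child` edges for syntax, and adds a crude `last_write` edge from the most recent assignment of a variable to each later read. The function name `build_code_graph` and both edge labels are hypothetical, and the sketch ignores branches, loops, scopes, and control-flow edges.

```python
import ast


def build_code_graph(source: str):
    """Build a toy code graph from Python source (illustrative assumption only).

    Returns (nodes, edges):
      nodes: list of AST node type names, indexed by node id
      edges: dict mapping an edge label to a list of (src_id, dst_id) pairs
    """
    nodes = []
    edges = {"child": [], "last_write": []}
    last_write = {}  # variable name -> id of the Name node that last stored it

    def visit(node, parent_id=None):
        node_id = len(nodes)
        nodes.append(type(node).__name__)
        if parent_id is not None:
            # Syntax edge taken directly from the abstract syntax tree.
            edges["child"].append((parent_id, node_id))

        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load) and node.id in last_write:
                # Data edge: most recent write of this name -> this read.
                edges["last_write"].append((last_write[node.id], node_id))
            elif isinstance(node.ctx, ast.Store):
                last_write[node.id] = node_id

        children = list(ast.iter_child_nodes(node))
        if isinstance(node, ast.Assign):
            # The right-hand side is evaluated before the targets are bound,
            # so visit it first; otherwise `x = x + 1` would link to itself.
            children = [node.value] + list(node.targets)
        for child in children:
            if isinstance(child, ast.expr_context):
                continue  # skip Load/Store marker nodes
            visit(child, node_id)
        return node_id

    visit(ast.parse(source))
    return nodes, edges


if __name__ == "__main__":
    example_nodes, example_edges = build_code_graph("x = 1\ny = x + 2\n")
    for label, pairs in example_edges.items():
        print(label, pairs)
```

A graph neural network such as the one the abstract describes would then propagate information along these typed edges; the attention and gating steps are not sketched here because the abstract does not specify them.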