Computer Engineering and Applications (计算机工程与应用), 2019, Vol. 55, Issue (11): 52-59. DOI: 10.3778/j.issn.1002-8331.1809-0076

• Theory, Research and Development •

Combined Deep Learning Method for Open Source Software Vulnerability Detection

LI Yuancheng1, CUI Yaqi1, LV Junfeng2, LAI Fenggang2, ZHANG Pan2

  1. School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China
  2. State Grid Information & Telecommunication Co., Beijing 100761, China
  • Online: 2019-06-01 Published: 2019-05-30

Abstract: To address the uneven code quality and security risks of open source software, this paper proposes a vulnerability detection method based on a hybrid deep learning model (DCnnGRU). Taking the key points recorded in a vulnerability library as entry points, the method builds a control flow graph, extracts from the static code the code segments that have call and transfer relationships with those key points, and digitizes each segment into a fixed-length feature vector that serves as the input to the DCnnGRU model. In the model, a Convolutional Neural Network (CNN) acts as the interface to the feature vectors, and a Gated Recurrent Unit (GRU) is embedded in the middle of the CNN as a gating mechanism that captures code call relationships. The model first applies convolution and pooling, in which the convolution kernels and pooling windows reduce the dimensionality of the feature vectors. The GRU is then inserted as an intermediate layer between the pooling layer and the fully connected layer, so that the call and transfer relationships in the code data are preserved. Finally, the fully connected layer performs normalization, and the processed feature vector is passed to a softmax classifier for vulnerability detection. Experimental results show that the DCnnGRU model detects vulnerabilities more effectively than a standalone CNN or RNN model, with accuracy 7% higher than the RNN and 3% higher than the CNN.

Key words: open source software, vulnerability detection, deep learning, Convolutional Neural Network (CNN), Gated Recurrent Unit (GRU)
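
The layer order described in the abstract (fixed-length vector, convolution and pooling, GRU, fully connected layer, softmax) can be sketched as a forward pass. This is a minimal, untrained illustration in plain Python and not the authors' implementation: the whitespace tokenizer, the use of raw token ids as scalar inputs, the layer sizes, the random weights, and all function names (`vectorize`, `conv1d_relu`, `max_pool`, `gru_last_state`, `predict`) are assumptions made for the example.

```python
import math
import random

def vectorize(tokens, vocab, length):
    """Map tokens to integer ids, then pad with 0 or truncate to a fixed length."""
    ids = [vocab.setdefault(t, len(vocab) + 1) for t in tokens]
    return (ids + [0] * length)[:length]

def conv1d_relu(seq, kernels):
    """Valid 1-D convolution with ReLU; one feature vector per position."""
    k = len(kernels[0])
    return [[max(0.0, sum(w * x for w, x in zip(kern, seq[i:i + k])))
             for kern in kernels]
            for i in range(len(seq) - k + 1)]

def max_pool(frames, window):
    """Non-overlapping max pooling along the sequence axis."""
    return [[max(col) for col in zip(*frames[i:i + window])]
            for i in range(0, len(frames) - window + 1, window)]

def sigmoid(x):
    # tanh form avoids overflow for large negative inputs
    return 0.5 * (1.0 + math.tanh(0.5 * x))

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def gru_last_state(frames, p):
    """Run a GRU over the pooled frames and return the final hidden state."""
    h = [0.0] * len(p["Uz"])
    for x in frames:
        z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        c = [math.tanh(a + b) for a, b in zip(matvec(p["Wh"], x), matvec(p["Uh"], rh))]
        h = [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, c)]
    return h

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def init_params(n_filters=4, kernel=3, hidden=5, n_classes=2, seed=0):
    """Random (untrained) weights; all sizes are illustrative assumptions."""
    rng = random.Random(seed)
    p = {"kernels": rand_matrix(n_filters, kernel, rng),
         "out": rand_matrix(n_classes, hidden, rng)}
    for gate in "zrh":
        p["W" + gate] = rand_matrix(hidden, n_filters, rng)
        p["U" + gate] = rand_matrix(hidden, hidden, rng)
    return p

def predict(ids, p, window=2):
    """Conv + pool -> GRU -> dense layer -> softmax class probabilities."""
    frames = max_pool(conv1d_relu([float(i) for i in ids], p["kernels"]), window)
    return softmax(matvec(p["out"], gru_last_state(frames, p)))

# Hypothetical code segment around a key point (a strcpy call).
vocab = {}
ids = vectorize("char buf [ 8 ] ; strcpy ( buf , src ) ;".split(), vocab, length=16)
probs = predict(ids, init_params())
```

With random weights the two probabilities carry no meaning; the sketch only shows where the GRU sits between the pooling output and the classifier. A trained model would learn the weights from labeled vulnerable and non-vulnerable segments and would normally use learned token embeddings instead of raw ids.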