Computer Engineering and Applications, 2021, Vol. 57, Issue (23): 122-128. DOI: 10.3778/j.issn.1002-8331.2007-0206

• Network, Communication and Security •

Interpretable Automatic Detection of Android Malware Based on Graph Embedding

WANG Yulian, LU Mingming

  1. School of Computer Science, Central South University, Changsha 410083, China
  • Online: 2021-12-01  Published: 2021-12-02

Abstract:

The geometric growth of Android malware has driven the development of automatic Android malware detection. Some studies analyze Android malware from the perspective of interpretability: by analyzing the model to identify the features with the greatest influence, they provide a degree of interpretability for deep learning models. These methods rest on the strong assumption that features are mutually independent, and they consider only the influence of each feature on the model individually. In practice, however, features are always coupled. Considering only the influence of single features makes it difficult to reflect this coupling, and it cannot characterize the combination patterns of sensitive APIs in different types of software. To address this problem, Android software is modeled as a graph, and a graph-embedding-based method that combines the structural information of the graph with the information inside the graph nodes is proposed for detecting Android malware. The method learns a low-dimensional dense embedding of Android software through an attention mechanism. Experimental results show that malware detection using the learned embeddings not only achieves high classification accuracy, but also makes it possible, by analyzing the paths with large attention scores, to find the patterns that influence the model's decisions and to locate the sensitive API sequences involved in malicious behavior.

Key words: Android malware, graph embedding learning, sensitive API sequence, attention mechanism
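
The idea in the abstract of pooling paths through an attention mechanism and then inspecting the highest-scoring paths can be illustrated with a minimal sketch. This is not the authors' model: the dot-product scoring, the `query` vector, the function names `attention_pool` and `top_paths`, and the toy paths and vectors are all assumptions made for illustration. In a trained model the path vectors and the query would be learned from data.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(path_vectors, query):
    """Pool path vectors into one embedding with dot-product attention.

    Assumption: each path's score is the dot product of its vector with
    a query vector (fixed here for illustration, learned in practice).
    """
    scores = [sum(p * q for p, q in zip(vec, query)) for vec in path_vectors]
    weights = softmax(scores)
    dim = len(query)
    embedding = [
        sum(w * vec[i] for w, vec in zip(weights, path_vectors))
        for i in range(dim)
    ]
    return embedding, weights


def top_paths(paths, weights, k=1):
    """Return the k (path, weight) pairs with the largest attention weights."""
    ranked = sorted(zip(paths, weights), key=lambda pw: pw[1], reverse=True)
    return ranked[:k]


# Hypothetical toy data: two API call paths with made-up 2-d vectors.
paths = [
    ("getDeviceId", "openConnection"),
    ("onCreate", "setContentView"),
]
path_vectors = [[2.0, 0.0], [0.0, 1.0]]
query = [1.0, 0.0]

embedding, weights = attention_pool(path_vectors, query)
for path, weight in top_paths(paths, weights, k=2):
    print(" -> ".join(path), round(weight, 3))
```

The attention weights sum to one, so they can be read as the relative contribution of each path to the pooled embedding. Ranking paths by weight is what allows the API sequences that most influence a decision to be singled out for inspection.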