Interpretable Automatic Detection of Android Malware Based on Graph Embedding

doi:10.3778/j.issn.1002-8331.2007-0206

Abstract

The geometric growth of Android malware has driven the development of automatic Android malware detection. Some prior work has analyzed Android malware from the perspective of interpretability, identifying the most influential features by analyzing the trained model and thereby giving deep learning models a degree of interpretability. These methods rest on the strong assumption that features are independent of one another and consider only the influence of each feature on the model individually, whereas in practice features are always coupled. Considering a single feature at a time cannot capture this coupling and cannot describe how sensitive APIs are combined in different types of software. To address this problem, Android software is represented as a graph, and a graph-embedding-based method for detecting Android malware is proposed that combines the structural information of the graph with the information inside each graph node. The method uses an attention mechanism to learn a low-dimensional, dense embedded representation of Android software. Experimental results show that malware detection using the learned embedded representation not only achieves high classification accuracy, but also makes it possible to find the patterns that affect the model's decisions and to locate the sensitive API sequences involved in malicious behavior by analyzing the paths with large attention scores.

Keywords: Android malware, graph embedding learning, sensitive API sequence, attention mechanism
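
The idea of learning an embedding through attention and then inspecting the highest-scoring paths can be illustrated with a minimal sketch. It is not the authors' model: it assumes each app has already been reduced to a handful of path vectors, uses a plain dot product against a single query vector as the attention score, and all names (`attention_pool`, `top_paths`, `path_vecs`, `query`) and the toy API paths and numbers are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(path_vecs, query):
    """Pool path vectors into one app embedding using attention weights.

    path_vecs: list of equal-length float lists, one per API path (assumed input).
    query: float list of the same length, standing in for a learned attention vector.
    Returns (embedding, weights).
    """
    # Dot-product score of each path against the query (an illustrative choice).
    scores = [sum(p * q for p, q in zip(vec, query)) for vec in path_vecs]
    weights = softmax(scores)
    dim = len(query)
    # The embedding is the attention-weighted sum of the path vectors.
    embedding = [
        sum(w * vec[i] for w, vec in zip(weights, path_vecs)) for i in range(dim)
    ]
    return embedding, weights


def top_paths(paths, weights, k=1):
    """Return the k (path, weight) pairs with the largest attention weights."""
    ranked = sorted(zip(paths, weights), key=lambda pw: pw[1], reverse=True)
    return ranked[:k]


# Toy data: made-up API paths and vectors, for illustration only.
paths = [
    "getDeviceId -> sendTextMessage",
    "openFile -> read",
    "getLastKnownLocation -> openConnection",
]
path_vecs = [[2.0, 0.0], [0.0, 1.0], [1.0, 0.5]]
query = [1.0, 0.0]

embedding, weights = attention_pool(path_vecs, query)
print(top_paths(paths, weights, k=1))
```

In this sketch the weights sum to one, so the path with the largest weight is the one that contributes most to the pooled embedding; ranking paths by weight is the kind of inspection the abstract describes for locating sensitive API sequences.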

WANG Yulian, LU Mingming. Interpretable Automatic Detection of Android Malware Based on Graph Embedding[J]. Computer Engineering and Applications, 2021, 57(23): 122-128.

王玉联，鲁鸣鸣. 可解释的基于图嵌入的Android恶意软件自动检测[J]. 计算机工程与应用, 2021, 57(23): 122-128.
