Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (21): 287-295. DOI: 10.3778/j.issn.1002-8331.2206-0408

• Network, Communication and Security •

基于特征选择的恶意Android应用检测方法

潘建文,张志华,林高毅,崔展齐   

  1. 北京信息科技大学 计算机学院,北京 100101
  • Published: 2023-11-01 Online: 2023-11-01

Android Malware Detection Based on Feature Selection

PAN Jianwen, ZHANG Zhihua, LIN Gaoyi, CUI Zhanqi   

  1. School of Computer Science, Beijing Information Science and Technology University, Beijing 100101, China
  • Online:2023-11-01 Published:2023-11-01

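Abstract: With the rapid development of the mobile Internet and the Android operating system, applications running on Android have grown just as quickly, but the malicious applications hidden among them pose a serious threat to users' property and privacy. To address the problem that the excessive number of Android application features degrades detection efficiency and accuracy, a feature-selection-based method for detecting malicious Android applications, Droid-TF-IDF, is proposed, which selects representative features of benign and malicious applications according to TF-IDF differences. APK files are statically analyzed to extract three types of features (application permissions, APIs, and opcodes), forming a feature set; the Droid-TF-IDF values of each type of feature are calculated and ranked; and a feature subset with higher Droid-TF-IDF values is selected from the feature set to build models such as random forest, support vector machine (SVM), and convolutional neural network (CNN) for detecting malicious Android applications. A prototype tool was implemented based on the proposed method, and comparative experiments were conducted on 3,006 Android application samples. The results show that Droid-TF-IDF is applicable to all three types of features (permissions, APIs, and opcodes) and can improve the performance and efficiency of malicious application detection while effectively reducing the feature dimension. After feature selection, the F1 value for detecting malicious Android applications improved by up to 0.6 percentage points, and time consumption was reduced by up to 35%.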

Keywords: Android application, static analysis, feature extraction, feature selection, malicious application detection

Abstract: With the rapid development of the mobile Internet and the Android operating system, applications running on Android have also grown rapidly, but malware hidden among them poses a serious threat to users' property and privacy. To address the problem that the excessive number of Android application features degrades the efficiency and accuracy of malware detection, this paper proposes Droid-TF-IDF, an Android malware detection approach based on feature selection that selects representative features of benign applications and malware according to differences in TF-IDF. First, APK files are statically analyzed to extract three types of features (permissions, APIs, and opcodes), which together form the feature set. Then, the Droid-TF-IDF values of the features of each type are calculated and ranked. Finally, a subset of features with higher Droid-TF-IDF values is selected from the feature set to build models such as random forest, support vector machine (SVM), and convolutional neural network (CNN) to detect Android malware. A prototype tool is implemented based on the proposed approach, and comparative experiments are carried out on 3,006 Android application samples. The experimental results show that Droid-TF-IDF is applicable to all three types of features (permissions, APIs, and opcodes) and effectively reduces the feature dimension while improving the performance and efficiency of malware detection. After feature selection, the F1 value for detecting Android malware increases by up to 0.6 percentage points, and the time consumption decreases by up to 35%.

Key words: Android application, static analysis, feature extraction, feature selection, malware detection
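
The selection step described in the abstract (score each feature, rank, keep the top-ranked subset) can be illustrated with a minimal sketch. The abstract does not state the Droid-TF-IDF formula, so the scoring below is an assumed stand-in: term frequency is taken as the fraction of apps in a class that contain the feature, IDF is a smoothed log over all apps, and the score is the absolute difference between the two classes. The function names `tfidf_difference_scores` and `select_top_k` and the sample permission sets are hypothetical.

```python
import math


def tfidf_difference_scores(benign_apps, malware_apps):
    """Score features by the gap between their benign and malware TF-IDF.

    Assumed formula (not taken from the paper):
      tf_c(f) = fraction of apps in class c that contain feature f
      idf(f)  = log((1 + N) / (1 + df(f))) + 1, computed over all N apps
      score   = |tf_benign(f) - tf_malware(f)| * idf(f)
    Each app is represented as a set of feature names.
    """
    all_apps = benign_apps + malware_apps
    n_apps = len(all_apps)
    features = set().union(*all_apps)
    scores = {}
    for f in features:
        df = sum(1 for app in all_apps if f in app)
        idf = math.log((1 + n_apps) / (1 + df)) + 1.0
        tf_benign = sum(1 for app in benign_apps if f in app) / len(benign_apps)
        tf_malware = sum(1 for app in malware_apps if f in app) / len(malware_apps)
        scores[f] = abs(tf_benign - tf_malware) * idf
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring features; ties are broken by name."""
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]


# Hypothetical toy data: each app is the set of permissions it requests.
benign = [{"INTERNET", "CAMERA"}, {"INTERNET"}]
malware = [{"INTERNET", "SEND_SMS"}, {"INTERNET", "SEND_SMS", "READ_CONTACTS"}]

scores = tfidf_difference_scores(benign, malware)
print(select_top_k(scores, 2))  # ['SEND_SMS', 'CAMERA']
```

In this toy data `INTERNET` appears in every app of both classes and scores zero, while `SEND_SMS` appears only in malware and ranks first. The selected subset would then be used as the input columns for a classifier such as a random forest.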