Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (22): 74-82. DOI: 10.3778/j.issn.1002-8331.1910-0030


Android Malware Detection Based on Ensemble Learning Voting Algorithm

ZHAO Yuxin, Nurbol, AI Zhuang   

  1. College of Software, Xinjiang University, Urumqi 830046, China
  2. Network Centre, Xinjiang University, Urumqi 830046, China
  3. College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
  • Online: 2020-11-15 Published: 2020-11-13

基于集成学习投票算法的Android恶意应用检测

赵宇鑫,努尔布力,艾壮   

  1. 新疆大学 软件学院,乌鲁木齐 830046
  2. 新疆大学 网络中心,乌鲁木齐 830046
  3. 新疆大学 信息科学与工程学院,乌鲁木齐 830046

Abstract:

To detect malicious applications on the Android platform, this paper proposes MASV (Soft-Voting Algorithm), an Android malware detection method based on an ensemble learning voting algorithm, to classify unknown applications effectively. The experimental data are drawn from a known open-source dataset and comprise 213,256 benign applications and 18,363 malicious applications. First, the SVM-RFE feature selection algorithm is used to reduce the dimensionality of the features. Then, an ensemble of classifiers is used to distinguish malicious applications from benign ones; the base classifiers are SVM (Support Vector Machine), K-NN (K-Nearest Neighbor), NB (Naïve Bayes), CART (Classification and Regression Tree), and RF (Random Forest). The gradient ascent algorithm is used to determine the weight assigned to each base classifier in the soft vote. The experimental results show that the method achieves an accuracy of 99.27% in malicious application detection.

Key words: Android malicious application, ensemble learning, voting algorithm
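
The weighted soft vote described in the abstract can be illustrated with a minimal sketch. The abstract does not state the objective that the gradient ascent maximizes, so the code below assumes the mean log-likelihood of the combined probability and keeps the weights positive and summing to one through a softmax parameterization; both choices are assumptions, not the authors' stated method. The function names (`soft_vote`, `fit_weights`, `log_likelihood`) and the toy probabilities are hypothetical and are not taken from the paper's dataset.

```python
import math


def soft_vote(probas, weights):
    """Weighted average of each base classifier's P(malicious) for one sample."""
    return sum(w * p for w, p in zip(weights, probas))


def softmax(thetas):
    """Map unconstrained parameters to positive weights that sum to 1."""
    m = max(thetas)
    exps = [math.exp(t - m) for t in thetas]
    total = sum(exps)
    return [e / total for e in exps]


def log_likelihood(rows, labels, weights, eps=1e-12):
    """Mean log-likelihood of the labels under the weighted soft vote."""
    total = 0.0
    for row, y in zip(rows, labels):
        p = min(max(soft_vote(row, weights), eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def fit_weights(rows, labels, lr=0.5, steps=500, eps=1e-12):
    """Gradient ascent on the mean log-likelihood (assumed objective).

    rows[i][k] is classifier k's P(malicious) for sample i; labels[i] is 0 or 1.
    """
    k = len(rows[0])
    n = len(labels)
    thetas = [0.0] * k  # start from uniform weights
    for _ in range(steps):
        w = softmax(thetas)
        grad_w = [0.0] * k
        for row, y in zip(rows, labels):
            p = min(max(soft_vote(row, w), eps), 1.0 - eps)
            # Derivative of y*log(p) + (1-y)*log(1-p) with respect to p.
            dp = y / p - (1 - y) / (1.0 - p)
            for j in range(k):
                grad_w[j] += dp * row[j] / n
        # Chain rule through the softmax: dL/dtheta_j = w_j * (g_j - sum_i w_i g_i).
        dot = sum(wi * gi for wi, gi in zip(w, grad_w))
        thetas = [t + lr * wj * (gj - dot)
                  for t, wj, gj in zip(thetas, w, grad_w)]
    return softmax(thetas)


# Toy validation set (hypothetical): three base classifiers, six samples.
# Classifier 0 is the most reliable, classifier 2 the least.
rows = [
    [0.90, 0.60, 0.40],
    [0.80, 0.70, 0.60],
    [0.95, 0.40, 0.30],
    [0.10, 0.30, 0.60],
    [0.20, 0.50, 0.50],
    [0.05, 0.40, 0.70],
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = malicious, 0 = benign

weights = fit_weights(rows, labels)
print("learned weights:", [round(w, 3) for w in weights])
```

In practice the rows would hold the held-out probability outputs of the five base classifiers named in the abstract. If scikit-learn is used, the learned weights could be passed to `VotingClassifier(voting="soft", weights=...)`, and the preceding SVM-RFE step could be approximated with `sklearn.feature_selection.RFE` wrapped around a linear-kernel SVM; whether the authors used these tools is not stated.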

摘要:

针对Android平台恶意应用的检测技术,提出一种基于集成学习投票算法的Android恶意程序检测方法MASV(Soft-Voting Algorithm),以有效地对未知应用程序进行分类。从已知开源的数据集中获取了实验的基础数据,使用的应用程序集包含213 256个良性应用程序以及18 363个恶意应用程序。使用SVM-RFE特征选择算法对特征进行降维。使用多个分类器的集合,即SVM(Support Vector Machine)、K-NN(K-Nearest Neighbor)、NB(Naïve Bayes)、CART(Classification and Regression Tree)和RF(Random Forest),以检测恶意应用程序和良性应用程序。使用梯度上升算法确定集成学习软投票的基分类器权重参数。实验结果表明,该方法在恶意应用程序检测中达到了99.27%的准确率。

关键词: Android恶意应用, 集成学习, 投票算法