Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (15): 140-146. DOI: 10.3778/j.issn.1002-8331.2004-0385

• Network, Communication and Security •

Malicious Android Application Detection Model Combining CNN and the Catboost Algorithm

苏庆,林华智,黄剑锋,林志毅   

  1. School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006
  • Publication date: 2021-08-01  Release date: 2021-07-26

Malicious Android Application Detection Combining CNN and Catboost Algorithm

SU Qing, LIN Huazhi, HUANG Jianfeng, LIN Zhiyi   

  1. School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China
  • Online:2021-08-01 Published:2021-07-26

Abstract (translated from Chinese):

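To address the large feature dimensionality and low detection efficiency in malicious Android application detection, a hybrid CNN-catboost detection model is constructed. It combines the strong feature extraction and dimensionality reduction capabilities of the convolutional neural network (CNN) with the advantage of the catboost algorithm, which can produce good classification results without extensive data training. Static features of Android applications, including permissions, API packages, components, intents, hardware features, and OpCode features, are obtained through reverse engineering and mapped to feature vectors. In the feature processing layer, convolution kernels perform local perception on the features to enhance the signal. Max pooling then downsamples the processed features, reducing the dimensionality while keeping the feature properties unchanged. The processed features serve as the input vector of the catboost classification layer, and the global optimization capability of a genetic algorithm is used to tune the parameters of the catboost model, further improving classification accuracy. The trained model is then tested in practical use on datasets of Android applications of known and unknown types. The experimental results show that the CNN-catboost model requires less time for parameter tuning and also achieves relatively good prediction accuracy and detection efficiency.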
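The feature processing step described above (convolution for local perception, then max pooling for downsampling) can be illustrated with a minimal pure-Python sketch. The 8-dimensional binary vector, the fixed all-ones kernel, and the function names `conv1d` and `max_pool1d` are assumptions made for illustration only; the abstract does not give the actual layer sizes, and a real CNN would learn its kernel weights during training.

```python
from typing import List


def conv1d(features: List[float], kernel: List[float]) -> List[float]:
    """Valid-mode 1D cross-correlation: each output mixes len(kernel) neighbours."""
    k = len(kernel)
    return [
        sum(features[i + j] * kernel[j] for j in range(k))
        for i in range(len(features) - k + 1)
    ]


def max_pool1d(values: List[float], size: int) -> List[float]:
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


# Hypothetical binary feature vector (1 = the static feature is present in the APK).
feature_vector = [1, 0, 0, 1, 1, 0, 1, 0]
# Hypothetical fixed kernel; in a trained CNN these weights are learned.
kernel = [1.0, 1.0, 1.0]

conv_out = conv1d(feature_vector, kernel)  # 6 values, one per 3-wide window
pooled = max_pool1d(conv_out, size=2)      # 3 values after downsampling
print(conv_out, pooled)
```

In this toy case the 8 input dimensions shrink to 3 pooled values, which would then be passed to the classification layer.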

Key words: malicious Android application, convolutional neural network, Catboost classification algorithm, genetic algorithm

Abstract:

Malicious Android application detection suffers from high feature dimensionality and low detection efficiency. To address these problems, a CNN-catboost hybrid model is proposed. In the proposed model, the convolutional neural network handles feature extraction and dimensionality reduction, and the catboost classification algorithm provides good generalization ability. Static features of Android applications, such as permissions, API packages, components, intents, hardware features, and OpCode features, are acquired through reverse engineering and encoded as feature vectors. In the feature processing layer, local features are extracted using convolution kernels. Max pooling then downsamples the processed features, reducing the dimensionality while preserving the characteristics of the features. The downsampled features are used as the input vector of the catboost classification layer, and a genetic algorithm with global optimization ability is used to tune the parameters of the catboost model to further improve classification accuracy. The trained model is tested on datasets of known and unknown types of Android applications. The experimental results show that the CNN-catboost hybrid model takes less time to tune parameters and achieves promising prediction accuracy and detection efficiency.
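The genetic-algorithm tuning step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the abstract does not say which parameters were tuned or how the algorithm was configured, so the search space (`learning_rate`, `depth`), the operators, and the `surrogate_fitness` function are all hypothetical. In practice the fitness would be something like the validation accuracy of a catboost model trained with the candidate parameters.

```python
import random
from typing import Callable, Dict, Tuple

Params = Dict[str, float]

# Hypothetical search space; the abstract does not list the tuned parameters.
BOUNDS: Dict[str, Tuple[float, float]] = {
    "learning_rate": (0.01, 0.3),
    "depth": (2.0, 10.0),  # would be rounded to an int for a real model
}


def random_individual(rng: random.Random) -> Params:
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in BOUNDS.items()}


def crossover(a: Params, b: Params, rng: random.Random) -> Params:
    """Uniform crossover: each gene is copied from one parent at random."""
    return {name: (a if rng.random() < 0.5 else b)[name] for name in BOUNDS}


def mutate(ind: Params, rng: random.Random, rate: float = 0.2) -> Params:
    """Gaussian mutation, clipped back into the allowed range."""
    child = dict(ind)
    for name, (lo, hi) in BOUNDS.items():
        if rng.random() < rate:
            step = rng.gauss(0.0, 0.1 * (hi - lo))
            child[name] = min(hi, max(lo, child[name] + step))
    return child


def genetic_search(fitness: Callable[[Params], float], generations: int = 30,
                   pop_size: int = 20, seed: int = 0) -> Params:
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[: pop_size // 2]  # truncation selection keeps the better half
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        population = elite + children
    return max(population, key=fitness)


def surrogate_fitness(p: Params) -> float:
    """Stand-in objective peaking at learning_rate=0.1, depth=6 (not real data)."""
    return -((p["learning_rate"] - 0.1) ** 2) - 0.01 * (p["depth"] - 6.0) ** 2


best = genetic_search(surrogate_fitness)
print(best)
```

Because the better half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next; the price is that every fitness evaluation would mean one model training run when a real classifier is plugged in.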

Key words: malicious Android application, convolutional neural network, Catboost classification algorithm, genetic algorithm