Malicious Domain Names Detection by Improved Relief-C5.0

doi:10.3778/j.issn.1002-8331.2012-0475

Abstract: To address the high computational complexity, poor real-time performance, and low accuracy of the classification models used in current malicious domain name detection algorithms, this paper proposes Rf-C5 (Relief-C5.0), a malicious domain name detection algorithm. First, the global URL features of the domain names under test are extracted. Next, an improved Relief algorithm computes a weight for each extracted feature, and the features are ranked by weight. Finally, the 20 highest-weighted features are fed to a C5.0 classifier, which separates legitimate domain names from malicious ones. Experimental results show that, on a large-sample data set, the Rf-C5 model raises detection accuracy by 1.58 to 4.91 percentage points over current mainstream malicious domain name detection algorithms while also improving the average detection speed.

Key words: malicious domain name, URL features, improved Relief algorithm, C5.0 classifier
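
The weighting and ranking steps summarized in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the Relief algorithm was improved, so the code below assumes the classic two-class Relief update (nearest hit, nearest miss) and visits every sample once instead of sampling at random. The function names, the toy feature values, and the feature meanings are hypothetical, and the C5.0 classification step is left out.

```python
def relief_weights(X, y):
    """Classic two-class Relief weights (NOT the paper's improved variant).

    X is a list of numeric feature vectors, y a list of class labels.
    Every sample is visited once, which makes the result deterministic.
    """
    n, d = len(X), len(X[0])
    # Scale each feature to [0, 1] so per-feature differences are comparable.
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]
    Z = [[(row[j] - lo[j]) / span[j] for j in range(d)] for row in X]

    def dist(a, b):
        # Manhattan distance between two scaled samples.
        return sum(abs(p - q) for p, q in zip(a, b))

    w = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue  # no same-class or other-class neighbour available
        h = min(hits, key=lambda k: dist(Z[i], Z[k]))    # nearest hit
        m = min(misses, key=lambda k: dist(Z[i], Z[k]))  # nearest miss
        for j in range(d):
            # Reward features that differ across classes,
            # penalize features that differ within a class.
            w[j] += (abs(Z[i][j] - Z[m][j]) - abs(Z[i][j] - Z[h][j])) / n
    return w


def top_k_features(weights, k=20):
    """Indices of the k highest-weighted features, best first."""
    order = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    return order[:k]


# Hypothetical toy data: column 0 separates the classes, column 1 is noise.
X = [[0.10, 0.50], [0.20, 0.10], [0.15, 0.90],
     [0.90, 0.40], [0.80, 0.95], [0.85, 0.05]]
y = [0, 0, 0, 1, 1, 1]  # 0 = legitimate, 1 = malicious (illustrative labels)

weights = relief_weights(X, y)
selected = top_k_features(weights, k=1)
```

In this toy run the class-separating column receives the larger weight and is the one selected; in the setting the abstract describes, the 20 selected columns would then be passed to the classifier.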

MA Donglin, ZHANG Shuhuan, ZHAO Hong. Malicious Domain Names Detection by Improved Relief-C5.0[J]. Computer Engineering and Applications, 2022, 58(11): 100-106.

马栋林, 张澍寰, 赵宏. 改进Relief-C5.0的恶意域名检测算法[J]. 计算机工程与应用, 2022, 58(11): 100-106.
