Phishing Detection Algorithm Based on Language Features of URL

doi:10.3778/j.issn.1002-8331.1809-0259

Computer Engineering and Applications, 2019, Vol. 55, Issue (24): 84-90. DOI: 10.3778/j.issn.1002-8331.1809-0259

WANG Yuqi, LIU Bowen, LIN Guoyuan

1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
2. Mine Digitization Engineering Research Center of the Ministry of Education, Xuzhou, Jiangsu 221116, China
3. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China

Online: 2019-12-15; Published: 2019-12-11

Abstract

To counter the detection-evasion strategies of phishing sites, a phishing detection algorithm based on the language features of URLs is proposed. By analyzing how the URLs of phishing sites and legitimate sites differ across detection domains, the concepts of motif and sensitivity are defined to describe these language features. First, the similarity of the main-level domain is checked using motifs. When the similarity is lower than a preset threshold, valid subdomain features are selected, and a random forest is then used to learn and detect the language features of the subdomains. Experimental results show that the proposed algorithm achieves an accuracy of 95.6%, with a relatively short running time and an average recognition time of less than 1 s.

Key words: phishing site, Uniform Resource Locator (URL), language feature, motif, sensitivity
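
The two-stage flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not define the motif-based similarity or the sensitivity measure, so `SequenceMatcher.ratio()` from Python's standard library stands in for the former and a simple sensitive-word count for the latter. The lists `KNOWN_DOMAINS` and `SENSITIVE_WORDS`, the threshold of 0.8, the feature set, and the `classify` callable (which in practice might wrap a trained random forest) are all assumptions introduced for the example.

```python
from difflib import SequenceMatcher
from urllib.parse import urlsplit

# Hypothetical inputs: neither list comes from the paper.
KNOWN_DOMAINS = ["paypal", "taobao", "apple"]
SENSITIVE_WORDS = ["login", "secure", "account", "verify"]


def split_host(url):
    """Split a URL's host into (main-level label, list of subdomain labels).

    Naive assumption: the last label is the TLD and the one before it is the
    main-level domain; multi-part suffixes such as .com.cn are not handled.
    """
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    if len(labels) < 2:
        return host, []
    return labels[-2], labels[:-2]


def main_domain_similarity(main, known=KNOWN_DOMAINS):
    """Highest string similarity between the main label and any known domain.

    SequenceMatcher.ratio() is a stand-in for the paper's motif-based measure.
    """
    return max(SequenceMatcher(None, main, k).ratio() for k in known)


def subdomain_features(subdomains):
    """Illustrative feature vector for the subdomain part (assumed features)."""
    text = ".".join(subdomains)
    digits = sum(ch.isdigit() for ch in text)
    return [
        len(subdomains),                                # number of subdomain labels
        len(text),                                      # total length, dots included
        digits / len(text) if text else 0.0,            # share of digit characters
        sum(word in text for word in SENSITIVE_WORDS),  # sensitive-word hits
    ]


def detect(url, classify, threshold=0.8):
    """Two-stage flow: similarity check first, classifier only below threshold."""
    main, subs = split_host(url)
    score = main_domain_similarity(main)
    if score >= threshold:
        # Stage 1 settles the case; the caller interprets the score.
        return ("main_domain", score)
    # Stage 2: hand the subdomain features to the supplied classifier.
    return ("subdomain", classify(subdomain_features(subs)))
```

Calling `detect("http://paypa1.com/", classify)` stops at the first stage because `paypa1` closely resembles a listed domain, while `detect("http://login.secure.example.com/", classify)` falls through to the classifier. How each stage's output maps to a final phishing verdict is not stated in the abstract, so the sketch returns the stage name and its raw result and leaves that decision to the caller.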

WANG Yuqi, LIU Bowen, LIN Guoyuan. Phishing Detection Algorithm Based on Language Features of URL[J]. Computer Engineering and Applications, 2019, 55(24): 84-90.
