Computer Engineering and Applications, 2020, Vol. 56, Issue (4): 92-98. DOI: 10.3778/j.issn.1002-8331.1811-0056

Detection of Malicious Domain Names Based on AN and LSTM

ZHOU Kang, WAN Liang, DING Hongwei   

1. College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
2. Institute of Computer Theory and Software, Guizhou University, Guiyang 550025, China
Online: 2020-02-15; Published: 2020-03-06

Abstract:

At present, malicious domain names are widely used in remote-control Trojans, phishing fraud and other network attacks. Traditional malicious domain name detection methods suffer from the long-distance dependency problem: they tend to ignore contextual information, and because the data are high-dimensional, they cannot detect malicious domain names efficiently and accurately. This paper presents a deep learning method for detecting malicious domain names that combines an Autoencoder Network (AN) for dimensionality reduction with a Long Short-Term Memory network (LSTM). First, semantically meaningful word-vector representations are used, which solves the data sparsity and curse-of-dimensionality problems caused by traditional representations. Word vectors constructed by word2vec serve as the input to the LSTM; an Attention mechanism then ranks the importance of the correlations between the LSTM inputs and outputs to obtain the overall (global) features of the text. Finally, the local features are fused with the global features, and a softmax classifier outputs the classification result. Experimental results show that the method performs well in malicious domain name detection, achieving a higher detection rate and a shorter detection time than traditional malicious domain name detection methods.
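The pipeline described in the abstract (word vectors, autoencoder dimensionality reduction, LSTM, attention, fusion of local and global features, softmax) can be sketched as a single forward pass. This is a minimal, untrained sketch in plain Python under stated assumptions, not the authors' implementation: random per-character vectors stand in for trained word2vec embeddings (the tokenization into characters is itself an assumption), only the encoder half of an untrained autoencoder is shown, attention scores come from a single hypothetical scoring vector `w_att`, the last LSTM hidden state is assumed to be the "local" feature, and the dimensions `EMB`, `CODE`, `HID` and the function `classify` are hypothetical.

```python
import math
import random

random.seed(0)

# Hypothetical sizes: embedding dim, autoencoder code dim, LSTM hidden dim, classes.
EMB, CODE, HID, CLASSES = 16, 8, 12, 2


def mat(rows, cols):
    """Random weight matrix; stands in for parameters the paper would train."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# ASSUMPTION: random vectors per character stand in for trained word2vec vectors.
VOCAB = "abcdefghijklmnopqrstuvwxyz0123456789-."
EMBED = {ch: [random.gauss(0, 1) for _ in range(EMB)] for ch in VOCAB}

W_enc = mat(CODE, EMB)  # encoder half of an (untrained) autoencoder: EMB -> CODE
# LSTM gate weights applied to the concatenation [x_t ; h_{t-1}].
W_i, W_f, W_o, W_c = (mat(HID, CODE + HID) for _ in range(4))
w_att = [random.uniform(-0.1, 0.1) for _ in range(HID)]  # hypothetical attention vector
W_out = mat(CLASSES, 2 * HID)  # softmax layer over fused [local ; global] features


def encode(vec):
    """Reduce one embedding to CODE dimensions with the encoder."""
    return [math.tanh(z) for z in matvec(W_enc, vec)]


def lstm(seq):
    """Standard LSTM recurrence; returns the hidden state at every step."""
    h, c = [0.0] * HID, [0.0] * HID
    states = []
    for x in seq:
        xh = x + h  # list concatenation, length CODE + HID
        i = [sigmoid(z) for z in matvec(W_i, xh)]  # input gate
        f = [sigmoid(z) for z in matvec(W_f, xh)]  # forget gate
        o = [sigmoid(z) for z in matvec(W_o, xh)]  # output gate
        g = [math.tanh(z) for z in matvec(W_c, xh)]  # candidate cell state
        c = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c, i, g)]
        h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
        states.append(h)
    return states


def attention(states):
    """Score each hidden state, normalize, and return the weighted sum."""
    alphas = softmax([sum(w * s for w, s in zip(w_att, h)) for h in states])
    context = [sum(a * h[k] for a, h in zip(alphas, states)) for k in range(HID)]
    return context, alphas


def classify(domain):
    """Return (class probabilities, attention weights) for one domain name."""
    chars = [ch for ch in domain.lower() if ch in EMBED]
    if not chars:
        raise ValueError("domain contains no known characters")
    states = lstm([encode(EMBED[ch]) for ch in chars])
    global_feat, alphas = attention(states)
    local_feat = states[-1]  # ASSUMPTION: last hidden state as the local feature
    probs = softmax(matvec(W_out, local_feat + global_feat))  # feature fusion
    return probs, alphas
```

Calling `classify("example.com")` returns a two-element probability vector (reading index 1 as "malicious" is a hypothetical convention) and one attention weight per character. Because every weight is random, the probabilities carry no meaning until the embeddings, autoencoder, LSTM, attention and output layers are trained as the paper describes.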

Key words: malicious domain name detection, long short-term memory network, word2vec, Attention mechanism

摘要:

目前,恶意域名被广泛应用于远控木马、钓鱼欺诈等网络攻击中,传统恶意域名检测方法存在长距离依赖性问题,容易忽略上下文信息并且数据维度过大,无法高效、准确地检测恶意域名。提出了一种自编码网络(Autoencoder Network,AN)降维和长短期记忆神经网络(Long Short-Term Memory network,LSTM)检测恶意域名的深度学习方法。利用实现包含语义的词向量表示,解决了传统方法导致的数据表示稀疏及维度灾难问题。由word2vec构建词向量作为LSTM的输入,利用Attention机制对LSTM输入与输出之间的相关性进行重要度排序,获取文本整体特征,最后将局部特征与整体特征进行特征融合,使用softmax分类器输出分类结果。实验结果表明,该方法在恶意域名检测上具有较好的表现,比传统检测恶意域名方法具有更高的检测率和实时性。

关键词: 恶意域名检测, 长短时记忆神经网络, word2vec, Attention机制