Computer Engineering and Applications, 2018, Vol. 54, Issue (12): 14-20. DOI: 10.3778/j.issn.1002-8331.1803-0494

• Hot Topics and Reviews •

Extended prefix tree based protocol format inference

HONG Zheng (洪征), TIAN Yifan (田益凡), ZHANG Hongze (张洪泽), WU Lifa (吴礼发)

  1. Institute of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210000, China
  • Online: 2018-06-15 Published: 2018-07-03

Abstract: Inferring the message format of unknown network protocols is of great significance in many network security applications. Most existing protocol format inference methods suffer from high time complexity and low accuracy. This paper proposes a protocol format inference method based on an extended prefix tree. First, candidate keywords are obtained through N-gram segmentation and merged, according to mutual information, into protocol keywords of different lengths. Next, an extended prefix tree is built from the keyword sequence corresponding to each message, which yields an initial clustering of the message samples. Then, segmented multiple sequence alignment is applied on the extended prefix tree to combine similar formats and obtain the precise protocol format. Compared with traditional format inference methods, the proposed method reduces the time complexity of inference. Experimental results show that the proposed method performs well for both text protocols and binary protocols.

Key words: protocol format inference, mutual information, extended prefix tree, multiple sequence alignment algorithm
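
To make the first two steps of the abstract more concrete, the following Python sketch scores adjacent N-grams by pointwise mutual information and then groups messages by inserting their keyword sequences into a prefix tree. It is only an illustration under stated assumptions: the abstract does not give the paper's exact mutual-information estimate, merge rule, or the way the prefix tree is extended, so this uses a plain (unextended) prefix tree, estimates probabilities as the fraction of messages containing a byte string, and all names (`pmi`, `keyword_sequence`, `build_prefix_tree`, `clusters`) are hypothetical.

```python
import math


def pmi(messages, left: bytes, right: bytes) -> float:
    """Pointwise mutual information of `left` immediately followed by `right`.

    Assumption: each probability is the fraction of messages that contain
    the byte string at least once; the paper may estimate this differently.
    """
    total = len(messages)
    p_left = sum(left in m for m in messages) / total
    p_right = sum(right in m for m in messages) / total
    p_joint = sum(left + right in m for m in messages) / total
    if p_joint == 0:
        return float("-inf")  # the two grams never occur adjacently
    return math.log2(p_joint / (p_left * p_right))


def keyword_sequence(msg: bytes, keywords) -> tuple:
    """Scan left to right, taking the longest keyword that matches at each offset."""
    ordered = sorted(keywords, key=len, reverse=True)
    seq, i = [], 0
    while i < len(msg):
        for kw in ordered:
            if msg.startswith(kw, i):
                seq.append(kw)
                i += len(kw)
                break
        else:
            i += 1  # no keyword starts here; skip one byte
    return tuple(seq)


class Node:
    def __init__(self):
        self.children = {}   # keyword -> child Node
        self.messages = []   # indices of messages whose sequence ends here


def build_prefix_tree(messages, keywords) -> Node:
    """Insert every message's keyword sequence into a plain prefix tree."""
    root = Node()
    for idx, msg in enumerate(messages):
        node = root
        for kw in keyword_sequence(msg, keywords):
            node = node.children.setdefault(kw, Node())
        node.messages.append(idx)
    return root


def clusters(node: Node, path=()):
    """Yield (keyword path, message indices) for every node that holds messages."""
    if node.messages:
        yield path, node.messages
    for kw, child in node.children.items():
        yield from clusters(child, path + (kw,))
```

For instance, given a few HTTP-like request lines and the keywords `b"GET "`, `b"POST "`, and `b"HTTP/1.1"`, `dict(clusters(build_prefix_tree(messages, keywords)))` groups the messages by their keyword paths; messages sharing a path form one initial cluster, which a later alignment step could then refine.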