Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (20): 1-13.DOI: 10.3778/j.issn.1002-8331.1808-0339

Previous Articles     Next Articles

Survey on high performance regular expression matching algorithms

FU Zhe1,2, LI Jun2   

  1. 1.Department of Automation, Tsinghua University, Beijing 100084, China
    2.Research Institute of Information Technology, Tsinghua University, Beijing 100084, China
  • Online:2018-10-15 Published:2018-10-19

高性能正则表达式匹配算法综述

付  哲1,2,李  军2   

  1. 1.清华大学 自动化系,北京 100084
    2.清华大学 信息技术研究院,北京 100084

Abstract: Deep inspection is playing an important role in providing secure network environment and guaranteeing network service quality. As a key technology for high-performance deep inspection, regular expression matching algorithms attract a great deal of attention in both academic research and industrial practice. With the rapid development of network technology, the volume of network traffic keeps increasing, the rulesets used for deep inspection are becoming larger while more complex, and the network topology is more flexible and dynamic than ever. Existing matching methods are facing many challenges, including those in matching speed, memory consumption, update ability, and so on. This survey first introduces the background of regular expression matching problem, then categorizes existing algorithms in the aspects of memory compression, speed acceleration, new automata design, and rule partition and grouping, and evaluates them in matching speed, memory consumption, as well as preprocessing with real life network traffic and rulesets. A guideline for efficient regular expression matching under different scenarios, and an outline of future work on high performance regular expression matching algorithms are also provided.

Key words: regular expression matching, finite automata, algorithm, evaluation

摘要: 深度检测在维护网络安全、保证服务质量等方面扮演着重要的角色。正则表达式匹配算法作为高性能深度检测的核心技术,具有重要的研究价值和实践意义。随着网络流量不断增长、规则数目持续增多以及网络结构日趋灵活和动态,现有的正则表达式匹配算法面临着匹配速度、内存占用和更新能力等多方面的挑战。介绍了正则表达式匹配算法的研究背景,从空间压缩、匹配加速、新型自动机设计以及规则拆分和分组四个角度入手,分类总结了学术界具有影响力的研究成果。通过基于真实网络流量的评测,比较了几种经典匹配算法在不同规则集上的匹配速度、内存占用和预处理时间等性能指标,并给出了不同需求场景下高效正则表达式匹配算法的选择建议,归纳了高性能正则表达式匹配算法的下一步发展方向。

关键词: 正则表达式匹配, 有穷自动机, 算法, 评测