Computer Engineering and Applications ›› 2017, Vol. 53 ›› Issue (7): 1-8. DOI: 10.3778/j.issn.1002-8331.1610-0356

• Hot Topics and Reviews •

Short-rule-efficient rapid multi-pattern matching algorithm

XIA Nian, SONG Tian

  1. School of Computer Science, Beijing Institute of Technology, Beijing 100081, China
  • Online: 2017-04-01 Published: 2017-04-01

Abstract: With the rapid development of network technology, the pattern sets handled by multi-pattern matching algorithms have grown explosively in size, and the patterns within a set are no longer of uniform length. Traditional multi-pattern matching algorithms adapt poorly to these new pattern sets: the same algorithm performs markedly differently on different pattern sets. For pattern sets whose patterns vary in length and are unevenly distributed, this paper proposes the Modified Wu-Manber (MWM) algorithm, an improvement on the Wu-Manber (WM) algorithm. MWM partitions the pattern set into a long subset and a short subset and builds a separate SHIFT table for each; these tables assist the original WM SHIFT table in verifying matches, and the whole matching process runs in a single thread. The algorithm both reduces the number of pattern verifications and increases the average shift distance. Experiments show that MWM performs better on pattern sets with unequal, unevenly distributed pattern lengths, and that the gain grows with the size of the pattern set. With 100 000 patterns, MWM improves performance by more than 40% compared with the WM algorithm.

Key words: pattern matching, string matching, Wu-Manber algorithm
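
The abstract's central idea, that short patterns cap the shift distance of WM and that splitting the set by length relaxes this cap for the long subset, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the block size, the length threshold, or how the long and short SHIFT tables cooperate with the original table in a single pass. The names `build_shift`, `wm_search`, `partition_by_length`, and `search_partitioned`, the default `block=2`, and the `threshold` parameter are all assumptions, and the sketch simply scans the text once per subset.

```python
def build_shift(patterns, block=2):
    """Build a Wu-Manber style SHIFT table over the first m characters of
    each pattern, where m is the shortest pattern length in the set."""
    m = min(len(p) for p in patterns)
    if m < block:
        raise ValueError("every pattern must be at least `block` characters")
    default = m - block + 1  # shift used for blocks that occur in no pattern
    shift = {}
    for p in patterns:
        for end in range(block, m + 1):
            key = p[end - block:end]
            dist = m - end  # distance from this block to the prefix end
            if dist < shift.get(key, default):
                shift[key] = dist
    return m, default, shift


def wm_search(text, patterns, block=2):
    """Plain Wu-Manber scan; returns a sorted list of (start, pattern)."""
    m, default, shift = build_shift(patterns, block)
    # Group patterns by the block that ends their length-m prefix.
    buckets = {}
    for p in patterns:
        buckets.setdefault(p[m - block:m], []).append(p)
    hits = []
    pos = m  # exclusive end index of the current length-m window
    while pos <= len(text):
        key = text[pos - block:pos]
        s = shift.get(key, default)
        if s > 0:
            pos += s  # no pattern prefix can end here: skip ahead
            continue
        start = pos - m
        for p in buckets.get(key, ()):
            if text.startswith(p, start):  # verification step
                hits.append((start, p))
        pos += 1
    return sorted(hits)


def partition_by_length(patterns, threshold):
    """Split the pattern set into (short, long) subsets (assumed rule)."""
    short = [p for p in patterns if len(p) < threshold]
    long_ = [p for p in patterns if len(p) >= threshold]
    return short, long_


def search_partitioned(text, patterns, threshold, block=2):
    """Simplification: one WM scan per subset, results merged.
    The paper describes a single-pass, single-thread scan instead."""
    hits = []
    for subset in partition_by_length(patterns, threshold):
        if subset:
            hits.extend(wm_search(text, subset, block))
    return sorted(hits)
```

With the hypothetical pattern set `["she", "sells", "seashells", "shells", "shore"]` and `block=2`, the shortest pattern fixes m = 3 for the whole set, so the largest possible shift is m - block + 1 = 2. After partitioning at `threshold=5`, the long subset has m = 5 and a largest shift of 4, while the single short pattern is confined to its own table. This is the effect the abstract describes as an increased average shift distance.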