计算机工程与应用 ›› 2015, Vol. 51 ›› Issue (15): 43-47.

• 理论研究、研发设计 • 上一篇    下一篇

带可变长度通配符的模式匹配算法

沈  璐1,2,纪  允1,纪冬宝3,李  萍4   

  1. 1.合肥工业大学 计算机与信息学院,合肥 230009
    2.芜湖职业技术学院 电气系,安徽 芜湖 241006
    3.安徽现代电视技术有限公司,合肥 230088
    4.安徽林业职业技术学院 信息与艺术系,合肥 230031
  • 出版日期:2015-08-01 发布日期:2015-08-14

Pattern matching with variable length wildcards

SHEN Lu1,2, JI Yun1, JI Dongbao3, LI Ping4   

  1. 1.School of Computer and Information, Hefei University of Technology, Hefei 230009, China
    2.Department of Electrical Engineering, Wuhu Institute of Technology, Wuhu, Anhui 241006, China
    3.Anhui Modern TV Technology Co., Ltd, Hefei 230088, China
    4.Department of Information and Art, Anhui Vocational and Technical College of Forestry, Hefei 230031, China
  • Online:2015-08-01 Published:2015-08-14

摘要: 针对目前已有的算法在计算带有可变长度通配符的模式在文本中的出现次数问题时,需要的时间是多项式级别,而且受文本长度、模式长度和通配符间距的影响比较大。提出了一种基于Aho-Corasick自动机的AAI(pAttern mAtching with wIldcards) 算法,计算中采用了动态规划思想和有效的修剪技术。AAI算法的时间复杂度和空间复杂度分别为[O(n+m+α)]和[O(m+B)],其中[n]和[m]分别表示文本和模式的长度,[α]是所有子模式在文本中出现的数目,[B]是模式中通配符间距下限的总和。通过真实数据和人工数据的实验结果表明,AAI算法与同类算法相比具备显著的优势。

关键词: 模式匹配, 通配符, 动态规划, Aho-Corasick自动机

Abstract: Current works on computing the number of all matches of pattern with variable length wildcards in text need polynomial time, and have been greatly influenced by the length of the text, pattern, and wildcards interval. This paper proposes an algorithm AAI(pAttern mAtching with wIldcards) based on Aho-corasick automaton which adopts dynamic programming and effective pruning strategies. Let n and m be the length of P and T, respectively. The algorithm AAI need time[O(n+m+α)]and space[O(m+B)], where α is the total number of occurrences of the subpatterns in P within S, B is the sum of the lower bonds of the wildcards. Experimental results from different aspects validate that AAI is more stable and effective than other peers on both artificial and real-world data.

Key words: pattern matching, wildcards, dynamic programming, Aho-Corasick automaton