计算机工程与应用 ›› 2015, Vol. 51 ›› Issue (23): 139-142.

• 数据库、数据挖掘、机器学习 • 上一篇    下一篇

基于压缩后缀数组的近似字符串匹配算法

胥永康1,杨光露2,路松峰3   

  1. 1.中国工程物理研究院 计算机应用研究所,四川 绵阳 621999
    2.河南中烟工业有限责任公司 南阳卷烟厂,河南 南阳 473007
    3.华中科技大学 计算机科学与技术学院,武汉 430074
  • 出版日期:2015-12-01 发布日期:2015-12-14

Approximate string matching algorithm based on compressed suffix array

XU Yongkang1, YANG Guanglu2, LU Songfeng3   

  1. 1.Institute of Computer Application Technology, China Academy of Engineering Physics, Mianyang, Sichuan 621999, China
    2.Nanyang Cigarette Factory, China Tobacco Henan Industrial CO., Ltd, Nanyang, Henan 473007, China
    3.School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
  • Online:2015-12-01 Published:2015-12-14

摘要: 近似字符串匹配是模式匹配研究领域中的一个重要研究方向。压缩后缀数组是字符串匹配、数据压缩等领域广泛使用的索引结构,具有检索速度快和适用广泛的优点。利用压缩后缀数组,提出了适合近似字符串匹配搜索算法的数据结构,并在此基础上提出了一种匹配搜索算法。实验结果表明,相对于现有的算法,提出的算法在小字母表的情况下具有计算优势。

关键词: 模式匹配, 近似串匹配, 后缀数组, 压缩后缀数组

Abstract: Approximate string matching is an important issue in the research area of pattern matching. Compressed suffix array is an index structure widely used in string matching and data compression, and it has the advantage of fast retrieval and can be widely applied. In this paper, it proposes a data structure suitable for approximate string matching searching algorithm, and based on the structure, it proposes a matching search algorithm. The result of the experiment shows that compared to the current algorithms, the algorithm proposed in this paper has computing advantage when the small alphabet exists.

Key words: pattern matching, approximate string matching, suffix array, compressed suffix array