Approximate string matching algorithm based on compressed suffix array

Computer Engineering and Applications, 2015, Vol. 51, Issue (23): 139-142.

Section: Databases, Data Mining, and Machine Learning

XU Yongkang1, YANG Guanglu2, LU Songfeng3

1. Institute of Computer Application Technology, China Academy of Engineering Physics, Mianyang, Sichuan 621999, China
2. Nanyang Cigarette Factory, China Tobacco Henan Industrial Co., Ltd., Nanyang, Henan 473007, China
3. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China

Online: 2015-12-01 Published: 2015-12-14

Abstract: Approximate string matching is an important research direction in pattern matching. The compressed suffix array is an index structure widely used in string matching, data compression, and related fields; it offers fast retrieval and broad applicability. Building on the compressed suffix array, this paper proposes a data structure suited to approximate string matching search and, on top of that structure, a matching search algorithm. Experimental results show that, compared with existing algorithms, the proposed algorithm has a computational advantage when the alphabet is small.

Key words: pattern matching, approximate string matching, suffix array, compressed suffix array
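
To make the idea of index-based approximate matching concrete, the following is a minimal sketch and not the algorithm proposed in the paper, whose data structure is not described on this page. It assumes a plain (uncompressed) suffix array, Hamming distance as the error measure, and a pigeonhole filter: if a pattern matches with at most k mismatches, at least one of its k + 1 pieces must match exactly. The function names `build_suffix_array`, `exact_occurrences`, and `k_mismatch_search` are hypothetical and chosen only for illustration.

```python
def build_suffix_array(text):
    # Naive construction by sorting suffixes; adequate for a small illustration,
    # not for large texts (a compressed suffix array would replace this).
    return sorted(range(len(text)), key=lambda i: text[i:])


def exact_occurrences(text, sa, piece):
    # Binary search for the suffix-array interval whose suffixes start with `piece`.
    m = len(piece)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-character prefix is >= piece
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < piece:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose m-character prefix is > piece
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= piece:
            lo = mid + 1
        else:
            hi = mid
    return sa[start:lo]


def k_mismatch_search(text, pattern, k):
    # Returns sorted start positions where `pattern` occurs in `text`
    # with at most k mismatches (Hamming distance; an assumption of this sketch).
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    if m <= k:
        # Every alignment trivially has at most m <= k mismatches.
        return list(range(len(text) - m + 1))
    sa = build_suffix_array(text)
    candidates = set()
    for j in range(k + 1):
        # Piece j covers pattern[b:e]; all pieces are non-empty because m > k.
        b, e = j * m // (k + 1), (j + 1) * m // (k + 1)
        for pos in exact_occurrences(text, sa, pattern[b:e]):
            start = pos - b
            if 0 <= start <= len(text) - m:
                candidates.add(start)
    hits = []
    for start in sorted(candidates):
        # Verify each candidate alignment by counting mismatches directly.
        mismatches = sum(a != c for a, c in zip(text[start:start + m], pattern))
        if mismatches <= k:
            hits.append(start)
    return hits


print(k_mismatch_search("ACGTACGTTACG", "ACGA", 1))  # prints [0, 4]
```

On the four-letter DNA alphabet used in this example, short pieces occur frequently, so the filter produces many candidates to verify; the sketch makes no attempt to reproduce the small-alphabet advantage the abstract reports.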

Cite this article: XU Yongkang (胥永康), YANG Guanglu (杨光露), LU Songfeng (路松峰). Approximate string matching algorithm based on compressed suffix array[J]. Computer Engineering and Applications, 2015, 51(23): 139-142.

