计算机工程与应用 ›› 2008, Vol. 44 ›› Issue (36): 174-178.DOI: 10.3778/j.issn.1002-8331.2008.36.049

• 数据库、信号与信息处理 • 上一篇    下一篇

数据流上快速子序列匹配

陈为满1,苏 亮2,高春鸣1   

  1. 1.湖南师范大学 数学与计算机科学学院,长沙 410081
    2.国防科学技术大学 计算机学院,长沙 410073
  • 收稿日期:2007-12-24 修回日期:2008-02-29 出版日期:2008-12-21 发布日期:2008-12-21
  • 通讯作者: 陈为满

Fast subsequence matching over data stream

CHEN Wei-man1,SU Liang2,GAO Chun-ming1   

  1. 1.College of Mathematics and Computer Science,Hunan Normal University,Changsha 410081,China
    2.College of Computer,National University of Defence Technology,Changsha 410073,China
  • Received:2007-12-24 Revised:2008-02-29 Online:2008-12-21 Published:2008-12-21
  • Contact: CHEN Wei-man

摘要: 数据流技术目前已广泛应用于金融分析、网络监控及传感器网络等诸多领域,而已有的相似性匹配技术主要针对时间序列数据库,难于直接应用于高速、连续、实时、海量的流数据,因此在数据流上渐进、实时地进行子序列匹配成为一个极具价值和挑战性的问题。在动态时间规整技术的基础上,设计了一种新颖的界限机制,充分利用相似性阈值,尽量减少冗余计算,算法完全符合数据流“单遍扫描”的性能要求,并通过大量的模拟和真实数据实验表明:与现有的SPRING算法相比,在不损失任何算法精度的前提下,仅增加几个字节的空间开销,速度至少提高3倍。

关键词: 时间序列, 子序列匹配, 动态时间归整, 数据流

Abstract: Recently,techniques for data stream have been applied in widespread fields such as financial analysis,network monitoring,and sensor network,etc.The existing techniques,solving the similarity matching,are mainly for time series databases.However,it is difficult to adapt to stream data directly due to the high speed,continuity,real time and large quantity.Therefore,subsequence matching over data stream becomes a meaningful and challenging problem in a progressive and real-time fashion.In this paper,a novel bound technique based on DTW has been designed to make the best of similarity threshold to prune the redundant computing,as well as is fit for data streams in a“single pass”.Experiments with synthetic and real data show that the proposed method is at least 3 times faster than existing algorithm:SPRING,and only increasing several bytes without the loss of precision.

Key words: time series, subsequence matching, Dynamic Time Warping(DTW), data stream