Fast subsequence matching over data stream

doi:10.3778/j.issn.1002-8331.2008.36.049

Computer Engineering and Applications, 2008, Vol. 44, Issue (36): 174-178. DOI: 10.3778/j.issn.1002-8331.2008.36.049

• Database, Signal and Information Processing •


CHEN Wei-man¹, SU Liang², GAO Chun-ming¹

1. College of Mathematics and Computer Science, Hunan Normal University, Changsha 410081, China
2. College of Computer, National University of Defense Technology, Changsha 410073, China

Received: 2007-12-24 Revised: 2008-02-29 Online: 2008-12-21 Published: 2008-12-21
Contact: CHEN Wei-man


Abstract

Data stream techniques are now widely applied in fields such as financial analysis, network monitoring, and sensor networks. Existing similarity matching techniques mainly target time series databases and are difficult to apply directly to stream data, which is high-speed, continuous, real-time, and massive in volume. Matching subsequences over data streams progressively and in real time is therefore a valuable and challenging problem. This paper designs a novel bounding technique based on Dynamic Time Warping (DTW) that makes full use of the similarity threshold to prune redundant computation and fully satisfies the "single pass" requirement of data streams. Experiments on synthetic and real data show that the proposed method is at least 3 times faster than the existing SPRING algorithm, at a cost of only a few extra bytes of space and with no loss of precision.

Key words: time series, subsequence matching, Dynamic Time Warping (DTW), data stream
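
The abstract does not spell out the bound itself, so the following is only a minimal sketch of the general idea it describes: single-pass subsequence matching under DTW, with the similarity threshold used to skip work. It assumes a SPRING-style recurrence in which a match may start at any stream position, squared difference as the point distance, and a simple exact pruning rule: because costs are non-negative, any cell whose cumulative cost already exceeds the threshold cannot lie on a path that ends within the threshold, so it is treated as unreachable. The function name `match_stream`, its parameters, and the pruning rule are illustrative assumptions, not the authors' algorithm, and the sketch reports every qualifying end position rather than reproducing SPRING's reporting of non-overlapping optimal matches.

```python
import math

def match_stream(stream, query, eps, prune=True):
    """Return (matches, cells) for a non-empty query.

    matches: list of (start, end, distance) for every stream position where
    some subsequence ending there has DTW distance <= eps to the query.
    cells: number of matrix cells whose point distance was evaluated.
    """
    m = len(query)
    prev_d = [math.inf] * m   # cumulative costs for the previous stream value
    prev_s = [0] * m          # start positions of the corresponding paths
    matches, cells = [], 0
    for t, x in enumerate(stream):          # single pass over the stream
        cur_d = [math.inf] * m
        cur_s = [t] * m
        for i, y in enumerate(query):
            if i == 0:
                # A match may start at any stream position at zero prior cost.
                best_d, best_s = 0.0, t
            else:
                # Cheapest of the three DTW predecessors; ties keep the
                # earlier start position.
                best_d, best_s = min(
                    (cur_d[i - 1], cur_s[i - 1]),
                    (prev_d[i], prev_s[i]),
                    (prev_d[i - 1], prev_s[i - 1]),
                )
            if prune and best_d > eps:
                continue      # unreachable within eps: skip the distance
            cells += 1
            d = best_d + (x - y) ** 2
            if prune and d > eps:
                continue      # already over the threshold: leave as infinity
            cur_d[i], cur_s[i] = d, best_s
        if cur_d[-1] <= eps:
            matches.append((cur_s[-1], t, cur_d[-1]))
        prev_d, prev_s = cur_d, cur_s
    return matches, cells

# Toy stream (illustrative data, not from the paper).
matches, cells = match_stream([0, 0, 1, 2, 3, 0, 0], [1, 2, 3], eps=0.1)
print(matches)  # [(2, 4, 0.0)]
```

Only two arrays of length `len(query)` are kept between stream values, so memory does not grow with the stream. Calling the function with `prune=False` reports the same matches while evaluating more cells, which is the kind of saving a threshold-based bound aims for.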


CHEN Wei-man, SU Liang, GAO Chun-ming. Fast subsequence matching over data stream[J]. Computer Engineering and Applications, 2008, 44(36): 174-178.


