计算机工程与应用 ›› 2012, Vol. 48 ›› Issue (4): 121-124.

• 数据库、信号与信息处理 • 上一篇    下一篇

数据流中基于滑动窗口的序列模式挖掘算法

谢伙生,何星星   

  1. 福州大学 数学与计算机科学学院,福州 350108
  • 收稿日期:1900-01-01 修回日期:1900-01-01 出版日期:2012-02-01 发布日期:2012-04-05

Efficient algorithm for mining frequent sequential pattern based on sliding window in data stream

XIE Huosheng, HE Xingxing   

  1. College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350108, China
  • Received:1900-01-01 Revised:1900-01-01 Online:2012-02-01 Published:2012-04-05

摘要: 序列模式发现是最重要的数据挖掘任务之一,并有着广阔的应用前景。针对静态数据库,序列模式挖掘已经被深入地研究,但针对基于数据流的序列模式挖掘的研究还不是十分深入。数据流有着无限性的特性,因此往往不能保存数据流中全部的数据,同时很多时候只对最近的时间段的序列模式感兴趣,提出一个有效的结合滑动窗口技术的挖掘序列模式的算法FPM-SW,算法利用到3个数据结构(PatternTable,CountTable和Ta-tree)来处理基于数据流的序列模式挖掘的复杂性问题。算法通过CountTable结构来保存以往的潜在频繁序列,考虑到在某些情况下CountTable占用内存过多,算法还结合了一种压缩CountTable技术来减少内存占用。FPM-SW的优点是可以最大限度地降低负正例的产生,实验表明FPM-SW具有较高的准确率。

关键词: 序列模式, 数据流挖掘, 滑动窗口

Abstract: Sequential pattern mining is one of the most important tasks of data mining and has broad applications. Sequential pattern mining has been studied extensively in static databases. However, the study of sequential pattern mining based on data streams is not very deep. Stream data has the characteristic of unlimited flow, it can not save all the data, and people usually are interested in the sequential patterns in recent time period, accordingly it introduces one effective method combining with sliding window technique for mining sequential patterns from data streams: FPM-SW algorithm(Frequent Pattern Mining-Sliding Window). It uses three data structures(PatternTable, CountTable and Ta-tree) to handle the complexities of mining frequent sequential patterns in data streams. FPM-SW algorithm uses CountTable structure to preserve the past potential frequent sequences, considering that in some cases the countTable uses too much memory, the algorithm also combines a CountTable compression techniques to reduce memory footprint. The excellence of the algorithm is that it can maximize the reduction of the number of false positive. Experimental results show that FPM-SW has higher accuracy.

Key words: sequential patterns, data stream mining, sliding window