Computer Engineering and Applications, 2011, Vol. 47, Issue (10): 131-134.

• Database, Signal and Information Processing •

Characteristics of frequent pattern mining over data streams based on landmark window

ZHANG Guanglu1, LEI Jingsheng2

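  1. School of Mathematics and Statistics, Hainan Normal University, Haikou 571158
    2. College of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210046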
  • Published: 2011-04-01 Online: 2011-04-01

Characteristics of data stream mining for frequent pattern based on landmark window

ZHANG Guanglu1, LEI Jingsheng2

  1. College of Mathematics and Statistics, Hainan Normal University, Haikou 571158, China
    2. College of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210046, China
  • Online: 2011-04-01 Published: 2011-04-01

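Abstract: As the application domains of data streams continue to expand, frequent pattern mining over data streams has gradually become a core problem in data mining research. Building on a study and improvement of the DSFPM algorithm, this paper proposes DSMFP_LW, a frequent pattern mining algorithm for data streams based on a landmark window. The algorithm scans the data stream in a single pass and stores the global critical frequent patterns in an extended prefix pattern tree, which allows the data to be updated incrementally. Comparative experiments show that DSMFP_LW achieves good time overhead and space utilization, outperforms the classical Lossy Counting algorithm, and is well suited to frequent pattern mining over data streams.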
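The abstract does not spell out the internals of DSMFP_LW or its extended prefix pattern tree. As a minimal sketch, assuming a plain FP-tree-style prefix tree with items kept in a fixed lexicographic order (a simplification that does not reproduce the paper's critical-pattern storage or pruning), the core landmark-window idea can be illustrated in Python: every transaction since a fixed starting point is counted, each transaction is read exactly once, and the tree is updated incrementally as it arrives. The class and method names (`LandmarkPrefixTree`, `insert`, `support`, `is_frequent`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class Node:
    item: Optional[str]
    count: int = 0
    parent: Optional["Node"] = None
    children: Dict[str, "Node"] = field(default_factory=dict)


class LandmarkPrefixTree:
    """Hypothetical sketch: counts all transactions seen since a fixed landmark."""

    def __init__(self) -> None:
        self.root = Node(item=None)
        self.n = 0  # number of transactions since the landmark
        # Header table: item -> every tree node labelled with that item.
        self.header: Dict[str, List[Node]] = {}

    def insert(self, transaction: Iterable[str]) -> None:
        """Incremental update: add one transaction as a path (single pass)."""
        self.n += 1
        node = self.root
        # Assumption: a fixed lexicographic item order, since a single pass
        # cannot know the global frequency order in advance.
        for item in sorted(set(transaction)):
            child = node.children.get(item)
            if child is None:
                child = Node(item=item, parent=node)
                node.children[item] = child
                self.header.setdefault(item, []).append(child)
            child.count += 1
            node = child

    def support(self, itemset: Iterable[str]) -> int:
        """Number of transactions since the landmark containing every item."""
        items = sorted(set(itemset))
        if not items:
            return self.n
        last, rest = items[-1], set(items[:-1])
        total = 0
        # Each containing transaction passes through exactly one node labelled
        # `last` whose ancestors include all the smaller items of the itemset.
        for node in self.header.get(last, []):
            ancestors = set()
            p = node.parent
            while p is not None and p.item is not None:
                ancestors.add(p.item)
                p = p.parent
            if rest <= ancestors:
                total += node.count
        return total

    def is_frequent(self, itemset: Iterable[str], min_sup: float) -> bool:
        """Relative-support test over the whole landmark window."""
        return self.support(itemset) >= min_sup * self.n


tree = LandmarkPrefixTree()
for t in [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"]]:
    tree.insert(t)
print(tree.support({"a", "c"}))  # 2
```

Because a landmark window never slides, counts only grow; a practical miner would also discard patterns that fall below a relaxed threshold to bound memory, which this sketch omits.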

Key words: frequent pattern, data stream, DSMFP_LW algorithm

Abstract: With the expanding applications of data streams, mining frequent patterns over data streams has gradually become a core issue in data mining research. Based on a study and improvement of the DSFPM algorithm, a new single-pass algorithm, Data Stream Mining for Frequent Pattern Based on Landmark Window (DSMFP_LW), is proposed. DSMFP_LW has three major features: a single scan of the streaming data to count pattern information, a compact pattern representation based on an extended prefix tree, and incremental updating of the data. The experimental results show that DSMFP_LW achieves better time and space efficiency and outperforms the well-known Lossy Counting algorithm in the same streaming environment.
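Lossy Counting, the baseline named in the abstract, is a standard single-pass approximate frequency counter (Manku and Motwani). A minimal Python sketch of its single-item form is shown below; the itemset variant used in comparisons like this one batches transactions per bucket and is not reproduced here. The class name `LossyCounter` is hypothetical. With error bound `epsilon`, the stream is split into buckets of width ceil(1/epsilon); each stored count underestimates the true count by at most epsilon·N, and every item whose true frequency is at least s·N is reported.

```python
import math
from typing import Dict, Hashable, List, Tuple


class LossyCounter:
    """Hypothetical sketch of single-item Lossy Counting."""

    def __init__(self, epsilon: float) -> None:
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # bucket width w
        self.n = 0  # items seen so far (N)
        # item -> (f, delta): observed count and maximum possible undercount
        self.entries: Dict[Hashable, Tuple[int, int]] = {}

    def add(self, item: Hashable) -> None:
        self.n += 1
        bucket = math.ceil(self.n / self.width)  # current bucket id
        if item in self.entries:
            f, delta = self.entries[item]
            self.entries[item] = (f + 1, delta)
        else:
            self.entries[item] = (1, bucket - 1)
        # At each bucket boundary, drop entries that cannot be frequent.
        if self.n % self.width == 0:
            self.entries = {
                k: (f, d) for k, (f, d) in self.entries.items() if f + d > bucket
            }

    def frequent(self, support: float) -> List[Hashable]:
        """Items whose stored count reaches (s - epsilon) * N."""
        threshold = (support - self.epsilon) * self.n
        return [k for k, (f, _) in self.entries.items() if f >= threshold]


lc = LossyCounter(epsilon=0.2)
for x in "aabacadaea":
    lc.add(x)
print(lc.frequent(0.5))  # ['a']
```

The pruning at bucket boundaries is what keeps memory bounded, at the cost of approximate counts; an exact landmark-window structure trades that bound for exact supports.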

Key words: frequent pattern, data stream, DSMFP_LW algorithm