Computer Engineering and Applications, 2009, Vol. 45, Issue (18): 152-155. DOI: 10.3778/j.issn.1002-8331.2009.18.046

• Databases and Information Processing •

Method for finding recent frequent itemsets over data streams

SHU Ping-da, CHEN Hua-hui

  1. College of Computer Science & Engineering, Ningbo University, Ningbo, Zhejiang 315211, China
  • Received: 2008-04-15; Revised: 2008-07-31; Online: 2009-06-21; Published: 2009-06-21
  • Contact: SHU Ping-da

Algorithm for mining recent frequent itemsets over data streams

  1. College of Information Science and Engineering, Ningbo University, Ningbo, Zhejiang 315211, China

Abstract: Mining frequent itemsets in data streams means finding the itemsets whose frequency exceeds a given minimum support threshold. Because it underpins emerging applications such as sensor networks and network traffic monitoring, frequent itemset mining over data streams has attracted considerable attention. This paper proposes RFIF, a new algorithm for mining frequent itemsets in data streams. RFIF targets practical real-world applications: it gives more weight to recent events without discarding historical data entirely. By introducing the function GIMT, RFIF gradually raises the support threshold an itemset must meet to be maintained, which reduces the amount of historical data that has to be kept. Experimental results demonstrate the effectiveness of RFIF.
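To make the definition concrete, here is a minimal Python sketch of frequent itemset counting over a single finite batch of transactions. It is not RFIF: it enumerates itemsets by brute force, and the names `frequent_itemsets`, `min_support` and `max_size` are hypothetical. Support is taken here as the fraction of transactions that contain the itemset, and an itemset is reported when its support exceeds the threshold, following the wording above.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Return {itemset: support} for itemsets whose relative support
    exceeds min_support.

    Brute-force enumeration over one finite batch; a real stream
    algorithm would maintain (approximate) counts incrementally instead.
    """
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))  # canonical order so ('a', 'b') == ('b', 'a')
        for k in range(1, max_size + 1):
            for itemset in combinations(items, k):
                counts[itemset] += 1
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n > min_support}


batch = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]]
print(frequent_itemsets(batch, min_support=0.5))
# {('a',): 0.75, ('b',): 0.75}
```

Here `c` appears in exactly half of the transactions, so with a strict "exceeds" test it is not reported at `min_support=0.5`.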

Key words: data streams, data mining, frequent itemsets

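Abstract: Frequent itemset mining over a data stream is the process of finding the itemsets whose frequency of occurrence exceeds a given minimum support. With the emergence of new applications such as sensor networks and network monitoring, frequent itemset mining over data streams has attracted considerable attention. This paper proposes RFIF, a novel algorithm for mining frequent itemsets over data streams. Unlike existing algorithms, RFIF is designed for certain real-world applications: it gives more weight to recent events without completely discarding historical data. By introducing the GIMT function, it gradually raises the support threshold for itemsets, which reduces the maintenance of frequent itemsets from historical data. Experiments verify the effectiveness of the algorithm.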
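The abstract does not define GIMT, so the following is only a hypothetical Python illustration of the general idea of raising the maintenance threshold as data ages. The linear growth rule in `rising_threshold`, the helper `maintained_itemsets`, and the parameters `base`, `growth` and `cap` are assumptions chosen for illustration, not the authors' function or algorithm.

```python
def rising_threshold(age, base, growth=0.5, cap=1.0):
    """Hypothetical stand-in for GIMT (not the paper's definition):
    the support needed to keep an itemset grows linearly with the age
    of its batch (0 = newest), capped at `cap`."""
    return min(base * (1 + growth * age), cap)


def maintained_itemsets(batch_supports, base, growth=0.5):
    """batch_supports: list of {itemset: support} dicts, oldest batch first.

    An itemset from a batch is kept only if its support exceeds the
    threshold for that batch's age, so older batches keep fewer itemsets
    while strongly frequent old itemsets still survive.
    """
    newest = len(batch_supports) - 1
    kept = []
    for i, supports in enumerate(batch_supports):
        threshold = rising_threshold(newest - i, base, growth)
        kept.append({s: v for s, v in supports.items() if v > threshold})
    return kept


batches = [
    {("a",): 0.6, ("b",): 0.35},   # oldest batch, age 2
    {("a",): 0.5, ("c",): 0.25},   # age 1
    {("b",): 0.25, ("c",): 0.45},  # newest batch, age 0
]
print(maintained_itemsets(batches, base=0.2))
# [{('a',): 0.6}, {('a',): 0.5}, {('b',): 0.25, ('c',): 0.45}]
```

With `base=0.2` the newest batch only has to clear 0.2, while the oldest must clear 0.4. As a result, `('b',)` at 0.35 is dropped from the oldest batch, yet the same support would have been kept in the newest one.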

关键词: 数据流, 数据挖掘, 频繁项集