计算机工程与应用 ›› 2008, Vol. 44 ›› Issue (13): 150-152.

• 数据库、信号与信息处理 • 上一篇    下一篇

挖掘数据流近似频繁项的改进算法

王秀坤,王铁存,周国能,冯 维   

  1. 大连理工大学 计算机系,辽宁 大连 116023
  • 收稿日期:2007-08-21 修回日期:2007-11-19 出版日期:2008-05-01 发布日期:2008-05-01
  • 通讯作者: 王秀坤

Improved algorithm for mining approximate frequent item over data streams

WANG Xiu-kun,WANG Tie-cun,ZHOU Guo-neng,FENG Wei   

  1. Department of Computer,Dalian University of Technology,Dalian,Liaoning 116023,China
  • Received:2007-08-21 Revised:2007-11-19 Online:2008-05-01 Published:2008-05-01
  • Contact: WANG Xiu-kun

摘要: 数据流的无限性、连续性和速度快等特点,使得挖掘出所有准确的数据流频繁项通常是不可能的.算法的空间复杂度和时间复杂度通常是评价频繁项挖掘算法优劣的两个主要度量.通过引入局部性原理改进数据流近似频繁项的挖掘算法,该算法的空间复杂性为O(1/ε),数据流每个数据项的最坏处理时间是O(1/ε),其最好处理时间是O(1),输出结果的频率值误差为∑_(i=2)^j(1-μi)×ki

关键词: 数据流, 数据流挖掘, 频繁项

Abstract: Because of the rapid data arriving speed and huge size of data set in stream model,it is usually unable to find all the accurate frequent items of a data stream.The space complexity and the time complexity are the main measurement which is used to evaluate the strongpoints and weaknesses of algorithm.This paper proposes an improved algorithm based on principle of locality to find ε-approximate frequent items of a data stream,its space complexity is O(1/ε).The processing time for each item is O(1/ε) in the worst and the processing time for each item is O(1) in the best.Moreover,the frequency error bound of the results returned by the proposed algorithm is ∑_(i=2)^j(1-μi)×ki.

Key words: data stream, data stream mining, frequent item