Computer Engineering and Applications ›› 2010, Vol. 46 ›› Issue (24): 138-140. DOI: 10.3778/j.issn.1002-8331.2010.24.042

• Database, Signal and Information Processing •

Research on mining frequent itemsets in data streams

MENG Cai-xia   

  1. Department of Computer Science, Xi’an University of Posts & Telecommunications, Xi’an 710065, China
  • Received: 2009-02-12 Revised: 2010-02-05 Online: 2010-08-21 Published: 2010-08-21
  • Contact: MENG Cai-xia

面向数据流的频繁项集挖掘研究

孟彩霞   

  1. 西安邮电学院 计算机科学系,西安 710061
  • 通讯作者: 孟彩霞

Abstract: Based on the characteristics of data streams, this paper proposes the FP-SegCount algorithm for mining frequent itemsets from data streams. The algorithm partitions the data stream into segments and uses a modified FP-growth algorithm to mine the frequent itemsets in each segment. It then counts the itemsets in a Count-Min Sketch. This addresses the problems of compressed counting and efficient computation. An experimental comparison with the FP-DS algorithm shows that FP-SegCount has good time efficiency.

摘要: 针对数据流的特点,对数据流中频繁模式挖掘问题进行了研究,提出了数据流频繁项集挖掘算法FP-SegCount。该算法将数据流分段并利用改进的FP-growth算法挖掘分段中的频繁项集。然后,利用Count Min Sketch进行项集计数。算法解决了压缩统计和计算快速高效的问题。通过和FP-DS算法的实验对比,FP-SegCount算法具有较好的时间效率。
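
The segment-then-count pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the modified FP-growth step, so a brute-force itemset counter stands in for it, and all names and parameters (`CountMinSketch`, `frequent_itemsets`, `mine_stream`, `segment_size`, `min_count`, `max_size`, the sketch width and depth) are assumptions made for illustration. Items are assumed to be strings. Because only itemsets that are frequent within a segment are added to the sketch, the accumulated counts cover those segments only.

```python
import hashlib
from collections import Counter
from itertools import combinations


class CountMinSketch:
    """Fixed-size counter table; estimates never fall below what was added."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _columns(self, key):
        # One salted hash per row, so each row acts as a separate hash function.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                key.encode("utf-8"),
                digest_size=8,
                salt=row.to_bytes(8, "little"),
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        for row, col in self._columns(key):
            self.table[row][col] += count

    def estimate(self, key):
        # The minimum over rows is the least collision-inflated counter.
        return min(self.table[row][col] for row, col in self._columns(key))


def frequent_itemsets(segment, min_count, max_size=3):
    """Brute-force stand-in for the per-segment miner (NOT modified FP-growth)."""
    counts = Counter()
    for transaction in segment:
        items = sorted(set(transaction))
        for size in range(1, min(max_size, len(items)) + 1):
            counts.update(combinations(items, size))
    return {itemset: n for itemset, n in counts.items() if n >= min_count}


def _flush(segment, min_count, sketch):
    # Add every itemset that is frequent in this segment to the sketch.
    for itemset, n in frequent_itemsets(segment, min_count).items():
        sketch.add(",".join(itemset), n)


def mine_stream(transactions, segment_size, min_count, sketch):
    """Cut the stream into fixed-size segments and process each in turn."""
    segment = []
    for transaction in transactions:
        segment.append(transaction)
        if len(segment) == segment_size:
            _flush(segment, min_count, sketch)
            segment = []
    if segment:  # trailing partial segment
        _flush(segment, min_count, sketch)


stream = [["a", "b"], ["a", "b", "c"], ["a", "c"],
          ["a", "b"], ["b", "c"], ["a", "b"]]
sketch = CountMinSketch()
mine_stream(stream, segment_size=3, min_count=2, sketch=sketch)
print(sketch.estimate("a,b"))  # at least 4; hash collisions can only inflate it
```

The sketch uses a fixed amount of memory regardless of how many distinct itemsets the stream produces, which is the compressed-counting property the abstract refers to; the trade-off is that an estimate may exceed the count actually added when keys collide.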
