Computer Engineering and Applications, 2010, Vol. 46, Issue (24): 138-140. DOI: 10.3778/j.issn.1002-8331.2010.24.042

• Database, Signal and Information Processing •

Research on mining frequent itemsets in data streams

MENG Cai-xia

  1. Department of Computer Science, Xi'an University of Posts & Telecommunications, Xi'an 710061, China
  • Received: 2009-02-12 Revised: 2010-02-05 Online: 2010-08-21 Published: 2010-08-21
  • Corresponding author: MENG Cai-xia

Abstract: Taking the characteristics of data streams into account, this paper studies the problem of mining frequent patterns in data streams and proposes FP-SegCount, an algorithm for mining frequent itemsets from data streams. The algorithm partitions the data stream into segments and uses a modified FP-growth algorithm to mine the frequent itemsets in each segment; it then counts the itemsets with a Count-Min Sketch. In this way, the algorithm addresses both compressed statistics and fast, efficient computation. In experimental comparisons with the FP-DS algorithm, FP-SegCount shows better time efficiency.
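The abstract outlines three steps: split the stream into segments, mine each segment for frequent itemsets, and accumulate itemset counts in a Count-Min Sketch. The Python sketch below illustrates that pipeline under stated assumptions; it is not the author's FP-SegCount implementation. Assumed here: fixed-size segments, a per-segment threshold of ceil(min_support × segment length), only locally frequent itemsets being added to the sketch, and salted SHA-1 hashes for the sketch rows. A plain exhaustive enumeration of itemsets of up to three items stands in for the paper's modified FP-growth, whose details are not given in this abstract. The names `CountMinSketch`, `segments`, `mine_segment` and `fp_segcount_sketch` are hypothetical.

```python
import hashlib
import math
from collections import Counter
from itertools import combinations, islice


class CountMinSketch:
    """Count-Min Sketch: depth rows of width counters; estimates never undercount."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        # ASSUMPTION: one hash per row, derived by salting SHA-1 with the row number.
        digest = hashlib.sha1(f"{row}:{key}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count

    def estimate(self, key):
        # Minimum over rows: hash collisions can only inflate a counter.
        return min(self.table[row][self._index(key, row)] for row in range(self.depth))


def segments(stream, size):
    """Split an iterable of transactions into consecutive fixed-size segments."""
    it = iter(stream)
    while True:
        seg = list(islice(it, size))
        if not seg:
            return
        yield seg


def mine_segment(segment, min_count, max_len=3):
    """HYPOTHETICAL stand-in for the paper's modified FP-growth: enumerate all
    itemsets of up to max_len items and keep those with count >= min_count."""
    counts = Counter()
    for txn in segment:
        items = sorted(set(txn))
        for k in range(1, min(max_len, len(items)) + 1):
            for itemset in combinations(items, k):
                counts[itemset] += 1
    return {s: c for s, c in counts.items() if c >= min_count}


def fp_segcount_sketch(stream, segment_size, min_support, cms):
    """HYPOTHETICAL pipeline: mine each segment, add the locally frequent
    itemsets' segment counts to the sketch, then filter by global support."""
    seen = set()  # itemsets that were frequent in at least one segment
    total = 0     # number of transactions processed so far
    for seg in segments(stream, segment_size):
        total += len(seg)
        local_min = max(1, math.ceil(min_support * len(seg)))
        for itemset, count in mine_segment(seg, local_min).items():
            cms.add(itemset, count)
            seen.add(itemset)
    threshold = min_support * total
    estimates = {s: cms.estimate(s) for s in seen}
    return {s: e for s, e in estimates.items() if e >= threshold}


if __name__ == "__main__":
    stream = [["a", "b"], ["a", "c"], ["a", "b"],
              ["b", "c"], ["a", "b", "c"], ["a", "b"]]
    result = fp_segcount_sketch(stream, segment_size=3, min_support=0.5,
                                cms=CountMinSketch(width=256, depth=4))
    print(sorted(result.items()))
```

Because a Count-Min Sketch only ever overestimates, the reported counts are upper bounds on what was added to it. Conversely, skipping itemsets that are locally infrequent means an itemset spread thinly across segments can be missed. In the sample stream, `c` occurs in 3 of 6 transactions but reaches the per-segment threshold only in the second segment, so only 2 of its occurrences are added to the sketch.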
