Computer Engineering and Applications ›› 2015, Vol. 51 ›› Issue (22): 145-149.

• Database, Data Mining, Machine Learning •

Mining algorithm research of data stream maximum frequent itemsets in sliding window

YIN Shaohong, SHAN Kunyu, FAN Guidan

  1. School of Computer Science and Software Engineering, Tianjin University of Technology, Tianjin 300387, China
  • Online: 2015-11-15 Published: 2015-11-16

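Abstract: Maximal frequent itemsets of a data stream are relatively few in number, yet they implicitly contain all frequent itemsets (every frequent itemset is a subset of some maximal frequent itemset). Mining maximal frequent itemsets from data streams therefore offers good time and space efficiency, is of considerable significance, and has attracted increasing attention from industry. To mine maximal frequent itemsets over data streams, this paper proposes SWM-MFI, a matrix-based method for mining maximal frequent itemsets from a data stream in a sliding window. The method stores the data in two matrices: a transaction matrix, which stores the transaction data, and a 2-itemset matrix, which stores the frequent 2-itemsets. Frequent k-itemsets are obtained by extending the 2-itemset matrix, and the maximal frequent itemsets are then mined with the SWM-MFI algorithm. Theoretical analysis and experiments show that the algorithm has good time efficiency.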
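The abstract gives only the outline of SWM-MFI, so the Python sketch below illustrates the general matrix-based idea under illustrative assumptions rather than reproducing the authors' algorithm. It assumes a count-based window holding the W most recent transactions and stores the transaction matrix as one bit vector per item (Python integers used as bitsets, one bit per window slot). The 2-itemset matrix is filled with pair supports computed by bitwise AND. Frequent 2-itemsets are extended to frequent k-itemsets with an Apriori-style prefix join pruned by that matrix, and only itemsets with no frequent proper superset are kept at the end. The class and method names (`SlidingWindowMFI`, `add`, `support`, `pair_matrix`, `maximal_frequent`) and the absolute support threshold `min_sup` are hypothetical.

```python
from itertools import combinations


class SlidingWindowMFI:
    """Hypothetical sketch of a matrix-based maximal frequent itemset miner
    over a count-based sliding window (not the authors' SWM-MFI code)."""

    def __init__(self, window, min_sup):
        self.window = window            # window width W (number of transactions)
        self.min_sup = min_sup          # absolute minimum support count (assumed)
        self.slots = [None] * window    # circular buffer of the W latest transactions
        self.next_slot = 0
        # Transaction matrix: one row per item, one bit column per window slot.
        self.rows = {}

    def add(self, transaction):
        """Slide the window by one transaction, evicting the oldest once full."""
        slot = self.next_slot
        bit = 1 << slot
        old = self.slots[slot]
        if old is not None:             # clear the evicted transaction's column
            for item in old:
                self.rows[item] &= ~bit
                if self.rows[item] == 0:
                    del self.rows[item]
        for item in set(transaction):   # write the new transaction's column
            self.rows[item] = self.rows.get(item, 0) | bit
        self.slots[slot] = frozenset(transaction)
        self.next_slot = (slot + 1) % self.window

    def support(self, itemset):
        """Count the window slots whose column has a 1 for every item."""
        vec = (1 << self.window) - 1
        for item in itemset:
            vec &= self.rows.get(item, 0)
        return bin(vec).count("1")

    def pair_matrix(self):
        """2-itemset matrix (upper triangle, stored sparsely) over frequent items."""
        items = sorted(i for i in self.rows if self.support((i,)) >= self.min_sup)
        return items, {(a, b): self.support((a, b)) for a, b in combinations(items, 2)}

    def maximal_frequent(self):
        items, pairs = self.pair_matrix()
        frequent_pairs = {p for p, s in pairs.items() if s >= self.min_sup}
        level = sorted(frequent_pairs)
        frequent = [(i,) for i in items] + level
        while level:                    # extend frequent (k-1)-itemsets to k-itemsets
            nxt = []
            for a, b in combinations(level, 2):
                if a[:-1] != b[:-1]:    # join only itemsets sharing a (k-2)-prefix
                    continue
                cand = a + (b[-1],)
                # Prune with the 2-itemset matrix, then verify support on the bit rows.
                if all(p in frequent_pairs for p in combinations(cand, 2)) \
                        and self.support(cand) >= self.min_sup:
                    nxt.append(cand)
            level = sorted(nxt)
            frequent.extend(level)
        sets = [frozenset(f) for f in frequent]
        # Maximal = frequent itemsets with no frequent proper superset.
        return sorted(tuple(sorted(s)) for s in sets if not any(s < t for t in sets))


miner = SlidingWindowMFI(window=4, min_sup=2)
for t in [["d"], ["a", "b"], ["a", "c"], ["b", "c", "d"], ["a", "b", "c"]]:
    miner.add(t)
print(miner.maximal_frequent())  # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```

Sliding the window only clears and sets one bit column per transaction, which is the practical appeal of a bit-matrix layout. For brevity this sketch recomputes the 2-itemset matrix on every query; updating it incrementally as transactions enter and leave the window would be a natural refinement.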

Key words: data stream, sliding window, maximal frequent itemsets, matrix
