Mining maximum frequent itemsets over uncertain data streams

Computer Engineering and Applications, 2016, Vol. 52, Issue (19): 72-77.

LIU Huiting, HOU Mingli, ZHAO Peng, YAO Sheng

School of Computer Science and Technology, Anhui University, Hefei 230601, China

Online: 2016-10-01  Published: 2016-11-18

Chinese title (translated): Research on an algorithm for mining maximum frequent itemsets over uncertain data streams

Abstract

Abstract: In large databases, the set of frequent itemsets is huge and redundant, so mining maximum frequent itemsets is more suitable because the output is much smaller. Traditional algorithms for mining maximum frequent itemsets, however, assume that precise data are available. Mining frequent itemsets from uncertain data streams is much more complicated than mining them from precise streams, and no research has so far addressed mining maximum frequent itemsets over uncertain data streams. To address this gap, the paper proposes TUFSMax, an algorithm for mining maximum frequent itemsets. The algorithm adopts a decay window model and finds frequent itemsets by calculating their expected supports. It also introduces a method that marks tree nodes, which lets TUFSMax mine all maximum frequent itemsets without superset checking and so saves the time that checking would take. Experimental results show that the proposed algorithm is efficient in both time and space.

Key words: uncertain data stream, maximum frequent itemsets, superset checking
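
The abstract names two ingredients, expected support and a decay window, without giving formulas. The sketch below illustrates one common reading and is not the TUFSMax algorithm itself. It assumes that each transaction maps items to independent existence probabilities, that the expected support of an itemset in a transaction is the product of those probabilities, and that a transaction's contribution is multiplied by a hypothetical `decay` factor for every step of age. The function names, the sample stream, and the thresholds are all illustrative. It finds the maximum frequent itemsets by brute-force enumeration followed by an explicit superset check, which is the step the paper reports avoiding through tree-node marking.

```python
from itertools import combinations


def decayed_expected_supports(stream, decay=0.9):
    """Return {itemset (sorted tuple): decayed expected support}.

    stream: list of transactions, oldest first; each transaction is a
    dict {item: existence probability}. Independence between items is
    an assumption of this sketch, not something stated in the abstract.
    """
    supports = {}
    newest = len(stream) - 1
    for t, transaction in enumerate(stream):
        # Older transactions count less: weight = decay ** age.
        weight = decay ** (newest - t)
        items = sorted(transaction)
        for k in range(1, len(items) + 1):
            for itemset in combinations(items, k):
                # Expected support in one transaction: product of the
                # existence probabilities of the itemset's items.
                prob = 1.0
                for item in itemset:
                    prob *= transaction[item]
                supports[itemset] = supports.get(itemset, 0.0) + weight * prob
    return supports


def maximal_frequent(supports, min_support):
    """Keep frequent itemsets that have no frequent proper superset.

    This is the naive superset check; it is shown only as a baseline.
    """
    frequent = {frozenset(s) for s, v in supports.items() if v >= min_support}
    return {s for s in frequent if not any(s < other for other in frequent)}


# Illustrative data only: three transactions, oldest first.
stream = [
    {"a": 0.9, "b": 0.8},
    {"a": 0.7, "b": 0.6, "c": 0.5},
    {"a": 1.0, "c": 0.4},
]
supports = decayed_expected_supports(stream, decay=0.5)
maximal = maximal_frequent(supports, min_support=0.45)
print(sorted(sorted(s) for s in maximal))
```

Enumerating every subset of every transaction is exponential in transaction length, so this only works on toy data. A tree-based method such as the one the abstract describes exists precisely to avoid that cost.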

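Abstract (translated from the Chinese): For large datasets, the set of frequent itemsets is huge and redundant, and mining maximum frequent itemsets reduces the number of itemsets produced. For uncertain data streams, however, the traditional way of judging whether an itemset is frequent can no longer express its frequency accurately, and no research has yet addressed mining maximum frequent itemsets over uncertain data streams. To address these shortcomings, the paper proposes TUFSMax, a decay-model-based algorithm for mining maximum frequent itemsets over uncertain data streams. The algorithm marks tree nodes so that all maximum frequent itemsets can be mined without superset checking, which saves the time that checking would take. Experiments show that the proposed algorithm is efficient in both time and space.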

Key words (translated from the Chinese): uncertain data stream, maximum frequent itemsets, superset checking

LIU Huiting, HOU Mingli, ZHAO Peng, YAO Sheng. Mining maximum frequent itemsets over uncertain data streams[J]. Computer Engineering and Applications, 2016, 52(19): 72-77.

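LIU Huiting, HOU Mingli, ZHAO Peng, YAO Sheng. Research on an algorithm for mining maximum frequent itemsets over uncertain data streams[J]. Computer Engineering and Applications, 2016, 52(19): 72-77. (Translated from the Chinese citation.)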
