不确定数据流最大频繁项集挖掘算法研究

计算机工程与应用 ›› 2016, Vol. 52 ›› Issue (19): 72-77.

不确定数据流最大频繁项集挖掘算法研究

刘慧婷，候明利，赵鹏，姚晟

安徽大学计算机科学与技术学院，合肥 230601

出版日期:2016-10-01 发布日期:2016-11-18

Mining maximum frequent itemsets over uncertain data streams

LIU Huiting, HOU Mingli, ZHAO Peng, YAO Sheng

School of Computer Science and Technology, Anhui University, Hefei 230601, China

Online:2016-10-01 Published:2016-11-18

摘要/Abstract

摘要： 对于大型数据，频繁项集挖掘显得庞大而冗余，挖掘最大频繁项集可以减少挖出的频繁项集的个数。可是对于不确定性数据流，传统判断项集是否频繁的方法已不能准确表达项集的频繁性，而且目前还没有在不确定数据流上挖掘最大频繁项集的相关研究。因此，针对上述不足，提出了一种基于衰减模型的不确定性数据流最大频繁项集挖掘算法TUFSMax。该算法采用标记树结点的方法，使得算法不需要超集检测就可挖掘出所有的最大频繁项集，节约了超集检测时间。实验证明了提出的算法在时间和空间上具有高效性。

关键词: 不确定性数据流, 最大频繁项集, 超集检测

Abstract: For large data bases, the number of frequent itemsets is huge and redundancy, and mining maximum frequent itemsets is more suitable because the scale of the output is much smaller. But traditional mining maximum frequent itemsets algorithm assumes the availability of precise data. Mining frequent itemsets from uncertain data streams is much more complicated than precise streams, and there is no research on mining maximum frequent itemsets over uncertain data streams until now. Therefore, aiming at the shortcoming, the paper proposes a maximum frequent itemsets mining algorithm TUFSMax. The algorithm adopts a decay window model to find frequent itemsets through calculating expected supports, and it uses a new method, called marking the tree nodes. By using the new method, algorithm TUFSMax can avoid super detection in the course of mining all of the maximum frequent itemsets, to save the detection time. Experimental results show that the proposed algorithm is efficient in time and space.

Key words: uncertain data stream, maximum frequent items, super check

刘慧婷，候明利，赵鹏，姚晟. 不确定数据流最大频繁项集挖掘算法研究[J]. 计算机工程与应用, 2016, 52(19): 72-77.

LIU Huiting, HOU Mingli, ZHAO Peng, YAO Sheng. Mining maximum frequent itemsets over uncertain data streams[J]. Computer Engineering and Applications, 2016, 52(19): 72-77.

[1]	刘琰，张进，陈静，尹美娟，张伟丽. 基于最大频繁项集挖掘的微博炒作群体发现方法[J]. 计算机工程与应用, 2017, 53(4): 90-97.
[2]	朱书眉，王诚. 基于最大频繁项集的图像分类技术[J]. 计算机工程与应用, 2016, 52(23): 181-184.
[3]	刘卫明1，2，蒯海龙1，陈志刚3，毛伊敏1，4. 基于有序树的不确定数据最大频繁项挖掘算法[J]. 计算机工程与应用, 2015, 51(24): 145-149.
[4]	尹绍宏，单坤玉，范桂丹. 滑动窗口中数据流最大频繁项集挖掘算法研究[J]. 计算机工程与应用, 2015, 51(22): 145-149.
[5]	刘群，贾泂. 一种分布式全局频繁项集挖掘方法[J]. 计算机工程与应用, 2011, 47(29): 134-136.
[6]	黄红星，王秀丽，黄习培. 挖掘最大频繁项集的改进蚁群算法[J]. 计算机工程与应用, 2011, 47(13): 161-165.
[7]	陈波,王乐,董鹏. 挖掘最大频繁项集的事务集迭代算法[J]. 计算机工程与应用, 2009, 45(6): 141-144.
[8]	王乐¹,王水²,陈波¹,董鹏¹. 交集剪枝法挖掘最大频繁项集[J]. 计算机工程与应用, 2009, 45(13): 156-159.
[9]	姜晗¹,贾泂²,徐峰¹. 基于频繁项集挖掘最大频繁项集和频繁闭项集[J]. 计算机工程与应用, 2008, 44(28): 146-148.
[10]	张忠平,宋晓辉,宋少英. 一种基于OUS的最大频繁项集挖掘算法[J]. 计算机工程与应用, 2008, 44(24): 148-150.
[11]	李芸，李青山. 基于约束的最大频繁项集挖掘算法[J]. 计算机工程与应用, 2007, 43(17): 160-163.