Algorithm for clustering frequent itemsets based on generators

Computer Engineering and Applications, 2008, Vol. 44, Issue (35): 5-8. DOI: 10.3778/j.issn.1002-8331.2008.35.002

Doctoral Forum

LI Jin-hong^1,2, YANG Bing-ru^1, SONG Wei^2, HOU Wei^1

1. School of Information Engineering, University of Science and Technology Beijing, Beijing 100083, China
2. College of Information Engineering, North China University of Technology, Beijing 100144, China

Received: 2008-09-12; Revised: 2008-10-06; Online: 2008-12-11; Published: 2008-12-11
Contact: LI Jin-hong

Abstract

Effectively reducing the number of frequent itemsets is an active topic in data mining research, and clustering frequent itemsets is one approach to the problem. Because generators are a lossless, concise representation of all frequent itemsets, clustering generators is equivalent to clustering all frequent itemsets. This paper proposes a new algorithm for clustering frequent itemsets based on generators. First, the rationale for clustering generators is discussed on the basis of the minimum description length principle. Second, pruning strategies and a mining algorithm for generators are proposed. Finally, a clustering algorithm is presented that relies on a new similarity criterion for frequent itemsets. Experimental results show that the proposed method both reduces the number of discovered itemsets and runs efficiently.

Key words: data mining, generator, clustering
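
The abstract does not spell out the pruning strategies or the similarity criterion, so the following Python sketch only illustrates the general idea under stated assumptions. It uses the standard definition of a generator (a frequent itemset with no proper subset of equal support) and finds generators by brute force rather than with the paper's pruning. As a stand-in for the paper's similarity criterion, it uses Jaccard similarity between the sets of supporting transactions. All function names and the toy data are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_generators(transactions, min_support):
    """Brute-force search for non-empty frequent generators.

    An itemset is a generator if no proper subset has the same support.
    Because support never increases when items are added, it is enough
    to compare against the subsets that are exactly one item smaller.
    Assumes min_support >= 1. The empty itemset is not reported.
    """
    items = sorted(set().union(*transactions))
    generators = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            x = frozenset(combo)
            s = support(x, transactions)
            if s < min_support:
                continue
            if all(support(x - {i}, transactions) != s for i in x):
                generators[x] = s
    return generators


def tidset(itemset, transactions):
    """Indices of the transactions that contain `itemset`."""
    return frozenset(i for i, t in enumerate(transactions) if itemset <= t)


def cluster_generators(generators, transactions, threshold):
    """Greedy clustering; ASSUMPTION: Jaccard similarity on tidsets.

    Each generator joins the first cluster whose representative is
    similar enough, otherwise it starts a new cluster.
    """
    clusters = []  # list of (representative, members)
    for g in sorted(generators, key=lambda x: (len(x), sorted(x))):
        tg = tidset(g, transactions)
        for rep, members in clusters:
            tr = tidset(rep, transactions)
            if len(tg & tr) / len(tg | tr) >= threshold:
                members.append(g)
                break
        else:
            clusters.append((g, [g]))
    return clusters


# Hypothetical toy database of four transactions.
transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "b", "d"}),
    frozenset({"c", "d"}),
]
gens = frequent_generators(transactions, min_support=2)
clusters = cluster_generators(gens, transactions, threshold=1.0)
```

In this toy database the itemset {a, b} is frequent but has the same support as {a}, so it is not a generator and never has to be clustered separately. This is the kind of reduction the abstract attributes to working with generators.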

LI Jin-hong, YANG Bing-ru, SONG Wei, HOU Wei. Algorithm for clustering frequent itemsets based on generators [J]. Computer Engineering and Applications, 2008, 44(35): 5-8.
