Fast algorithm for mining frequent itemsets over data streams

doi:10.3778/j.issn.1002-8331.2008.34.044

Computer Engineering and Applications, 2008, Vol. 44, Issue 34: 142-144. DOI: 10.3778/j.issn.1002-8331.2008.34.044

• Database, Signal and Information Processing •

XU Jian-min¹,², HAO Li-wei¹, WANG Yu¹

1. College of Mathematics and Computer Science, Hebei University, Baoding, Hebei 071002, China
2. Institute of Systems Engineering, Tianjin University, Tianjin 300072, China

Received: 2008-06-03 Revised: 2008-09-04 Online: 2008-12-01 Published: 2008-12-01
Contact: XU Jian-min

Abstract

Data stream mining has recently become an active research topic in China and internationally, and mining frequent itemsets is one of its central problems. Because data streams are unbounded and continuously flowing, an algorithm called FIM-SW is proposed to mine frequent itemsets over a sliding window. FIM-SW adopts a vertical database representation in which each item is represented by a bit vector, and it uses the Apriori property to generate frequent itemsets. Experimental results show that the algorithm significantly improves mining efficiency.

Key words: data mining, data stream, frequent itemset, sliding window
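
The abstract names three ingredients: a sliding window, a vertical layout with one bit vector per item, and Apriori-based candidate generation. The following is a minimal Python sketch of how these pieces can fit together. It is not the authors' FIM-SW implementation, whose details are not given on this page. The class name `SlidingWindowMiner`, its methods, and the choice of a fixed-size, transaction-count window are assumptions made for illustration only.

```python
from itertools import combinations


class SlidingWindowMiner:
    """Hypothetical sketch: one bit vector per item over a fixed-size window."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.mask = (1 << window_size) - 1  # keeps only the newest window_size bits
        self.bits = {}                      # item -> int used as a bit vector
        self.seen = 0                       # number of transactions in the window

    def add(self, transaction):
        """Slide the window by one transaction."""
        # Shift every vector left; the oldest transaction drops out via the mask.
        for item in list(self.bits):
            shifted = (self.bits[item] << 1) & self.mask
            if shifted:
                self.bits[item] = shifted
            else:
                del self.bits[item]  # item no longer occurs in the window
        # Bit 0 marks the newest transaction.
        for item in set(transaction):
            self.bits[item] = self.bits.get(item, 0) | 1
        self.seen = min(self.seen + 1, self.window_size)

    def frequent_itemsets(self, min_support):
        """Level-wise search; min_support is a fraction of the window."""
        min_count = min_support * self.seen
        level = {
            (item,): vec
            for item, vec in self.bits.items()
            if bin(vec).count("1") >= min_count
        }
        result = {}
        while level:
            for itemset, vec in level.items():
                result[itemset] = bin(vec).count("1")
            next_level = {}
            for a, b in combinations(sorted(level), 2):
                # Join two k-itemsets that share the same (k-1)-prefix.
                if a[:-1] != b[:-1]:
                    continue
                candidate = a + (b[-1],)
                # Apriori property: every k-subset must itself be frequent.
                if any(sub not in level
                       for sub in combinations(candidate, len(a))):
                    continue
                # Support of the candidate = popcount of the ANDed vectors.
                vec = level[a] & level[b]
                if bin(vec).count("1") >= min_count:
                    next_level[candidate] = vec
            level = next_level
        return result


miner = SlidingWindowMiner(window_size=4)
for t in [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]:
    miner.add(t)
print(miner.frequent_itemsets(min_support=0.5))
```

In this sketch, support counting reduces to a bitwise AND followed by a population count, so no transaction has to be rescanned when the window slides. That is the usual motivation for a vertical bit-vector layout.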

XU Jian-min, HAO Li-wei, WANG Yu. Fast algorithm for mining frequent itemsets over data streams[J]. Computer Engineering and Applications, 2008, 44(34): 142-144.
