Improved algorithm for mining approximate frequent item over data streams

Computer Engineering and Applications ›› 2008, Vol. 44 ›› Issue (13): 150-152.

• Database, Signal and Information Processing •

WANG Xiu-kun, WANG Tie-cun, ZHOU Guo-neng, FENG Wei

Department of Computer Science, Dalian University of Technology, Dalian, Liaoning 116023, China

Received: 2007-08-21 Revised: 2007-11-19 Online: 2008-05-01 Published: 2008-05-01
Contact: WANG Xiu-kun

Abstract: Because data arrive rapidly and the data set is huge in the stream model, it is usually impossible to find all exact frequent items of a data stream. Space complexity and time complexity are the two main measures used to evaluate the strengths and weaknesses of frequent-item mining algorithms. This paper proposes an improved algorithm, based on the principle of locality, for finding ε-approximate frequent items of a data stream. Its space complexity is O(1/ε); the processing time for each item is O(1/ε) in the worst case and O(1) in the best case. Moreover, the frequency error of the results returned by the proposed algorithm is bounded by $\sum_{i=2}^{j}(1-\mu_i)\times k_i$.
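The abstract does not spell out the algorithm itself, so the following is only a minimal sketch of a classic counter-based scheme (Misra-Gries) that has the same O(1/ε) space bound, O(1) best-case and O(1/ε) worst-case per-item cost. It is not the authors' method: the locality-based improvement and the error bound in terms of μ_i and k_i are not reproduced here, and the names `misra_gries`, `approximate_frequent_items`, `support`, and `capacity` are assumptions made for illustration.

```python
import math


def misra_gries(stream, epsilon):
    """One pass over `stream` using at most ceil(1/epsilon) counters.

    Each stored count underestimates the true count by at most
    n / (capacity + 1) <= epsilon * n, where n is the stream length.
    """
    capacity = math.ceil(1 / epsilon)  # O(1/epsilon) space
    counters = {}
    n = 0
    for item in stream:
        n += 1
        if item in counters:
            counters[item] += 1  # best case: O(1)
        elif len(counters) < capacity:
            counters[item] = 1
        else:
            # Worst case: touch every counter, O(1/epsilon).
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters, n


def approximate_frequent_items(stream, support, epsilon):
    """Items whose estimated count reaches (support - epsilon) * n.

    Every item with true frequency >= support * n is returned; no
    returned item has true frequency below (support - epsilon) * n.
    """
    counters, n = misra_gries(stream, epsilon)
    threshold = (support - epsilon) * n
    return {item: c for item, c in counters.items() if c >= threshold}
```

For example, with `stream = ["a"] * 6 + ["b"] * 3 + ["c", "d", "e"]`, calling `approximate_frequent_items(stream, 0.5, 0.25)` keeps only `"a"`, whose stored count of 5 undercounts its true count of 6 by less than ε·n = 3.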

Key words: data stream, data stream mining, frequent item

WANG Xiu-kun, WANG Tie-cun, ZHOU Guo-neng, FENG Wei. Improved algorithm for mining approximate frequent item over data streams[J]. Computer Engineering and Applications, 2008, 44(13): 150-152.
