Computer Engineering and Applications, 2013, Vol. 49, Issue (8): 110-113.

Improvement to data streams frequent itemsets mining algorithm WSW-Imp

WANG Xiaoxia, WANG Zhihe   

  1. College of Mathematics and Information Science, Northwest Normal University, Lanzhou 730070, China
  • Online: 2013-04-15  Published: 2013-04-15

Abstract: In recent years, with the emergence of new applications such as network traffic analysis, online transaction analysis, and network intrusion detection, data stream mining has become an increasingly important research topic. Most research on mining frequent itemsets in data streams is based on the traditional window models, i.e., the tilted-time window model, the landmark window model, and the sliding window model. In 2009, Pauray S. M. Tsai proposed a new window model, the weighted sliding window model, together with two algorithms for mining frequent itemsets in data streams under this model: WSW and WSW-Imp, where WSW-Imp improves the efficiency of WSW. After studying the weighted sliding window model and the WSW-Imp algorithm, this paper proposes an algorithm named WSW-Imp2 that further improves WSW-Imp. It is proved theoretically that WSW-Imp2 is more efficient than WSW-Imp, and experimental results confirm this conclusion.

Key words: data mining, data streams, data streams mining, frequent itemsets, weighted sliding window model