Computer Engineering and Applications, 2011, Vol. 47, Issue (7): 135-138.

• Database, Signal and Information Processing •

Approach for data streams clustering over dynamic sliding windows

ZHANG Zhongping, WANG Hao, XUE Wei, XIA Yan

  1. College of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China
  • Online: 2011-03-01 Published: 2011-03-01

Abstract: Clustering data streams is an important problem in cluster analysis. To handle data streams whose arrival rate varies, a data stream clustering algorithm over dynamic sliding windows is proposed, built on the two-phase clustering framework. In the online phase, a micro-cluster feature is introduced to store summary information about the data stream. The stored summary is used to adjust the size of the sliding window dynamically, and the distance from each data point to the micro-cluster centers is computed to maintain the micro-cluster features. In the offline phase, the k-means algorithm is applied to the mean values of the micro-clusters produced online, performing macro-clustering to generate the final clusters. Experimental results show that the algorithm achieves high clustering purity and good scalability.

Key words: data mining, data streams, clustering, sliding windows
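
The two-phase procedure summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the window-adjustment rule, so the rule used here (shorten the window when points arrive quickly, lengthen it when they arrive slowly) is hypothetical, as are the names `MicroCluster`, `OnlinePhase`, `radius`, `target_points`, and the window bounds. Each micro-cluster keeps only a count, a per-dimension sum, and a last-update time. The offline step runs a plain k-means over the micro-cluster centers.

```python
import math
from collections import deque


class MicroCluster:
    """Additive summary of absorbed points: count, per-dimension sum, last update time."""

    def __init__(self, point, t):
        self.n = 1
        self.linear_sum = list(point)
        self.last_seen = t

    def center(self):
        return [s / self.n for s in self.linear_sum]

    def absorb(self, point, t):
        self.n += 1
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]
        self.last_seen = t


class OnlinePhase:
    def __init__(self, radius, target_points=100, min_window=1.0, max_window=60.0):
        self.radius = radius                # assumed threshold for absorbing a point
        self.target_points = target_points  # assumed: points one window should cover
        self.min_window = min_window
        self.max_window = max_window
        self.window = max_window            # current window length, in time units
        self.arrivals = deque(maxlen=20)    # recent timestamps, used to estimate speed
        self.clusters = []

    def _adjust_window(self, t):
        # Hypothetical rule: faster stream -> shorter window, slower -> longer.
        self.arrivals.append(t)
        span = self.arrivals[-1] - self.arrivals[0]
        if len(self.arrivals) > 1 and span > 0:
            rate = (len(self.arrivals) - 1) / span
            wanted = self.target_points / rate
            self.window = min(self.max_window, max(self.min_window, wanted))

    def insert(self, point, t):
        self._adjust_window(t)
        # Drop micro-clusters that have fallen out of the current window.
        self.clusters = [c for c in self.clusters if t - c.last_seen <= self.window]
        nearest = min(self.clusters,
                      key=lambda c: math.dist(c.center(), point), default=None)
        if nearest is not None and math.dist(nearest.center(), point) <= self.radius:
            nearest.absorb(point, t)
        else:
            self.clusters.append(MicroCluster(point, t))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means, seeded with the first k points for determinism."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: math.dist(centroids[i], p))
            groups[idx].append(p)
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


# Online phase: feed (point, timestamp) pairs, then cluster the summaries offline.
online = OnlinePhase(radius=1.0)
stream = [((0.0, 0.0), 0.0), ((0.2, 0.1), 0.5), ((5.0, 5.0), 1.0), ((5.1, 4.9), 1.5)]
for point, t in stream:
    online.insert(point, t)
macro = kmeans([c.center() for c in online.clusters], k=2)
```

Because each micro-cluster summary is additive, absorbing a point costs time proportional to the dimensionality, and the offline step only sees one center per micro-cluster rather than the raw stream. A fuller version would likely also store a sum of squares per micro-cluster so that a radius can be derived from the data instead of fixed in advance.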