Approach for data streams clustering over dynamic sliding windows

Computer Engineering and Applications ›› 2011, Vol. 47 ›› Issue (7): 135-138.

• Database, Signal and Information Processing •

ZHANG Zhongping, WANG Hao, XUE Wei, XIA Yan

College of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China

Received: 1900-01-01 Revised: 1900-01-01 Online: 2011-03-01 Published: 2011-03-01

Abstract

Abstract: Clustering data streams is an important problem in cluster analysis. To handle data streams whose arrival rate varies over time, a data stream clustering algorithm based on dynamic sliding windows is proposed, built on the two-phase clustering framework. In the online phase, a micro-cluster feature is introduced to store summary information about the data stream; the stored summaries are used to dynamically adjust the size of the sliding window, and the distance from each data point to the micro-cluster centers is computed to maintain the micro-cluster features. In the offline phase, the K-means algorithm performs macro-clustering on the micro-clusters produced by the online phase to generate the final clusters. Experimental results show that the algorithm achieves high clustering purity and good scalability.

Key words: data mining, data streams, clustering, sliding windows
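
The two-phase idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not state the window-adjustment rule, the absorption threshold, or the K-means seeding, so the names `MicroCluster`, `window_size`, `online_phase`, and `offline_phase`, the parameters `radius`, `horizon`, and `smoothing`, and the rule "hold roughly a fixed time span of data" are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    n: int              # number of absorbed points
    linear_sum: list    # per-dimension sum of absorbed points
    last_index: int     # arrival index of the most recently absorbed point

    def center(self):
        return [s / self.n for s in self.linear_sum]

    def absorb(self, point, index):
        self.n += 1
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]
        self.last_index = index


def window_size(mean_gap, horizon=10.0, min_size=5, max_size=1000):
    """Assumed rule: hold about `horizon` time units of data, so a faster
    stream (smaller mean inter-arrival gap) gets a larger window."""
    if mean_gap <= 0:
        return max_size
    return max(min_size, min(max_size, round(horizon / mean_gap)))


def online_phase(stream, radius=1.0, smoothing=0.5, **window_args):
    """Consume (timestamp, point) pairs and return the live micro-clusters."""
    clusters, prev_t, mean_gap = [], None, None
    for index, (t, point) in enumerate(stream):
        if prev_t is not None:
            gap = t - prev_t
            # Exponentially smoothed inter-arrival gap (assumed speed estimate).
            mean_gap = gap if mean_gap is None else smoothing * gap + (1 - smoothing) * mean_gap
        prev_t = t
        if mean_gap is not None:
            size = window_size(mean_gap, **window_args)
            # Expire micro-clusters whose latest point has left the window.
            clusters = [mc for mc in clusters if index - mc.last_index < size]
        nearest = min(clusters, key=lambda mc: math.dist(mc.center(), point), default=None)
        if nearest is not None and math.dist(nearest.center(), point) <= radius:
            nearest.absorb(point, index)
        else:
            clusters.append(MicroCluster(1, list(point), index))
    return clusters


def offline_phase(clusters, k, iterations=20):
    """Weighted k-means over micro-cluster centers (weights = point counts)."""
    centers = [mc.center() for mc in clusters]
    weights = [mc.n for mc in clusters]
    centroids = [list(c) for c in centers[:k]]  # naive seeding, for brevity
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for c, w in zip(centers, weights):
            j = min(range(len(centroids)), key=lambda i: math.dist(centroids[i], c))
            groups[j].append((c, w))
        for j, members in enumerate(groups):
            total = sum(w for _, w in members)
            if total:  # keep the old centroid if a group is empty
                dim = len(centroids[j])
                centroids[j] = [sum(c[d] * w for c, w in members) / total for d in range(dim)]
    return centroids
```

A caller would pass an iterable of `(timestamp, point)` pairs to `online_phase` and then hand the surviving micro-clusters to `offline_phase` with the desired number of final clusters `k`. Weighting each micro-cluster center by its point count is one plausible reading of "employing the mean values of the micro-clusters"; the paper itself may do this differently.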

ZHANG Zhongping, WANG Hao, XUE Wei, XIA Yan. Approach for data streams clustering over dynamic sliding windows[J]. Computer Engineering and Applications, 2011, 47(7): 135-138.

