Information entropy based subspace clustering algorithm

Computer Engineering and Applications ›› 2012, Vol. 48 ›› Issue (12): 139-143.

• Databases, Signal and Information Processing •

LIU Jingjie(1), TAO Liang(2)

1. Department of Computer Technology, Anhui Vocational and Technical College of Industry and Trade, Huainan, Anhui 232007, China
2. School of Computer Science and Technology, Anhui University, Hefei 230601, China

Online: 2012-04-21; Published: 2012-04-20

Abstract

This paper combines the traditional Parzen window method with a more reasonable strategy for discarding (fading) historical data in order to estimate the probability density of a data stream. On this basis, the information entropy of the data set's projection onto a low-dimensional subspace can be computed, and this entropy is used to build PStream, a subspace clustering algorithm for high-dimensional data streams. Theoretical analysis and experiments show that, compared with traditional algorithms, PStream clusters the data stream with high precision in a single pass. Its running efficiency differs little from that of existing methods such as HPStream, but it clearly improves clustering quality.

Key words: data streams, clustering, high dimension, subspace, data mining
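The abstract names three ingredients: a Parzen window density estimate, a fading rule for old data, and the entropy of each low-dimensional projection. The sketch below shows one way these pieces could fit together for one-dimensional projections. It is not the PStream algorithm itself, whose formulas are not given on this page. The exponential fading rule, the Gaussian kernel, the resubstitution entropy estimate, and every function and parameter name (`faded_weights`, `parzen_density`, `projection_entropy`, `rank_dimensions`, `decay`, `bandwidth`) are assumptions made for illustration.

```python
import math


def faded_weights(timestamps, now, decay=0.01):
    """Assumed fading rule: weight 2^(-decay * age), normalized to sum to 1."""
    raw = [2.0 ** (-decay * (now - t)) for t in timestamps]
    total = sum(raw)
    return [w / total for w in raw]


def parzen_density(x, samples, weights, bandwidth):
    """Weighted Gaussian Parzen window density estimate at x (1-D)."""
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    return sum(
        w * norm * math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
        for s, w in zip(samples, weights)
    )


def projection_entropy(samples, weights, bandwidth):
    """Resubstitution entropy estimate: -sum_i w_i * log p(x_i)."""
    return -sum(
        w * math.log(parzen_density(s, samples, weights, bandwidth))
        for s, w in zip(samples, weights)
    )


def rank_dimensions(points, timestamps, now, bandwidth=0.5, decay=0.01):
    """Return (dimension indices sorted by ascending entropy, entropies)."""
    weights = faded_weights(timestamps, now, decay)
    dims = list(range(len(points[0])))
    entropies = [
        projection_entropy([p[d] for p in points], weights, bandwidth)
        for d in dims
    ]
    return sorted(dims, key=lambda d: entropies[d]), entropies


if __name__ == "__main__":
    # Dimension 0 is tightly concentrated; dimension 1 is widely spread.
    pts = [(0.0, 0.0), (0.1, 5.0), (-0.1, -5.0), (0.05, 10.0), (-0.05, -10.0)]
    order, entropies = rank_dimensions(pts, [0, 1, 2, 3, 4], now=4)
    print("dimensions by ascending entropy:", order)
    print("entropies:", entropies)
```

In this sketch a low entropy on a dimension means the faded data are concentrated along it, which is why such dimensions are natural candidates for a cluster's subspace. How PStream actually selects dimensions and assigns points is not stated in the abstract.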

Cite this article: LIU Jingjie, TAO Liang. Information entropy based subspace clustering algorithm[J]. Computer Engineering and Applications, 2012, 48(12): 139-143.

