Information entropy based subspace clustering algorithm

Computer Engineering and Applications, 2012, Vol. 48, Issue 12: 139-143.



LIU Jingjie¹, TAO Liang²

1. Department of Computer Technology, Anhui Vocational and Technical College of Industry and Trade, Huainan, Anhui 232007, China
2. School of Computer Science and Technology, Anhui University, Hefei 230601, China

Online: 2012-04-21  Published: 2012-04-20


Abstract

Building on the traditional Parzen window method, this paper proposes a way to estimate the probability density of a data stream that uses a more reasonable strategy for fading out old data. From this estimate, the information entropy of the data set's projection onto a low-dimensional subspace can be calculated. Exploiting the close relationship between entropy and distribution, the paper develops PStream, an entropy-based subspace clustering algorithm for high-dimensional data streams. Theoretical analysis and experiments show that PStream clusters the data stream with high precision in a single pass. Its running efficiency differs little from that of existing methods such as HPStream, but its clustering quality is clearly better.

Key words: data streams, clustering, high dimension, subspace, data mining
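
The idea of combining a Parzen window estimate with fading and entropy can be illustrated with a minimal sketch. It is not the PStream algorithm itself, whose details are not given on this page. The exponential fading rule 2^(-decay * age), the Gaussian kernel, the bandwidth, the grid, and all function names below are assumptions chosen for illustration. The sketch estimates a faded density along one dimension and approximates its entropy; a dimension in which points concentrate should score lower than one in which they are spread out.

```python
import math
import random


def faded_parzen_density(points, times, now, grid, bandwidth=0.05, decay=0.01):
    """Weighted Gaussian Parzen-window estimate evaluated on `grid`.

    Each point is weighted by 2 ** (-decay * age). This exponential fading
    rule is an assumption; the paper's own fading strategy is not shown here.
    """
    weights = [2.0 ** (-decay * (now - t)) for t in times]
    total = sum(weights)
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    density = []
    for g in grid:
        s = 0.0
        for x, w in zip(points, weights):
            u = (g - x) / bandwidth
            s += w * norm * math.exp(-0.5 * u * u)
        density.append(s / total)
    return density


def entropy_on_grid(density, step):
    """Riemann-sum approximation of the differential entropy -integral p ln p dx."""
    return -sum(p * math.log(p) * step for p in density if p > 0.0)


def dimension_entropy(values, times, now, lo=-0.5, hi=1.5, cells=200):
    """Entropy of the faded density of one dimension, on a midpoint grid."""
    step = (hi - lo) / cells
    grid = [lo + (i + 0.5) * step for i in range(cells)]
    return entropy_on_grid(faded_parzen_density(values, times, now, grid), step)


# Synthetic stream: one dimension with two tight groups, one spread uniformly.
rng = random.Random(0)
n = 300
times = list(range(n))
clustered = [rng.gauss(0.3 if i % 2 else 0.7, 0.02) for i in range(n)]
spread = [rng.random() for _ in range(n)]

h_clustered = dimension_entropy(clustered, times, now=n)
h_spread = dimension_entropy(spread, times, now=n)
print("entropy (clustered dimension):", h_clustered)
print("entropy (spread dimension):   ", h_spread)
```

In this sketch the clustered dimension yields the lower entropy, which is the kind of signal an entropy-based method could use to prefer some dimensions over others when selecting a subspace.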

LIU Jingjie, TAO Liang. Information entropy based subspace clustering algorithm[J]. Computer Engineering and Applications, 2012, 48(12): 139-143.

