Density Peak Improvement Algorithm for Clustering Hybrid Data

doi:10.3778/j.issn.1002-8331.1905-0357

Computer Engineering and Applications, 2020, Vol. 56, Issue (12): 47-53. DOI: 10.3778/j.issn.1002-8331.1905-0357

TAN Yang, TANG Dequan, CAO Shoufu

1. College of Mathematics and Statistics, Hunan Normal University, Changsha 410081, China
2. Department of Network Technology, Hunan Radio and Television University, Changsha 410004, China
3. Department of Information Technology, Hunan Police Academy, Changsha 410138, China

Online: 2020-06-15 Published: 2020-06-09

Abstract

When clustering mixed data, sample attributes are usually evaluated separately according to their type. However, measuring the attributes in separate subspaces breaks the original unity of the sample attributes and introduces an inconsistent metric bias into the similarity evaluation of individual samples. To address this problem, a new clustering algorithm is proposed that encodes sample attributes in binary and then applies a unified metric to the attribute codes using the Hamming difference. By measuring the similarity of mixed data within a unified framework, the new algorithm avoids splitting the sample attributes. On this basis, it assigns different weights to attributes according to their properties and uses these weights to evaluate the similarity between individual samples. Experimental results show that the new algorithm clusters mixed data effectively and, compared with other existing clustering algorithms, achieves better clustering accuracy and stability.

Key words: clustering, hybrid data, density peak, attribute coding, Hamming metric
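
The idea in the abstract, binary-coded attributes compared with one weighted Hamming measure, can be illustrated with a minimal sketch. The abstract does not give the actual coding scheme or weights, so everything specific here is an assumption made for illustration: numeric attributes use a thermometer (unary) code so that closer values differ in fewer bits, categorical attributes use a one-hot code, and the names `thermometer_code`, `one_hot_code`, `weighted_hamming`, `COLOURS`, and `WEIGHTS` are hypothetical.

```python
def thermometer_code(value, lo, hi, bits):
    """Assumed numeric coding: the first k bits are 1, with k growing with the value."""
    scaled = (value - lo) / (hi - lo) if hi > lo else 0.0
    scaled = min(max(scaled, 0.0), 1.0)  # clamp to [0, 1]
    k = round(scaled * bits)
    return [1] * k + [0] * (bits - k)


def one_hot_code(value, categories):
    """Assumed categorical coding: one bit per category, set only for the match."""
    return [1 if c == value else 0 for c in categories]


def hamming(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    return sum(p != q for p, q in zip(a, b))


def weighted_hamming(codes_a, codes_b, weights):
    """Sum over attributes of weight * Hamming difference of the attribute codes."""
    return sum(w * hamming(a, b) for a, b, w in zip(codes_a, codes_b, weights))


# Hypothetical mixed samples: (age in years, favourite colour).
COLOURS = ["red", "green", "blue"]
# Illustrative weights: 0.25 per age bit (4 bits, so the maximum age gap is 1.0)
# and 0.5 per colour bit (a one-hot mismatch flips 2 bits, so a mismatch is 1.0).
WEIGHTS = [0.25, 0.5]


def encode(sample):
    age, colour = sample
    return [thermometer_code(age, 0, 100, 4), one_hot_code(colour, COLOURS)]


x, y, z = (25, "red"), (75, "red"), (25, "blue")
print(weighted_hamming(encode(x), encode(y), WEIGHTS))  # 0.5
print(weighted_hamming(encode(x), encode(z), WEIGHTS))  # 1.0
```

Because both attribute types end up as bit strings, a single distance function covers them, and the per-attribute weights decide how much each attribute contributes. The resulting pairwise distances could then be passed to a density-peak clustering step.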

TAN Yang, TANG Dequan, CAO Shoufu. Density Peak Improvement Algorithm for Clustering Hybrid Data[J]. Computer Engineering and Applications, 2020, 56(12): 47-53.
