Granular K-means Clustering Algorithm

doi:10.3778/j.issn.1002-8331.2211-0072

Abstract

K-means clustering is an unsupervised learning method with the advantages of simple application, strong interpretability and good clustering performance. However, it converges slowly, its parameters are difficult to determine, and it easily falls into local optima. To overcome these inherent defects, this paper draws on granular computing theory and proposes a new clustering model: the granular K-means clustering algorithm. Samples are granulated into granules on each single feature, and the granules over multiple features form a granular vector. Several granular distances are then defined to measure the distance between granules. Based on these granular distances, a granular K-means clustering method is proposed and a corresponding clustering algorithm is designed. Because granulation compares similarity across the entire sample space, it reflects the global characteristics of the samples and lets the clustering converge in fewer iterations. Finally, experiments on several biological datasets compare convergence speed, the influence of the K value and clustering performance. The results show that the proposed granular K-means clustering method converges quickly and achieves good clustering performance.

Key words: K-means clustering, granular computing, granular distance, unsupervised learning, granular clustering
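The abstract does not give the granulation rule or the granular distance formulas, so the following is only a minimal Python/NumPy sketch under stated assumptions, not the authors' method. It assumes that features are min-max normalized to [0, 1]; that the granule of sample x on feature j is the vector of similarities 1 - |x_j - r_j| to every reference sample r (here all samples, echoing the idea of comparing similarity across the whole sample space); that the granular distance is the mean absolute difference between granular vectors; and that centers are initialized by farthest-first selection. The names `minmax`, `granulate`, `granular_distance` and `granular_kmeans` are hypothetical.

```python
import numpy as np


def minmax(X):
    """Scale each feature to [0, 1]; constant features map to 0."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span


def granulate(X, refs):
    """Assumed granulation: the granule of sample x on feature j is the vector
    of similarities 1 - |x_j - r_j| to every reference sample r.
    X: (n, m), refs: (p, m) -> granular vectors of shape (n, m, p)."""
    return 1.0 - np.abs(X[:, :, None] - refs.T[None, :, :])


def granular_distance(G, C):
    """Assumed granular distance: mean absolute difference between granular
    vectors. G: (n, m, p), C: (k, m, p) -> distance matrix (n, k)."""
    return np.abs(G[:, None] - C[None]).mean(axis=(2, 3))


def granular_kmeans(G, k, max_iter=100):
    """K-means over granular vectors; returns (labels, centers, iterations)."""
    n = G.shape[0]
    # Farthest-first initialization (an assumption; the source does not say).
    idx = [0]
    for _ in range(1, k):
        nearest = granular_distance(G, G[idx]).min(axis=1)
        idx.append(int(nearest.argmax()))
    centers = G[idx].copy()
    labels = np.full(n, -1)
    for it in range(1, max_iter + 1):
        # Assign each sample to the center with the smallest granular distance.
        new_labels = granular_distance(G, centers).argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable: converged
        labels = new_labels
        # Update each center as the mean granular vector of its members.
        for c in range(k):
            members = G[labels == c]
            if len(members):  # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
    return labels, centers, it


# Usage on a toy dataset with two well-separated groups.
X = np.array([[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
              [0.9, 1.0], [1.0, 0.9], [0.95, 0.95]])
Xn = minmax(X)
labels, centers, iters = granular_kmeans(granulate(Xn, Xn), k=2)
print(labels, iters)  # [0 0 0 1 1 1] 2
```

Using every sample as a reference follows the abstract's idea of global similarity comparison, but it makes the granular vectors O(n^2 m) in size; for larger datasets a sampled reference set would be one way to trade fidelity for memory.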

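Abstract (Chinese version): K-means clustering is a form of unsupervised learning with the advantages of being simple to use, highly interpretable and effective at clustering. However, the algorithm converges slowly, its parameters are difficult to determine, and it easily falls into local solutions. To overcome these inherent defects of K-means clustering, a new clustering model based on granular computing theory is proposed: the granular K-means clustering algorithm. Samples are granulated into granules on single features, and the granules on multiple features form granular vectors; several granular distance formulas are further defined to measure the distance between granules. Based on the granular distance measures, a granular K-means clustering method is proposed and a granular K-means clustering algorithm is designed. Sample granulation compares similarity across the whole sample space and reflects the global characteristics of the samples, so the clustering converges in fewer iterations and reaches the global optimum more easily. Experiments on multiple public UCI datasets compare convergence speed, the influence of the K value and clustering performance; the results show that the proposed granular K-means clustering method converges quickly and clusters well.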


ZHOU Chenglong, CHEN Yuming, ZHU Yidong. Granular K-means Clustering Algorithm[J]. Computer Engineering and Applications, 2023, 59(13): 317-324.

