Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (13): 317-324. DOI: 10.3778/j.issn.1002-8331.2211-0072

• Engineering and Applications •

Granular K-means Clustering Algorithm

ZHOU Chenglong, CHEN Yuming, ZHU Yidong   

  1. College of Computer and Information Engineering, Xiamen University of Technology, Xiamen, Fujian 361024, China
  • Online: 2023-07-01 Published: 2023-07-01

粒K均值聚类算法

周成龙,陈玉明,朱益冬   

  1. 厦门理工学院 计算机与信息工程学院,福建 厦门 361024

Abstract: K-means clustering is an unsupervised learning method that is simple to apply, highly interpretable, and effective at clustering. However, it converges slowly, its parameters are difficult to determine, and it is prone to getting trapped in a local optimum. To overcome these inherent defects, this paper combines K-means with granular computing theory and proposes a new clustering model: the granular K-means clustering algorithm. Each sample is granulated into a granule on each single feature, and the granules across multiple features form a granular vector. Several granular distances are then defined to measure the distance between granules. Based on these distances, a granular K-means clustering method is proposed and the corresponding algorithm is designed. Granulation compares similarity over the entire sample space, so it reflects the global characteristics of the samples and lets the clustering converge in fewer iterations. Finally, experiments on several biological datasets compare convergence speed, the influence of the K value, and clustering quality. The results show that the proposed granular K-means clustering method converges quickly and clusters well.

Key words: K-means clustering, granular computing, granular distance, unsupervised learning, granular clustering
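
The pipeline outlined in the abstract (granulate each sample per feature, stack the granules into a granular vector, measure a granular distance, then run K-means on the granular vectors) can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything specific here is an assumption: a granule is taken to be the vector of similarities `1 - |difference|` between one sample and all samples on a min-max scaled feature, the granular distance is taken to be the mean absolute difference between granular vectors, and the centers are seeded with a simple farthest-first rule. The function names `granulate`, `granular_distance`, `init_centers`, and `granular_kmeans` are hypothetical.

```python
import numpy as np

def granulate(X):
    """Assumed granulation: G[i, f, j] is the similarity between samples
    i and j on feature f, computed on min-max scaled features."""
    X = np.asarray(X, dtype=float)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                      # guard against constant features
    Z = (X - X.min(axis=0)) / span             # scale each feature to [0, 1]
    sim = 1.0 - np.abs(Z[:, None, :] - Z[None, :, :])   # axes: (i, j, f)
    return sim.transpose(0, 2, 1)              # axes: (i, f, j)

def granular_distance(A, B):
    """Assumed granular distance: mean absolute difference between
    granular vectors (broadcasts over a leading sample axis)."""
    return np.abs(A - B).mean(axis=(-2, -1))

def init_centers(G, k):
    """Farthest-first seeding (an assumption, not taken from the paper)."""
    chosen = [0]
    d = granular_distance(G, G[0])
    while len(chosen) < k:
        nxt = int(d.argmax())
        chosen.append(nxt)
        d = np.minimum(d, granular_distance(G, G[nxt]))
    return G[chosen].copy()

def granular_kmeans(X, k, max_iter=100):
    """K-means on granular vectors; returns (labels, iterations used)."""
    G = granulate(X)
    centers = init_centers(G, k)
    labels = np.full(G.shape[0], -1)
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        # Distance from every granular vector to every center: shape (n, k)
        d = np.stack([granular_distance(G, c) for c in centers], axis=1)
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                              # assignments stable: converged
        labels = new_labels
        for c in range(k):
            members = G[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    return labels, n_iter

if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [0.1, 0.1], [0.05, 0.0],
                  [5.0, 5.0], [5.1, 4.9], [4.9, 5.0]])
    labels, n_iter = granular_kmeans(X, k=2)
    print(labels, n_iter)
```

Because every granule encodes a sample's similarity to all other samples, each distance computation already reflects the whole dataset, which is the property the abstract credits for the reduced iteration count. The sketch stores an n × m × n array, so it is only practical for small datasets.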

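Abstract (translated from the Chinese): K-means clustering is an unsupervised learning method that is simple to use, highly interpretable, and effective at clustering. However, the algorithm converges slowly, its parameters are hard to determine, and it easily falls into a local solution. To overcome these inherent defects, a new clustering model, the granular K-means clustering algorithm, is proposed by drawing on granular computing theory. Samples are granulated into granules on single features, and the granules on multiple features form granular vectors; several granular distance formulas are further defined to measure the distance between granules. Based on the granular distance measure, a granular K-means clustering method is proposed and a granular K-means clustering algorithm is designed. Sample granulation compares similarity over the entire sample space and thus reflects the global characteristics of the samples, so clustering converges in fewer iterations and reaches the global optimum more easily. Experiments on several public UCI datasets compare convergence speed, the influence of the K value, and clustering quality; the results show that the proposed granular K-means clustering method converges quickly and clusters well.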

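Key words (translated from the Chinese): K-means clustering, granular computing, granular distance, unsupervised learning, granular clustering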