Computer Engineering and Applications ›› 2019, Vol. 55 ›› Issue (8): 27-33. DOI: 10.3778/j.issn.1002-8331.1810-0075

Improved K-means Clustering k-Value Selection Algorithm

WANG Jianren, MA Xin, DUAN Ganglong   

  1. School of Economics and Management, Xi’an University of Technology, Xi’an 710054, China
  • Online: 2019-04-15 Published: 2019-04-15

Abstract: In spatial clustering algorithms, the quality of the clustering depends to a large extent on the choice of the best k value. In the typical K-means algorithm, the number of clusters k must be fixed in advance, but in practice a suitable value of k is difficult to determine. To address the ambiguity of the "elbow point" that arises when the Elbow Method is used to determine k, this paper proposes an improved k-value selection algorithm, ET-SSE, based on the properties of the exponential function, weight adjustment, a bias term, and the basic idea of the Elbow Method. The algorithm is tested on multiple UCI data sets with the K-means clustering algorithm. The results show that the proposed k-value selection algorithm determines the value of k more quickly and accurately than the Elbow Method.
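
For readers unfamiliar with the baseline that ET-SSE sets out to improve, the following is a minimal sketch of the ordinary Elbow Method: run K-means for a range of k values, record the sum of squared errors (SSE) for each, and look for the k at which the curve bends. It does not implement ET-SSE, whose formula is not given in the abstract. The function names `kmeans_sse` and `elbow_curve`, the restart count, and the synthetic three-blob data are all assumptions made for illustration, not part of the paper.

```python
import numpy as np

def kmeans_sse(X, k, n_init=10, max_iter=100, seed=0):
    """Run plain Lloyd's K-means n_init times; return the lowest SSE found.

    Hypothetical helper for illustration, not the paper's code.
    """
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(n_init):
        # Initialise the centres with k distinct data points chosen at random.
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iter):
            # Squared Euclidean distance from every point to every centre.
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            new_centers = centers.copy()
            for j in range(k):
                members = X[labels == j]
                if len(members) > 0:  # keep the old centre if a cluster is empty
                    new_centers[j] = members.mean(axis=0)
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        # SSE: each point's squared distance to its nearest centre, summed.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        best = min(best, d2.min(axis=1).sum())
    return float(best)

def elbow_curve(X, k_max):
    """Return {k: SSE} for k = 1..k_max, the curve the Elbow Method inspects."""
    return {k: kmeans_sse(X, k) for k in range(1, k_max + 1)}

# Synthetic data (an assumption): three well-separated 2-D blobs.
rng = np.random.default_rng(42)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in blob_centers])

sse = elbow_curve(X, k_max=6)
for k, value in sse.items():
    print(k, round(value, 2))
```

On data like this the SSE typically drops sharply up to the true number of clusters and flattens afterwards. On real data the bend is often far less pronounced, which is the "elbow point" ambiguity the abstract refers to.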

Key words: K-means algorithm, k-value selection, ET-SSE algorithm
