Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (23): 87-94. DOI: 10.3778/j.issn.1002-8331.2009-0103


Improved K-means Algorithm Based on Distance and Weight

WANG Zilong, LI Jin, SONG Yafei   

  1. Graduate College, Air Force Engineering University, Xi’an 710051, China
  2. School of Air and Missile Defense, Air Force Engineering University, Xi’an 710051, China
  • Online: 2020-12-01 Published: 2020-11-30





The K-means clustering algorithm is simple, efficient, and widely used. However, the traditional K-means algorithm selects its initial cluster centers at random, which makes it prone to falling into a local optimum, and the value of K must be determined manually. To obtain the most appropriate initial cluster centers, an improved K-means algorithm based on distance and sample weight is proposed. The algorithm measures the distance between sample points with a dimensionally weighted Euclidean distance. After the density and weight of every sample are calculated, the point with the highest density is taken as the first initial cluster center, and all samples within its cluster are removed from the data set. The next initial cluster center is then found from the previous cluster center and the weights of the remaining sample points, using an introduced parameter τ_i. This process is repeated until the data set is empty, at which point k initial cluster centers have been obtained automatically. Experiments are carried out on the UCI data set. Compared with the classical K-means, WK-means, ZK-means, and DCK-means algorithms, the improved K-means algorithm based on distance and weight achieves a better clustering effect.
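The selection loop described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the formulas for density, sample weight, or τ_i, so the sketch substitutes hypothetical stand-ins. The neighbourhood radius is assumed to be the mean pairwise distance, density is assumed to be the neighbour count within that radius, and each later center is assumed to maximise density times distance to the previous center. The names `weighted_distance`, `pick_initial_centers`, and `dim_weights` are illustrative.

```python
import math
from itertools import combinations


def weighted_distance(a, b, dim_weights):
    """Euclidean distance with one non-negative weight per dimension."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(dim_weights, a, b)))


def pick_initial_centers(points, dim_weights=None):
    """Return initial centers; their count plays the role of k.

    Assumed stand-ins (not taken from the paper): the radius is the mean
    pairwise distance, density is the neighbour count within that radius,
    and later centers maximise density * distance to the last center.
    """
    if not points:
        return []
    if dim_weights is None:
        dim_weights = [1.0] * len(points[0])  # uniform weights: plain Euclidean

    def dist(a, b):
        return weighted_distance(a, b, dim_weights)

    pairs = list(combinations(points, 2))
    radius = sum(dist(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0

    # Density is computed once, over the full data set.
    density = {i: sum(1 for q in points if dist(p, q) <= radius)
               for i, p in enumerate(points)}

    remaining = set(range(len(points)))
    centers = []
    last = None
    while remaining:
        if last is None:
            # First center: the densest sample (lowest index wins ties).
            best = max(sorted(remaining), key=lambda i: density[i])
        else:
            # Later centers: dense samples far from the previous center.
            best = max(sorted(remaining),
                       key=lambda i: density[i] * dist(points[i], points[last]))
        centers.append(points[best])
        # Eliminate every remaining sample inside the new center's neighbourhood.
        remaining -= {i for i in remaining
                      if dist(points[i], points[best]) <= radius}
        last = best
    return centers
```

Calling `pick_initial_centers` on two well-separated groups of points returns one center per group without K being supplied, which mirrors the automatic determination of k described above. The returned centers would then seed a standard K-means run.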

Key words: data mining, K-means algorithm, initial cluster center, weighted Euclidean distance, weight product


