Computer Engineering and Applications ›› 2012, Vol. 48 ›› Issue (11): 129-132.

Previous Articles     Next Articles

Research on heuristic initialization-independent k-means algorithm

WANG Huiqing, CHEN Junjie, GUO Kai   

  1. College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan 030024, China
  • Online:2012-04-11 Published:2012-04-16


王会青,陈俊杰,郭  凯   

  1. 太原理工大学 计算机科学与技术学院,太原 030024

Abstract: According to the initialization sensitivity problem of the traditional k-means algorithm, a heuristic initialization-
independent k-means algorithm is proposed. Prim algorithm is introduced to solve the selection of initial clustering centers, and the threshold parameter θ is set, which can avoid several data objects from the same class as the initial clustering centers simultaneously, otherwise the algorithm increases the iteration times, and the wrong clustering results are got. Compared with the traditional k-means algorithm and k-means clustering analysis based on genetic algorithm, the experimental result shows that the improved algorithm not only reduces the impact of random selection of initial clustering centers, and decreases the iteration times effectively, but also reduces the affect of outliers in the process of clustering, which validates the feasibility and effectiveness of the suggested algorithm.

Key words: clustering analysis, k-means clustering, prim algorithm, initialization sensitivity, clustering center

摘要: 针对传统k-均值算法对初始聚类中心敏感的问题,提出了启发式初始化独立的k-均值算法。该算法引入prim算法选择k个初始聚类中心,且通过设置阈值参数θ,避免同一类中的多个数据对象同时作为初始聚类中心,否则将导致聚类迭代次数增加,并得到错误的聚类结果。与传统的k-均值算法和基于遗传算法的k-均值聚类算法相比,实验结果表明改进的算法不仅降低了初始聚类中心选取的随机性对聚类性能产生的影响,有效减少了聚类迭代次数,而且降低了离群点对聚类性能的影响,从而验证了算法的可行性和有效性。

关键词: 聚类分析, k-均值算法, prim算法, 初始化敏感, 聚类中心