Computer Engineering and Applications ›› 2008, Vol. 44 ›› Issue (3): 189-191.

• Database and Information Processing •

Algorithm for fast K-means based on balanced function

SHI Pei-bei1, QIAN Xue-zhong1, WANG Zhong2

  1. School of Information Engineering, Southern Yangtze University, Wuxi, Jiangsu 214122, China
  2. Department of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2008-01-21 Published: 2008-01-21
  • Contact: SHI Pei-bei

Abstract: Cluster analysis is widely used. The traditional K-means algorithm requires the number of clusters k to be specified in advance, which limits many practical applications. Because clustering quality is judged mainly by the compactness within clusters and the distance between clusters, this paper proposes a balanced evaluation function and uses a nearest-neighbor search algorithm to reduce the amount of computation. The algorithm not only determines the number of clusters automatically but also balances the influence of within-cluster variation and between-cluster variation on the clustering result. Experimental results demonstrate the effectiveness of the improved K-means algorithm.

Key words: K-means algorithm, within-cluster variation, between-cluster variation, balanced function, extended partial distortion search (EPDS)
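
The abstract names a nearest-neighbor search as the source of the speed-up, and the keywords identify it as EPDS. The abstract does not give the details of the extended variant, so the following is only a minimal sketch of plain partial distortion search, the basic technique that EPDS builds on. It is not the authors' implementation; the function name `nearest_centroid_pds` and its arguments are hypothetical. The idea is to accumulate the squared distance one dimension at a time and abandon a centroid as soon as the partial sum reaches the best distance found so far.

```python
import math


def nearest_centroid_pds(point, centroids):
    """Return (index, squared_distance) of the centroid nearest to point.

    Plain partial distortion search, used here as an assumed stand-in
    for the EPDS step named in the keywords.
    """
    best_index = -1
    best_dist = math.inf
    for index, centroid in enumerate(centroids):
        partial = 0.0
        for x, c in zip(point, centroid):
            diff = x - c
            partial += diff * diff
            # Early exit: this centroid can no longer beat the current best.
            if partial >= best_dist:
                break
        else:
            # The inner loop finished without a break, so the full
            # distance is strictly smaller than the previous best.
            best_index = index
            best_dist = partial
    return best_index, best_dist


if __name__ == "__main__":
    centers = [(0.0, 0.0), (1.0, 2.5), (5.0, 5.0)]
    print(nearest_centroid_pds((1.0, 2.0), centers))
```

The result is identical to an exhaustive search; only the number of per-dimension operations changes, and the saving tends to grow with the dimensionality of the data and with the number of centroids.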