Computer Engineering and Applications ›› 2012, Vol. 48 ›› Issue (35): 114-117.

• Database, Signal and Information Processing •

K-means clustering algorithm based on coefficient of variation

FAN Alin (范阿琳), REN Shuhua (任树华)

  1. School of Information Science and Engineering, Dalian Polytechnic University, Dalian, Liaoning 116034, China
  • Online: 2012-12-11 Published: 2012-12-21

Abstract: The performance of the k-means clustering algorithm depends on the choice of distance metric, and Euclidean distance is the metric most commonly used. Euclidean distance treats all features as contributing equally to the clustering, so it does not accurately reflect the dissimilarity among samples. To address this shortcoming, this paper proposes a k-means clustering algorithm based on the coefficient of variation (CV-k-means), which uses a weight vector derived from the coefficients of variation to reduce the influence of irrelevant features. Experimental results show that the proposed algorithm produces better clustering results than the k-means algorithm.

Key words: k-means clustering, dissimilarity measure, weighting, coefficient of variation
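
The abstract does not state how the weight vector is computed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each feature's weight is its coefficient of variation (population standard deviation divided by the absolute mean) normalized so the weights sum to 1, and that these weights scale the squared differences inside the Euclidean distance. The function names `cv_weights`, `weighted_distance`, and `cv_kmeans` are hypothetical.

```python
import math
import random
from statistics import mean, pstdev


def cv_weights(data):
    """Return one weight per feature, proportional to its coefficient of variation.

    Assumption (not taken from the abstract): w_j = CV_j / sum(CV),
    where CV_j = std_j / |mean_j|.
    """
    cvs = []
    for col in zip(*data):
        mu = mean(col)
        # CV is undefined for a zero mean; treat such a feature as uninformative.
        cvs.append(pstdev(col) / abs(mu) if mu != 0 else 0.0)
    total = sum(cvs)
    if total == 0:
        # All features are constant: fall back to equal weights.
        return [1.0 / len(cvs)] * len(cvs)
    return [cv / total for cv in cvs]


def weighted_distance(x, y, weights):
    """Euclidean distance with each squared difference scaled by its feature weight."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


def cv_kmeans(data, k, max_iter=100, seed=0):
    """Lloyd-style k-means that assigns points using the CV-weighted distance."""
    rng = random.Random(seed)
    weights = cv_weights(data)
    # Initialize centroids from k distinct data points.
    centroids = [list(p) for p in rng.sample(data, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: weighted_distance(p, centroids[c], weights))
            for p in data
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels, centroids, weights
```

Calling `cv_kmeans(data, k)` on a list of equal-length numeric rows returns the cluster labels, the centroids, and the weight vector. Under this weighting, a feature whose spread is small relative to its mean contributes little to the distance, which is one plausible reading of "reducing the influence of irrelevant features".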