Computer Engineering and Applications ›› 2010, Vol. 46 ›› Issue (6): 124-126. DOI: 10.3778/j.issn.1002-8331.2010.06.035

• Database, Signal and Information Processing •

New cluster validity function for determining cluster number

PENG Yong1,2, WU You-qing3

  1. Graduate University of Chinese Academy of Sciences, Beijing 100039, China
  2. Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110171, China
  3. School of Computer Science and Technology, Anhui University, Hefei 230039, China
  • Received: 2008-09-02 Revised: 2008-11-10 Online: 2010-02-21 Published: 2010-02-21
  • Contact: PENG Yong

Abstract: A cluster validity index evaluates the quality of a clustering result, and the result tends to be more reasonable when the initial number of clusters is determined accurately. Based on the theory of fuzzy uncertainty and the basic properties of clustering, a new index DiUc) is introduced to measure clustering compactness, and on this basis a new cluster validity function is proposed to identify the optimal number of clusters. The function is built on the compactness and separation properties of the dataset and takes into account both the membership degrees of the data points and the geometric structure of the dataset. Experimental results indicate that the new validity function identifies the optimal number of clusters when the dataset has an obvious cluster structure, and that it is insensitive to the weighting coefficient m.

Key words: fuzzy clustering, clustering validity, fuzzy c-means, clustering compactness, clustering separation
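
The general workflow described in the abstract (run fuzzy c-means for several candidate cluster numbers, score each result with a validity function that combines compactness and separation, and keep the best-scoring cluster number) can be sketched as follows. This is only an illustrative sketch and not the authors' method: the abstract does not give the definition of the proposed index, so the well-known Xie-Beni index is substituted as a stand-in. The helper names (`farthest_point_init`, `select_cluster_number`, `make_blobs`), the deterministic seeding strategy, and the synthetic data are all assumptions introduced here for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_init(points, c):
    """Deterministic seeding (an assumption, chosen for reproducibility)."""
    centers = [list(points[0])]
    while len(centers) < c:
        nxt = max(points, key=lambda p: min(sq_dist(p, v) for v in centers))
        centers.append(list(nxt))
    return centers

def memberships(points, centers, m):
    """FCM membership update: u[i][k] is the degree of point i in cluster k."""
    u = []
    for p in points:
        inv = [max(sq_dist(p, v), 1e-12) ** (-1.0 / (m - 1.0)) for v in centers]
        total = sum(inv)
        u.append([x / total for x in inv])
    return u

def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-9):
    """Standard fuzzy c-means; returns (centers, memberships)."""
    centers = farthest_point_init(points, c)
    dim = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for k in range(c):
            w = [row[k] ** m for row in u]  # fuzzified weights u_ik^m
            total = sum(w)
            new_centers.append(
                [sum(wi * p[j] for wi, p in zip(w, points)) / total
                 for j in range(dim)])
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m)

def xie_beni(points, centers, u, m=2.0):
    """Xie-Beni index (stand-in, NOT the paper's index); lower is better."""
    compact = sum(u[i][k] ** m * sq_dist(p, v)
                  for i, p in enumerate(points)
                  for k, v in enumerate(centers))
    sep = min(sq_dist(a, b)
              for i, a in enumerate(centers) for b in centers[i + 1:])
    return compact / (len(points) * max(sep, 1e-12))  # guard merged centers

def select_cluster_number(points, c_max=5, m=2.0):
    """Run FCM for c = 2..c_max and return the c with the smallest index."""
    scores = {}
    for c in range(2, c_max + 1):
        centers, u = fuzzy_c_means(points, c, m)
        scores[c] = xie_beni(points, centers, u, m)
    return min(scores, key=scores.get), scores

def make_blobs(seed=42, per_blob=30, sigma=0.5):
    """Synthetic data (hypothetical): three well-separated 2-D Gaussian blobs."""
    rng = random.Random(seed)
    return [(cx + rng.gauss(0, sigma), cy + rng.gauss(0, sigma))
            for cx, cy in [(0, 0), (10, 0), (0, 10)]
            for _ in range(per_blob)]

if __name__ == "__main__":
    best_c, scores = select_cluster_number(make_blobs())
    print(best_c, scores)
```

Calling `select_cluster_number(points, c_max, m)` with a different weighting coefficient `m` gives a simple way to check how sensitive a chosen validity index is to that parameter on a given dataset; any such result for the Xie-Beni stand-in says nothing about the index proposed in the paper.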
