Computer Engineering and Applications ›› 2009, Vol. 45 ›› Issue (3): 47-50.DOI: 10.3778/j.issn.1002-8331.2009.03.013

• 研究、探讨 • Previous Articles     Next Articles

NeiMu:novel clustering algorithm based on graph theory

YING De-quan1,YING Xiao-min2,YE Ji-hua1   

  1. 1.College of Computer Information Engineering,Jiangxi Normal University,Nanchang 330022,China
    2.Institute of Basic Medical Sciences,Academy of Military Medical Sciences,Beijing 100850,China
  • Received:2008-07-29 Revised:2008-09-27 Online:2009-01-21 Published:2009-01-21
  • Contact: YING De-quan

一种基于图论的聚类算法NeiMu

应德全1,应晓敏2,叶继华1   

  1. 1.江西师范大学 计算机信息工程学院,南昌 330022
    2.解放军军事医学科学院 基础医学研究所,北京 100850
  • 通讯作者: 应德全

Abstract: A novel clustering algorithm based on graph theory named NeiMu is proposed.NeiMu first analyzes all the objects in data,searching k-nearest neighbors for each object and constructing directed graph of k-nearest neighbors.Then it constructs k-clustering graph according to the k-mutual neighbor in directed graph of k-nearest neighbors.Finally,it discovers natural clusters in data on the basis of k-clustering graph.The significant characteristic of NeiMu is the ability of determining natural clusters by mutual neighbor relations in data,instead of introducing other partition methods to divide data into small clusters.As a consequence,NeiMu guarantees that all the objects will not be clustered erroneously but be merged into bigger clusters with other small clusters.This advantage will ensure the clustering quality of NeiMu algorithm.Furthermore,NeiMu,which presents results similar to bottom-up hierarchical clustering algorithm,facilitates users to determine the best k according to the gradually merging results.The experiments show that NeiMu exhibits good clustering quality for data with variable densities,variable sizes and arbitrary shapes.It is also robust with outliers.

Key words: graph theory, clustering, k-neighbor

摘要: 提出一种新的基于图论的聚类算法NeiMu。该算法首先分析数据中的对象,寻找每个对象的k近邻,根据k近邻关系构造k近邻有向图,然后通过k近邻有向图中的k-互邻居关系构造k-聚类图,发现数据中的自然聚类。算法的特点是根据数据之间的互为k近邻关系确定数据中的自然簇,而不必引入其他方法来划分小簇,从而能够保证对象不会被错误聚类,仅会与其他小簇一起融合到一个大簇中。这一优点可以有效保证NeiMu算法的聚类质量。而且,NeiMu算法给出的这种类似自底向上的层次聚类结果还有利于用户根据渐变的结果确定最佳的k值。实验结果表明,该算法对密度变化大的数据、大小相差大的数据、任意分布形状的数据均具有很好的聚类质量,对孤立点也很健壮。

关键词: 图论, 聚类, k近邻