Computer Engineering and Applications ›› 2015, Vol. 51 ›› Issue (20): 120-125.

• Database, Data Mining, Machine Learning •

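Clustering algorithm based on symmetry distance with direction constraint

陈强业1,2, 李际军1

  1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027
  2. Informatization Office, Zhejiang University City College, Hangzhou 310015
  • Publication date: 2015-10-15 Release date: 2015-10-30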

Clustering algorithm based on symmetry distance with direction constraint

CHEN Qiangye1,2, LI Jijun1   

  1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
  2. Informatization Office, Zhejiang University City College, Hangzhou 310015, China
  • Online: 2015-10-15 Published: 2015-10-30

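Abstract: K-means is one of the most widely studied and applied clustering algorithms in data mining, and it has many derivatives, including the K-means variant introduced in recent years that uses the point symmetry distance as its measure. Building on point symmetry distance clustering, this paper proposes a new clustering algorithm. Based on an analysis of symmetry, a direction constraint is added to the description of symmetry, which makes the symmetry distance more accurate and thereby improves clustering accuracy. In addition, because symmetric points occur in pairs, the convergence strategy of the clustering process is adjusted: cluster centers are computed from the midpoints of the lines joining symmetric point pairs, which improves the convergence of symmetry-distance-based clustering. Numerical simulations compare the proposed algorithm with the original one; the results show that, with unchanged computational complexity, the proposed algorithm is more accurate and its clusters are closer to the true classes of the data.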

Key words: K-means algorithm, clustering, symmetry distance, direction constraint

Abstract: K-means is a well-studied and widely used clustering algorithm in data mining, and many clustering algorithms have evolved from it. One example is the symmetry-based version of K-means, proposed in recent years, which uses the point symmetry distance as the similarity measure. This paper proposes a new clustering algorithm based on the point symmetry distance clustering algorithm. A direction constraint, derived from a study of the properties of symmetry, is introduced to refine the description of the symmetry distance and improve clustering accuracy. Because symmetry is a relationship between two points, the convergence strategy is modified to compute cluster centers from the midpoints of symmetric pairs, which improves the convergence performance of the clustering. Numerical simulation shows that the proposed algorithm reaches a more accurate result with the same computational complexity as the existing one.

Key words: K-means algorithm, clustering, symmetry distance, direction constraint
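
The two ideas named in the abstract, a point symmetry distance and a center update built from the midpoints of symmetric pairs, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The distance uses the commonly cited form of the point symmetry distance, min over i ≠ j of ‖(x_j − c) + (x_i − c)‖ / (‖x_j − c‖ + ‖x_i − c‖), which is assumed here to be the baseline measure. The direction constraint is not specified in the abstract and is therefore not modeled. The function names `point_symmetry_distance` and `midpoint_center`, and the choice to average the pair midpoints, are hypothetical.

```python
import numpy as np

def point_symmetry_distance(X, j, c):
    """Return (distance, partner index) for point X[j] relative to center c.

    Assumed baseline measure: the partner is the point whose reflection
    through c lands closest to X[j], normalized by both distances to c.
    """
    a = X[j] - c                   # offset of the query point from c
    b = X - c                      # offsets of all points from c
    num = np.linalg.norm(a + b, axis=1)
    den = np.linalg.norm(a) + np.linalg.norm(b, axis=1)
    ratio = np.full(len(X), np.inf)
    valid = den > 0                # skip the degenerate case of both points at c
    ratio[valid] = num[valid] / den[valid]
    ratio[j] = np.inf              # a point cannot be its own partner
    i = int(np.argmin(ratio))
    return float(ratio[i]), i

def midpoint_center(X, members, c):
    """Hypothetical center update: mean of the midpoints of symmetric pairs.

    For each member of the cluster, find its symmetric partner with respect
    to the current center c, then average the midpoints of all such pairs.
    """
    sub = X[members]
    mids = []
    for j in range(len(sub)):
        _, i = point_symmetry_distance(sub, j, c)
        mids.append((sub[j] + sub[i]) / 2.0)
    return np.mean(mids, axis=0)
```

For four points placed symmetrically about (1, 1), calling `midpoint_center` with a slightly offset starting center such as (0.9, 1.1) pairs each point with its mirror image and returns the true center of symmetry, which is the behavior the midpoint strategy is meant to exploit.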