Computer Engineering and Applications, 2010, Vol. 46, Issue 25: 142-145. DOI: 10.3778/j.issn.1002-8331.2010.25.042

• Database, Signal and Information Processing •

Improved rough clustering method based on genetic algorithm

HONG Liang-liang, LUO Ke

  1. Institute of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha 410014, China
  • Received: 2009-10-23 Revised: 2009-12-14 Online: 2010-09-01 Published: 2010-09-01
  • Contact: HONG Liang-liang

Abstract: Traditional clustering algorithms partition data objects by hard computing, yet in practice objects of different classes often have no clear boundary between them. Rough set theory provides a way to handle the uncertainty of boundary objects, so it is combined here with the k-means method. Traditional k-means also requires the number of clusters k to be given in advance, although k is hard to determine in practice; and while k-means has strong local search ability, it easily falls into local optima. Genetic algorithms can find the global optimum but tend to converge too quickly. In view of this, an improved rough clustering method based on a genetic algorithm is proposed. The algorithm generates the number of k-means clusters dynamically, uses the max-min principle to generate the initial cluster centers, and applies the upper and lower approximations of rough set theory to handle boundary objects. Finally, the algorithm is tested on the Iris data set from UCI. Experimental results show that the algorithm achieves a higher accuracy rate and more stable overall performance.

Key words: cluster analysis, genetic algorithm, rough set, k-means algorithm
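The abstract names two ingredients that can be illustrated in isolation: max-min selection of initial cluster centers, and assignment of objects to lower and upper approximations. The following is only a minimal sketch under stated assumptions, not the authors' implementation. The function names, the choice of the first point as the starting center, and the ratio threshold `epsilon` used to detect boundary objects are all hypothetical; the abstract does not specify them, and the genetic-algorithm part and the dynamic choice of k are not shown.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as sequences of floats."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def max_min_centers(points, k):
    """Pick k initial centers by the max-min principle.

    Assumption: start from points[0], then repeatedly add the point whose
    distance to its nearest already-chosen center is largest.
    """
    centers = [points[0]]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(farthest)
    return centers


def rough_assign(points, centers, epsilon=1.5):
    """Assign point indices to lower and upper approximations of each cluster.

    A point whose distance to another center is within `epsilon` times its
    nearest distance is treated as a boundary object: it joins the upper
    approximations of all such clusters and no lower approximation.
    Otherwise it joins both the lower and upper approximation of its
    nearest cluster. `epsilon` is a hypothetical threshold.
    """
    lower = [[] for _ in centers]
    upper = [[] for _ in centers]
    for i, p in enumerate(points):
        d = [dist(p, c) for c in centers]
        nearest = min(range(len(centers)), key=d.__getitem__)
        close = [j for j in range(len(centers))
                 if j != nearest and d[j] <= epsilon * d[nearest]]
        upper[nearest].append(i)
        if close:
            for j in close:
                upper[j].append(i)
        else:
            lower[nearest].append(i)
    return lower, upper


# Toy data: two tight groups and one point roughly halfway between them.
points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (2.5, 2.5)]
centers = max_min_centers(points, 2)
lower, upper = rough_assign(points, centers)
```

In this toy run the midpoint `(2.5, 2.5)` lands in the upper approximation of both clusters and in neither lower approximation, which is the behaviour rough clustering relies on for boundary objects.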

CLC Number: