Computer Engineering and Applications ›› 2008, Vol. 44 ›› Issue (11): 155-157.

• Database, Signal and Information Processing •

Outlier detection based on genetic algorithm for clustering

QIAN Guang-chao, JIA Rui-yu, ZHANG Ran, LI Long-shu

  1. School of Computer Science and Technology, Anhui University, Hefei 230039, China
  • Received: 2007-07-24 Revised: 2007-09-28 Online: 2008-04-11 Published: 2008-04-11
  • Contact: QIAN Guang-chao

Abstract: Outlier detection is an important part of data mining and offers a new way to analyze massive, complex, and noisy data. This paper analyzes and evaluates several major approaches to outlier mining and, on that basis, proposes an outlier detection algorithm based on genetic clustering. The algorithm combines the global search capability of the genetic algorithm with the fast local convergence of the K-means method, and it achieves good results. Experiments show that the algorithm detects the outliers in the data set well while also completing the clustering of the data set, so it is of good practical value.

Key words: outlier detection, data mining, genetic algorithm, clustering, K-means algorithm
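
The abstract does not give the chromosome encoding, the genetic operators, or the criterion used to declare a point an outlier, so the following is only a minimal sketch of one way such a hybrid could look, not the authors' algorithm. Everything in it is an assumption made for illustration: each chromosome is a list of k candidate centers, fitness is the sum of squared errors, every generation applies one K-means update as local refinement, and the hypothetical `find_outliers` rule flags points that lie far from their center or fall in very small clusters.

```python
import math
import random

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]

def kmeans_step(points, centers):
    """One K-means update: move each center to the mean of its members."""
    labels = assign(points, centers)
    updated = []
    for j, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            updated.append(old)  # an empty cluster keeps its previous center
    return updated

def sse(points, centers):
    """Fitness (lower is better): squared distance to the nearest center."""
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)

def genetic_kmeans(points, k, pop_size=10, generations=20,
                   mutation_rate=0.1, seed=0):
    """Assumed hybrid: genetic search over center sets plus K-means refinement."""
    rng = random.Random(seed)
    # Each chromosome is k centers drawn from the data.
    population = [rng.sample(points, k) for _ in range(pop_size)]
    for _ in range(generations):
        # Local refinement: one K-means step per chromosome.
        population = [kmeans_step(points, c) for c in population]
        population.sort(key=lambda c: sse(points, c))
        parents = population[: pop_size // 2]  # the better half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(k + 1)  # single-point crossover on the center list
            child = a[:cut] + b[cut:]
            # Mutation: occasionally replace a center with a random data point.
            child = [rng.choice(points) if rng.random() < mutation_rate else c
                     for c in child]
            children.append(child)
        population = parents + children
    return min(population, key=lambda c: sse(points, c))

def find_outliers(points, centers, factor=3.0, min_size=2):
    """Hypothetical rule: far from the nearest center, or in a tiny cluster."""
    labels = assign(points, centers)
    dists = [math.dist(p, centers[lab]) for p, lab in zip(points, labels)]
    threshold = factor * sum(dists) / len(dists)
    sizes = [labels.count(j) for j in range(len(centers))]
    return [i for i, (d, lab) in enumerate(zip(dists, labels))
            if d > threshold or sizes[lab] < min_size]

if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11), (5, 30)]
    centers = genetic_kmeans(data, k=2)
    print(centers, find_outliers(data, centers))
```

The small-cluster check is included because an isolated point can attract a center of its own, in which case its distance to that center is zero and a distance threshold alone would miss it. The values of `factor` and `min_size` are arbitrary defaults and would need tuning for real data.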