Computer Engineering and Applications ›› 2009, Vol. 45 ›› Issue (32): 117-119. DOI: 10.3778/j.issn.1002-8331.2009.32.037

• Database, Signal and Information Processing •

Outlier detection method based on hierarchical clustering

LIANG Bin-mei

  1. College of Mathematics and Information Science, Guangxi University, Nanning 530004, China
  • Received: 2009-08-14 Revised: 2009-09-18 Online: 2009-11-11 Published: 2009-11-11
  • Contact: LIANG Bin-mei

Abstract: Outlier detection is an important step in the data mining process. This paper proposes an Outlier Detection method based on Hierarchical Clustering (ODHC). ODHC analyzes the result of hierarchical clustering and scans the distance matrix in descending order of inter-cluster distance, so that outliers with a specified degree of isolation are detected until the data meet the user's requirement for concentration. The method is applicable to multi-dimensional data sets; its principle is intuitive, it is user-friendly, and its outlier detection accuracy is relatively high. Simulation experiments on data sets such as iris and balloon show that ODHC identifies outliers effectively and is a simple, practical outlier detection method.

Key words: outlier detection, hierarchical clustering, data preprocessing, data mining
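
The abstract gives only the outline of ODHC, not its details. The following is a minimal sketch of the general idea rather than the author's implementation. It assumes single-linkage clustering with Euclidean distance, and it introduces two hypothetical parameters: `min_gap` stands in for the specified degree of isolation, and `max_size` for the largest group that still counts as isolated. The function names are likewise illustrative.

```python
from itertools import combinations
from math import dist


def single_linkage_merges(points):
    """Naive agglomerative clustering (assumed single linkage, Euclidean).

    Returns one (distance, members_a, members_b) tuple per merge.
    """
    clusters = {i: (i,) for i in range(len(points))}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        # Single linkage: cluster distance is the closest pair of members.
        d, a, b = min(
            (min(dist(points[i], points[j])
                 for i in clusters[a] for j in clusters[b]), a, b)
            for a, b in combinations(clusters, 2)
        )
        merges.append((d, clusters[a], clusters[b]))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges


def detect_outliers(points, min_gap, max_size=1):
    """Flag small clusters that join the rest only at a large distance.

    min_gap and max_size are hypothetical knobs, not taken from the paper.
    """
    outliers = set()
    # Walk the merges from the largest inter-cluster distance downwards.
    for d, left, right in sorted(single_linkage_merges(points), reverse=True):
        if d < min_gap:
            break  # every remaining merge is closer than the requested gap
        for side in (left, right):
            if len(side) <= max_size:
                outliers.update(side)
    return sorted(outliers)


# Four points on a unit square plus one distant point (index 4).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(detect_outliers(points, min_gap=5.0))  # [4]
```

Raising `min_gap` makes the scan stop earlier and flag fewer points, while raising `max_size` lets small groups of nearby points be reported together. The naive clustering loop is written for readability and is only suitable for small data sets.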
