Detection of top-n global outliers in datasets based on hierarchical clustering

Computer Engineering and Applications ›› 2012, Vol. 48 ›› Issue (9): 101-103.

LIANG Binmei

1. College of Mathematics and Information Science, Guangxi University, Nanning 530004, China
2. College of Computer Science, Sichuan University, Chengdu 610065, China

Online: 2012-03-21 Published: 2012-04-11

Abstract

Outliers often lead to inaccurate or even wrong results in data mining. Existing outlier detection algorithms still fall short in versatility, effectiveness, user-friendliness, and performance on large, high-dimensional datasets. To address this, an effective global outlier detection method is proposed. The method performs agglomerative hierarchical clustering, uses the clustering tree and the distance matrix to judge the degree of isolation of the data visually and to determine the number of outliers, and then identifies the outliers in an unsupervised manner by working from the top of the clustering tree downward. Simulation experiments on several datasets show that the method effectively detects the top-n global outliers, that is, the n points with the highest degree of isolation, and that it is applicable to datasets of various shapes. The results also show that the algorithm is efficient, user-friendly, and suitable for outlier detection in large, high-dimensional datasets.

Key words: outlier detection, hierarchical clustering, data mining
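
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the abstract does not state the linkage criterion or the distance measure, so single linkage and Euclidean distance are assumed here, and the visual judgement of isolation is replaced by a hypothetical size threshold `max_group`. The function names `single_linkage_merges` and `top_n_outliers` are likewise illustrative only.

```python
import math
from itertools import combinations


def single_linkage_merges(points):
    """Naive agglomerative clustering (single linkage, Euclidean distance).

    Returns the merge history as a list of (members_a, members_b, distance),
    ordered from the bottom of the clustering tree to the top.
    """
    # Pairwise distance matrix between all points.
    dist = [[math.dist(p, q) for q in points] for p in points]
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        best = None
        # Find the two clusters whose closest members are nearest.
        for a, b in combinations(clusters, 2):
            d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges


def top_n_outliers(points, n, max_group=2):
    """Walk the merge history from the top of the tree downward.

    A branch holding at most `max_group` points that joins a larger branch
    is treated as isolated (an assumption standing in for visual inspection).
    Returns the indices of the first n such points.
    """
    outliers = []
    for a, b, _ in reversed(single_linkage_merges(points)):
        small, large = sorted((a, b), key=len)
        if len(small) <= max_group and len(large) > max_group:
            outliers.extend(small)
        if len(outliers) >= n:
            break
    return outliers[:n]


# Toy data: a 5 x 5 grid (indices 0..24) plus two distant points (25 and 26).
points = [(float(x), float(y)) for x in range(5) for y in range(5)]
points += [(20.0, 20.0), (-15.0, 10.0)]

print(top_n_outliers(points, 2))  # [25, 26]
```

In this toy example the two distant points are the last to join the tree, so they are the first to be found when reading the tree from the top. The naive merge loop is only meant for small inputs; it says nothing about the efficiency reported for the published algorithm.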

LIANG Binmei. Detection of top-n global outliers in datasets based on hierarchical clustering[J]. Computer Engineering and Applications, 2012, 48(9): 101-103.
