Improved Shared Nearest Neighbor Clustering Algorithm

Computer Engineering and Applications, 2011, Vol. 47, Issue 8: 138-142.

• Database, Signal and Information Processing •

LI Xia, JIANG Shengyi

Cisco School of Informatics, Guangdong University of Foreign Studies, Guangzhou 510006, China

Received: 1900-01-01 Revised: 1900-01-01 Online: 2011-03-11 Published: 2011-03-11

Abstract

Abstract: Clustering is an unsupervised machine learning method whose task is to discover the natural clusters present in data. The shared nearest neighbor (SNN) clustering algorithm performs well on datasets whose clusters differ in size, shape, and density, but it has three shortcomings: (1) its time complexity is O(n²), which makes it unsuitable for large datasets; (2) it gives no simple, explicit guidance for choosing its parameter thresholds; (3) it can only handle datasets with numerical attributes. This paper improves the SNN algorithm so that it can handle datasets with mixed attributes, and gives a simple method for selecting the parameter thresholds. The running time of the improved algorithm is approximately linear in the size of the dataset, which makes it suitable for large, high-dimensional datasets. Experimental results on real and synthetic datasets show that the improved algorithm is effective and practical.

Key words: shared nearest neighbor clustering algorithm, one-pass clustering algorithm, large dataset
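
For readers unfamiliar with the term, the shared nearest neighbor similarity that SNN-style clustering builds on can be sketched as follows. This is only an illustration of the general idea in its common textbook form, not the improved algorithm proposed in the paper. The function names `knn_lists` and `snn_similarity`, the toy points, and the choice k = 3 are all assumptions made for this example. The brute-force neighbor search compares every pair of points, which illustrates where the O(n²) cost mentioned in the abstract comes from.

```python
from math import dist


def knn_lists(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors.

    Brute force: every point is compared with every other point, so the
    number of distance computations grows quadratically with len(points).
    """
    neighbors = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        neighbors.append(set(others[:k]))
    return neighbors


def snn_similarity(neighbors, i, j):
    """Count the k-nearest neighbors that points i and j have in common.

    Assumed convention: the similarity is 0 unless each point appears in
    the other's neighbor list.
    """
    if i not in neighbors[j] or j not in neighbors[i]:
        return 0
    return len(neighbors[i] & neighbors[j])


# Hypothetical toy data: two well-separated groups of numerical points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
neighbors = knn_lists(points, k=3)
print(snn_similarity(neighbors, 0, 1))  # two points from the same group
print(snn_similarity(neighbors, 0, 4))  # two points from different groups
```

Because `dist` is Euclidean, this sketch only accepts numerical attributes, which mirrors shortcoming (3) in the abstract; handling mixed attributes would require a different distance definition, which the abstract does not specify.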

LI Xia, JIANG Shengyi. Improved shared nearest neighbor clustering algorithm[J]. Computer Engineering and Applications, 2011, 47(8): 138-142.

