New clustering algorithm based on representatives and point density

doi:10.3778/j.issn.1002-8331.2008.28.046

Computer Engineering and Applications, 2008, Vol. 44, Issue (28): 136-139. DOI: 10.3778/j.issn.1002-8331.2008.28.046

• Database, Signal and Information Processing •

CHEN Yuan-yuan (陈园园)¹, CHEN Zhi-ping (陈治平)¹,²

1. College of Computer and Communication, Hunan University, Changsha 410082, China
2. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

Received: 2007-11-20; Revised: 2008-02-18; Online: 2008-10-01; Published: 2008-10-01
Contact: CHEN Yuan-yuan

Abstract

Density-based clustering methods fail to discover clusters when the density of the data samples is unevenly distributed. To address this shortcoming, a clustering algorithm based on representatives and point density is proposed. The algorithm discovers clusters by examining the k nearest neighbors of each point in the database. It first chooses a seed point as the first representative of a cluster, and the k nearest neighbors of that representative form its representative region. If the density of a point in the representative region satisfies the density threshold, that point becomes a new representative. The search for representatives is repeated in this way, and the representatives whose regions are connected, together with their representative regions, form one cluster. Experimental results show that the algorithm can discover clusters of arbitrary shape, size, and density.

Key words: data mining, clustering, point density, representative, density threshold
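The expansion procedure outlined in the abstract can be sketched roughly as follows. This is only an illustrative reading of the abstract, not the authors' implementation: the abstract does not say how point density is computed, how seeds are chosen, or how the density threshold is set. The sketch assumes that density is the inverse of the mean distance to the k nearest neighbors, that seeds are taken in decreasing order of density, and that the threshold is a fraction `alpha` of the current representative's density. The names `knn_indices`, `point_density`, `cluster`, and `alpha` are hypothetical.

```python
import math
from collections import deque


def knn_indices(points, i, k):
    """Indices of the k nearest neighbours of points[i], excluding i itself."""
    others = (j for j in range(len(points)) if j != i)
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))[:k]


def point_density(points, i, k):
    """ASSUMPTION: density = 1 / (mean distance to the k nearest neighbours)."""
    nbrs = knn_indices(points, i, k)
    mean_d = sum(math.dist(points[i], points[j]) for j in nbrs) / len(nbrs)
    return math.inf if mean_d == 0 else 1.0 / mean_d


def cluster(points, k=3, alpha=0.5):
    """Grow clusters from seed representatives through their k-NN regions."""
    n = len(points)
    regions = [knn_indices(points, i, k) for i in range(n)]
    density = [point_density(points, i, k) for i in range(n)]
    labels = [-1] * n  # -1 means "not yet assigned"
    current = 0
    # ASSUMPTION: seeds are visited from densest to sparsest.
    for seed in sorted(range(n), key=lambda i: -density[i]):
        if labels[seed] != -1:
            continue
        labels[seed] = current
        queue = deque([seed])  # representatives still to be expanded
        while queue:
            rep = queue.popleft()
            for q in regions[rep]:
                if labels[q] != -1:
                    continue
                labels[q] = current  # q lies in rep's representative region
                # ASSUMPTION: the density threshold is relative to rep.
                if density[q] >= alpha * density[rep]:
                    queue.append(q)  # q becomes a new representative
        current += 1
    return labels


# Two 3x3 grids with very different densities (spacing 1 versus spacing 5).
dense = [(float(x), float(y)) for x in range(3) for y in range(3)]
sparse = [(100.0 + 5 * x, 100.0 + 5 * y) for x in range(3) for y in range(3)]
labels = cluster(dense + sparse, k=3, alpha=0.5)
```

Because the threshold in this sketch is relative to each representative's own density rather than one global value, the sparse grid can be grown as a cluster in the same way as the dense one, which is the behaviour the abstract attributes to the algorithm.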

CHEN Yuan-yuan, CHEN Zhi-ping. New clustering algorithm based on representatives and point density[J]. Computer Engineering and Applications, 2008, 44(28): 136-139.

