Width-weighted clustering kNN algorithm based on number of objects

Computer Engineering and Applications, 2018, Vol. 54, Issue (19): 1-9. DOI: 10.3778/j.issn.1002-8331.1808-0208

CHEN Hui1, GUAN Kaisheng1, LI Jiaxing1,2

1. School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China
2. Guangdong Key Laboratory of Big Data Analysis and Processing, Guangzhou 510006, China

Online: 2018-10-01 Published: 2018-10-19

Abstract

Abstract: The traditional k-nearest neighbor (kNN) algorithm is widely used in data analysis as a non-parametric classification technique. However, it performs many redundant computations, so processing data takes considerable time. Much current research focuses on the data preprocessing stage, reducing the computational cost of kNN queries by building a model of the data. This paper proposes a width-weighted clustering kNN algorithm based on the number of objects (NOWCkNN). The data set is first clustered with a global width. Each resulting sub-cluster then recursively computes a weight for its width from the number of objects it contains, and the algorithm adjusts the width according to that weight and a harmonic coefficient. Finally, clusters of different widths are generated and used for kNN queries. This not only reduces clustering time but also balances the sizes of the resulting clusters, reduces the number of iterations, and maximizes the triangle-inequality pruning rate. Experimental results show that NOWCkNN performs better than existing work on data sets of every dimensionality tested, with higher pruning efficiency especially on high-dimensional, large data sets.

Key words: clustering, k-nearest neighbor (kNN), triangle inequality, width weighting, high-dimensional data
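
The abstract combines two ideas: clustering the data by width in a preprocessing stage, and using the triangle inequality to skip distance computations during a kNN query. The following is a minimal Python sketch of that general scheme, not the NOWCkNN algorithm itself. The abstract does not give the weighting formula or the harmonic coefficient, so the sketch assumes a single global width and Euclidean distance. All names (`fixed_width_clusters`, `knn_query`, the cluster dictionary fields) are hypothetical and chosen only for illustration.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fixed_width_clusters(points, width):
    """Assumed preprocessing step: put each point into the first cluster whose
    center is within `width`; otherwise start a new cluster at that point."""
    clusters = []
    for p in points:
        for c in clusters:
            d = dist(p, c["center"])
            if d <= width:
                c["members"].append((p, d))  # cache distance to the center
                c["radius"] = max(c["radius"], d)
                break
        else:
            clusters.append({"center": p, "members": [(p, 0.0)], "radius": 0.0})
    return clusters


def knn_query(clusters, q, k):
    """Return the k nearest (distance, point) pairs to q, plus the number of
    query-to-member distances that actually had to be computed."""
    best = []      # ascending list of (distance, point), at most k entries
    computed = 0
    # Visit clusters nearest-center first so the k-th best distance shrinks early.
    order = sorted((dist(q, c["center"]), i) for i, c in enumerate(clusters))
    for dqc, i in order:
        c = clusters[i]
        kth = best[-1][0] if len(best) == k else math.inf
        # Cluster-level pruning: every member p has d(q, p) >= d(q, c) - radius.
        if dqc - c["radius"] > kth:
            continue
        for p, dpc in c["members"]:
            kth = best[-1][0] if len(best) == k else math.inf
            # Point-level pruning: d(q, p) >= |d(q, c) - d(p, c)|.
            if abs(dqc - dpc) > kth:
                continue
            computed += 1
            d = dist(q, p)
            if d < kth:
                best.append((d, p))
                best.sort(key=lambda t: t[0])
                del best[k:]
    return best, computed


# Usage on synthetic data (the width 0.3 is an arbitrary illustrative choice).
random.seed(0)
data = [tuple(random.random() for _ in range(3)) for _ in range(300)]
clusters = fixed_width_clusters(data, 0.3)
neighbors, computed = knn_query(clusters, (0.5, 0.5, 0.5), k=5)
print(len(clusters), "clusters;", computed, "of", len(data), "distances computed")
```

In this sketch a single width applies everywhere, so dense regions produce large clusters that are hard to prune as a whole. Adjusting the width per cluster according to its number of objects, as the abstract describes, is aimed at that imbalance.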

CHEN Hui, GUAN Kaisheng, LI Jiaxing. Width-weighted clustering kNN algorithm based on number of objects[J]. Computer Engineering and Applications, 2018, 54(19): 1-9.

