Width-weighted clustering kNN algorithm based on number of objects

doi:10.3778/j.issn.1002-8331.1808-0208

Abstract

Abstract: The traditional k-nearest neighbor (kNN) algorithm is widely used in data analysis as a non-parametric classification technique. However, it performs many redundant computations, so processing data takes considerable time. Much current research therefore focuses on the preprocessing stage, building a model of the data to reduce the computation required by kNN queries. This paper proposes a width-weighted clustering kNN algorithm based on the number of objects (NOWCkNN). The data set is first clustered with a global width. Each resulting cluster then recursively computes a weight for its width from the number of objects it contains. The algorithm adjusts the width value according to this weight and a harmonic coefficient, and finally produces clusters of different widths for use in kNN queries. This reduces clustering time and the number of iterations, balances cluster sizes, and maximizes the pruning rate of the triangle inequality. Experimental results show that NOWCkNN outperforms existing work on data sets of all tested dimensionalities, with higher pruning efficiency especially on high-dimensional, large data sets.

Key words: clustering, k-nearest neighbor (kNN), triangle inequality, width weighting, high-dimensional data
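
The pipeline described in the abstract (cluster with a global width, re-cluster by object count, then prune kNN candidates with the triangle inequality) can be illustrated with a minimal Python sketch. This is not the NOWCkNN implementation: the abstract does not give the weight formula, so the rule `width * shrink` below is a placeholder assumption, and the names `fixed_width_clusters`, `split_by_size`, `knn_pruned`, `max_size`, `shrink`, and `max_depth` are all hypothetical. The pruning test itself relies only on the standard bound d(q, p) >= d(q, c) - r for any point p in a cluster with centre c and radius r.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fixed_width_clusters(points, width):
    """One pass of fixed-width clustering: a point joins the nearest
    existing centre within `width`, otherwise it starts a new cluster."""
    clusters = []  # each cluster: {"center": point, "members": [points]}
    for p in points:
        best = min(clusters, key=lambda c: dist(p, c["center"]), default=None)
        if best is not None and dist(p, best["center"]) <= width:
            best["members"].append(p)
        else:
            clusters.append({"center": p, "members": [p]})
    return clusters


def split_by_size(points, width, max_size, shrink=0.5, depth=0, max_depth=20):
    """Recursively re-cluster any cluster holding more than `max_size`
    objects using a smaller width. ASSUMPTION: `width * shrink` stands in
    for the paper's object-count-based weight, which the abstract omits."""
    result = []
    for c in fixed_width_clusters(points, width):
        if len(c["members"]) > max_size and depth < max_depth:
            result.extend(split_by_size(c["members"], width * shrink,
                                        max_size, shrink, depth + 1, max_depth))
        else:
            # Radius = distance from the centre to its farthest member.
            c["radius"] = max(dist(c["center"], m) for m in c["members"])
            result.append(c)
    return result


def knn_pruned(query, clusters, k):
    """Exact kNN over the clusters. A cluster is skipped when the lower
    bound d(q, centre) - radius is no smaller than the current k-th best
    distance; by the triangle inequality no member can then be closer."""
    ordered = sorted(clusters, key=lambda c: dist(query, c["center"]))
    best = []      # (distance, point) pairs, kept sorted by distance
    skipped = 0    # number of clusters pruned without scanning
    for c in ordered:
        lower_bound = dist(query, c["center"]) - c["radius"]
        if len(best) == k and lower_bound >= best[-1][0]:
            skipped += 1
            continue
        for p in c["members"]:
            best.append((dist(query, p), p))
        best.sort(key=lambda t: t[0])
        del best[k:]
    return [p for _, p in best], skipped


if __name__ == "__main__":
    random.seed(0)
    pts = [(random.random(), random.random()) for _ in range(300)]
    clusters = split_by_size(pts, width=0.3, max_size=20)
    neighbours, skipped = knn_pruned((0.5, 0.5), clusters, k=5)
    print(len(clusters), "clusters;", skipped, "skipped by pruning")
```

Smaller, balanced clusters have smaller radii, which tightens the lower bound and lets more clusters be skipped; that is the intuition behind tying the width to the number of objects in each cluster.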

CHEN Hui, GUAN Kaisheng, LI Jiaxing. Width-weighted clustering kNN algorithm based on number of objects[J]. Computer Engineering and Applications, 2018, 54(19): 1-9.

