Computer Engineering and Applications, 2021, Vol. 57, Issue (13): 102-107. DOI: 10.3778/j.issn.1002-8331.2006-0021

• Big Data and Cloud Computing •

Bidirectional Selection-Based Pseudo Nearest Neighbor Algorithm

CAI Ruiguang (蔡瑞光), ZHANG Desheng (张德生), ZHANG Xiao (张晓)

  1. College of Science, Xi’an University of Technology, Xi’an 710054, China
  • Publication date: 2021-07-01 Release date: 2021-06-29

Abstract:

The pseudo nearest neighbor classification algorithm (LMPNN) remains sensitive to outliers and noise points. To address this, the paper proposes a Bidirectional Selection-based Pseudo Nearest Neighbor algorithm (BS-PNN). First, the k nearest neighbors of a test sample are selected with a proximity measure, and the test sample and its neighbors then select each other bidirectionally according to the mutual neighbor definition. Second, the number of mutual neighbors in each class and the weighted distances to their local means are computed, which gives the Euclidean distance from the test sample to the pseudo nearest neighbor of each class. Finally, the test sample is classified by a vote that uses an improved class credibility measure. On complex classification tasks, BS-PNN identifies noise points accurately, is less sensitive to the number of nearest neighbors k, and improves classification accuracy. Simulation experiments on 15 real data sets from UCI and KEEL compare BS-PNN with the KNN, WKNN, LMKNN, PNN, LMPNN, DNN, and P-KNN algorithms. The results show that the classification performance of BS-PNN is clearly better than that of the other nearest neighbor classification algorithms.

Key words: K Nearest Neighbors (KNN), mutual neighbor, class credibility, pattern classification
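
The steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so every function name below (`k_nearest`, `mutual_neighbors`, `pseudo_distance`, `classify`) is hypothetical, the 1/j weights follow the usual local-mean pseudo nearest neighbor convention, the normalization by the sum of weights is an assumption made so that classes with different numbers of mutual neighbors stay comparable, and the improved class credibility vote is replaced by a plain "smallest pseudo distance wins" rule.

```python
import math
from collections import defaultdict


def k_nearest(query, points, k, skip=None):
    """Return indices of the k points closest to `query` (Euclidean distance)."""
    order = sorted(
        (i for i in range(len(points)) if i != skip),
        key=lambda i: math.dist(query, points[i]),
    )
    return order[:k]


def mutual_neighbors(x, X, k):
    """Keep those k nearest neighbors of x that would also rank x among their own k nearest."""
    kept = []
    for i in k_nearest(x, X, k):
        # Distance from X[i] to its k-th nearest training neighbor (X[i] itself excluded).
        own = k_nearest(X[i], X, k, skip=i)
        radius = math.dist(X[i], X[own[-1]])
        if math.dist(x, X[i]) <= radius:
            kept.append(i)
    return kept


def pseudo_distance(x, members):
    """Weighted average distance from x to the running local means of `members`.

    Assumption of this sketch: weights 1/j, normalized by their sum.
    """
    members = sorted(members, key=lambda p: math.dist(x, p))
    total, weight_sum = 0.0, 0.0
    running = [0.0] * len(x)
    for j, p in enumerate(members, start=1):
        running = [r + v for r, v in zip(running, p)]
        local_mean = [r / j for r in running]  # mean of the j closest members
        total += math.dist(x, local_mean) / j
        weight_sum += 1.0 / j
    return total / weight_sum


def classify(x, X, y, k):
    """Assign x to the class whose mutual neighbors give the smallest pseudo distance."""
    by_class = defaultdict(list)
    for i in mutual_neighbors(x, X, k):
        by_class[y[i]].append(X[i])
    if not by_class:
        # No mutual neighbor at all: x looks like noise; fall back to the 1-NN label.
        return y[k_nearest(x, X, 1)[0]]
    return min(by_class, key=lambda c: pseudo_distance(x, by_class[c]))
```

In this sketch a training point survives the bidirectional selection only if the test sample lies within that point's own k-neighborhood, which is one common reading of the mutual neighbor idea; a test sample with no surviving neighbors is treated as a likely noise point and handled by a fallback chosen here purely for illustration.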