Computer Engineering and Applications ›› 2019, Vol. 55 ›› Issue (22): 40-45.DOI: 10.3778/j.issn.1002-8331.1809-0275
Previous Articles Next Articles
WANG Ziqi, HE Jinwen, JIANG Liangxiao
Online:
Published:
王子旗,何锦雯,蒋良孝
Abstract: As one of the top 10 algorithms in data mining, the K-Nearest-Neighbor(KNN) algorithm is widely used because it is an non-parametric, simple and effective algorithm without training time. However, when it faces to massive amount of high-dimensional training examples, its high classification time complexity becomes a bottleneck of its application. In addition, its classification performance is often harmed, when the class distribution of training examples is skewed and the class imbalance problem occurs. To address these two issues, this paper proposes a new redundancy-based algorithm for reducing the amount of training examples (simply RBKNN). RBKNN at first computes the redundancy of each training example, and then randomly deletes some high redundant training examples by introducing a pre-processing process. RBKNN can not only reduce the size of training example set, but also make the class distribution of training examples more balanced. The experimental results show that RBKNN significantly promotes the efficiency of KNN, yet at the same time maintains or improves the classification accuracy of KNN.
Key words: KNN classifiers, example reduction, fast classification, class imbalance
摘要: 作为数据挖掘领域十大算法之一,K-近邻算法(K-Nearest-Neighbor,KNN)因具有非参数、无需训练时间、简单有效等特点而得到广泛应用。然而,KNN算法在面对高维的大训练样本集时,分类时间复杂度高的问题成为其应用的瓶颈。另外,因训练样本的类分布不均匀而导致的类不平衡问题也会影响其分类性能。针对这两个问题,提出了一种基于冗余度的KNN分类器训练样本裁剪新算法(简记为RBKNN)。RBKNN通过引入训练样本集预处理过程,对每个训练样本进行冗余度计算,并随机裁剪掉部分高冗余度的训练样本,从而达到减小训练样本规模、均衡样本分布的目的。实验结果表明,RBKNN可在保持或改善分类精度的前提下显著提升KNN的分类效率。
关键词: KNN分类器, 样本裁剪, 快速分类, 类不平衡
WANG Ziqi, HE Jinwen, JIANG Liangxiao. New Redundancy-Based Algorithm for Reducing Amount of Training Examples in KNN[J]. Computer Engineering and Applications, 2019, 55(22): 40-45.
王子旗,何锦雯,蒋良孝. 基于冗余度的KNN训练样本裁剪新算法[J]. 计算机工程与应用, 2019, 55(22): 40-45.
0 / Recommend
Add to citation manager EndNote|Ris|BibTeX
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.1809-0275
http://cea.ceaj.org/EN/Y2019/V55/I22/40