New algorithm to scale up efficiency of K-Nearest-Neighbor

Computer Engineering and Applications, 2008, Vol. 44, Issue 4: 163-165.

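LU Wei-wei, LIU Jing

Faculty of Computer Science, China University of Geosciences, Wuhan 430074

Received: 2007-06-06 Revised: 2007-08-06 Online: 2008-02-01 Published: 2008-02-01
Corresponding author: LU Wei-wei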

Abstract

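The K-Nearest-Neighbor (KNN) algorithm is the most basic instance-based learning method and is widely used in machine learning and data mining. Its learning phase simply stores the known training data. When a new query instance is encountered, a set of similar instances is retrieved from memory and used to classify it. One drawback of KNN is that classifying a new instance can be expensive, because nearly all computation takes place at classification time rather than when the training instances are first encountered. How to index the training instances efficiently, so as to reduce the computation required at query time, is therefore an important practical problem. To address it, this paper proposes a new algorithm that moves part of the computation originally performed in the classification phase to the training phase. Experiments show that the algorithm improves the efficiency of KNN by more than 80%. Moreover, the idea behind the algorithm can be applied to all variants of KNN.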
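
The abstract does not say which computations the authors move from classification to training, so the Python sketch below is only a point of comparison, not the paper's algorithm. It contrasts plain KNN, whose "training" merely stores the data so that every query costs n distance computations, with one well-known way of shifting work to training time: pivot (reference-point) pruning based on the triangle inequality. Euclidean distance is assumed, and the names `euclidean`, `brute_force_kneighbors` and `PivotKNN`, as well as the choice of the first training instance as the pivot, are hypothetical choices made for illustration.

```python
import bisect
import heapq
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_kneighbors(X, q, k):
    """Plain KNN: training only stores X, so each query computes all n distances."""
    return sorted((euclidean(q, x), i) for i, x in enumerate(X))[:k]


class PivotKNN:
    """Hypothetical KNN variant that does extra work at training time.

    fit() computes each instance's distance to one pivot p and sorts by it.
    By the triangle inequality, |d(x, p) - d(q, p)| <= d(q, x), so at query
    time this gap is a free lower bound used to skip distance computations.
    """

    def __init__(self, k):
        self.k = k

    def fit(self, X, y):
        self.pivot = X[0]  # assumption: the first training instance serves as pivot
        dist = [euclidean(x, self.pivot) for x in X]
        order = sorted(range(len(X)), key=dist.__getitem__)
        self.points = [X[i] for i in order]
        self.labels = [y[i] for i in order]
        self.keys = [dist[i] for i in order]  # pivot distances, ascending
        return self

    def kneighbors(self, q):
        """Return ([(distance, internal index)] ascending, full distances computed).

        Indices refer to the internal pivot-sorted order, not to the input X.
        """
        dq = euclidean(q, self.pivot)
        hi = bisect.bisect_left(self.keys, dq)  # first key >= dq
        lo = hi - 1                              # last key < dq
        best = []  # max-heap of the k nearest so far, stored as (-distance, index)
        evaluated = 0
        while lo >= 0 or hi < len(self.keys):
            # Walk outward from dq so candidates come in nondecreasing gap order.
            gap_lo = dq - self.keys[lo] if lo >= 0 else math.inf
            gap_hi = self.keys[hi] - dq if hi < len(self.keys) else math.inf
            if gap_lo <= gap_hi:
                i, gap = lo, gap_lo
                lo -= 1
            else:
                i, gap = hi, gap_hi
                hi += 1
            if len(best) == self.k and gap >= -best[0][0]:
                break  # every remaining lower bound is at least the k-th distance
            d = euclidean(q, self.points[i])
            evaluated += 1
            if len(best) < self.k:
                heapq.heappush(best, (-d, i))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, i))
        return sorted((-neg_d, i) for neg_d, i in best), evaluated

    def predict(self, q):
        """Majority vote among the k nearest training instances."""
        neighbours, _ = self.kneighbors(q)
        votes = Counter(self.labels[i] for _, i in neighbours)
        return votes.most_common(1)[0][0]


if __name__ == "__main__":
    X = [[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]]
    y = ["a", "a", "a", "b", "b", "b"]
    model = PivotKNN(k=3).fit(X, y)
    print(model.predict([0.5, 0.5]))        # prints a
    print(model.kneighbors([0.5, 0.5])[1])  # prints 3: only 3 of 6 distances computed
```

The pruning is exact. Candidates are visited in nondecreasing order of the lower bound |d(x, p) − d(q, p)|, so once that bound reaches the current k-th nearest distance, no unvisited instance can be closer. How much it saves depends on the data and on the pivot. The sketch makes no claim about reproducing the efficiency gain reported above.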

Key words: K-Nearest-Neighbor, instance-based learning, efficiency, classification

LU Wei-wei, LIU Jing. New algorithm to scale up efficiency of K-Nearest-Neighbor[J]. Computer Engineering and Applications, 2008, 44(4): 163-165.
