Computer Engineering and Applications ›› 2009, Vol. 45 ›› Issue (7): 153-155. DOI: 10.3778/j.issn.1002-8331.2009.07.046

• Database, Signal and Information Processing •

Improved KNN using clustering algorithm

JIANG Tao,CHEN Xiao-li,ZHANG Yu-fang,XIONG Zhong-yang   

  • Received:2008-01-21 Revised:2008-04-23 Online:2009-03-01 Published:2009-03-01
  • Contact: JIANG Tao

Research on a KNN Text Categorization Algorithm Based on Clustering (基于聚类算法的KNN文本分类算法研究)

江 涛,陈小莉,张玉芳,熊忠阳   

  • Corresponding author: 江 涛 (JIANG Tao)

Abstract: KNN is one of the best text categorization algorithms and is widely used. An uneven distribution of samples in the training set negatively affects the categorization result. This paper presents an improved KNN method and verifies its effectiveness experimentally; the results show improved classification performance.

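Abstract (translated from the Chinese): KNN is widely used in artificial intelligence fields such as expert systems, data mining, and pattern recognition. The algorithm is simple, effective, and easy to implement. However, when deciding the class of a test sample, KNN treats that sample's K nearest neighbors equally; that is, it ignores how well each of the K neighbors represents the class it belongs to. Because training samples are unevenly distributed, each sample contributes differently to classification, so the samples in the training set should be treated differently. A clustering algorithm is used to compute a membership degree for every training sample, and these membership degrees are then used to weight the test sample's K nearest neighbors. Experiments show that the improved KNN algorithm achieves better accuracy.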
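
The idea of weighting each neighbor's vote by a membership degree can be illustrated with a minimal Python sketch. The abstract does not specify which clustering algorithm or membership formula the authors use, so everything concrete here is an assumption: each class is summarized by a single centroid (a stand-in for the clustering step), membership is taken as 1 / (1 + distance to the sample's own class centroid), and the function names `membership_degrees` and `weighted_knn_predict` are hypothetical.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def class_centroids(samples, labels):
    """Mean vector of each class (assumed stand-in for the clustering step)."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def membership_degrees(samples, labels):
    """Assumed membership formula: 1 / (1 + distance to own-class centroid).

    Samples near the centre of their class get a value close to 1;
    outliers get a value close to 0.
    """
    centroids = class_centroids(samples, labels)
    return [1.0 / (1.0 + euclidean(x, centroids[y]))
            for x, y in zip(samples, labels)]


def weighted_knn_predict(query, samples, labels, memberships, k=3):
    """Classify `query` by summing the membership degrees of its k nearest
    neighbours per class, instead of counting one vote per neighbour."""
    nearest = sorted(range(len(samples)),
                     key=lambda i: euclidean(query, samples[i]))[:k]
    votes = defaultdict(float)
    for i in nearest:
        votes[labels[i]] += memberships[i]
    return max(votes, key=votes.get)


# Toy data: class "b" contains one outlier at (2, 2), far from its centroid.
samples = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (2, 2)]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
memberships = membership_degrees(samples, labels)

# With k=2 the neighbours of (1.6, 1.6) are one "a" and the "b" outlier;
# the outlier's low membership lets the more representative "a" sample win.
print(weighted_knn_predict((1.6, 1.6), samples, labels, memberships, k=2))
```

In plain KNN the two neighbors in the last call would tie with one vote each; under the assumed weighting, the outlier contributes less because it represents its class poorly, which is the behavior the abstract describes.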