Computer Engineering and Applications ›› 2010, Vol. 46 ›› Issue (3): 115-117. DOI: 10.3778/j.issn.1002-8331.2010.03.034

• Database, Signal and Information Processing •

Enhancement of K-nearest neighbor algorithm based on information entropy of attribute value

TONG Xian-qun, ZHOU Zhong-mei

  1. Department of Computer Science & Engineering, Zhangzhou Normal University, Zhangzhou, Fujian 363000, China
  • Received: 2009-10-12 Revised: 2009-11-30 Online: 2010-01-21 Published: 2010-01-21
  • Contact: TONG Xian-qun

Abstract: To overcome the shortcomings of the traditional KNN algorithm and the distance-weighted KNN algorithm in their distance definition and voting scheme, an improved algorithm, Entropy-KNN, based on the importance of an attribute value to the class is proposed. First, the distance between two samples is defined as the average information entropy of the attribute values they share; through the important attribute values, this distance effectively measures the similarity between the two samples. Second, Entropy-KNN selects the K nearest neighbors of a test sample according to this distance. Finally, the class label of the test sample is determined from the average distance to, and the number of, the neighbors belonging to each class. Experimental results on the mushroom data set show that the classification accuracy of Entropy-KNN is higher than that of the traditional KNN and the distance-weighted KNN algorithms.

Key words: classification, K-nearest neighbor algorithm, attribute value, information entropy
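The abstract specifies the algorithm only at a high level, so the following is a minimal Python sketch of the stated idea on categorical data such as the mushroom set. The per-value entropy table, the treatment of sample pairs that share no attribute value, the rule combining neighbor counts with average distances, and all names (value_entropy, entropy_distance, entropy_knn_predict, k) are illustrative assumptions rather than the authors' implementation.

import math
from collections import Counter, defaultdict

def value_entropy(X, y):
    # Class-distribution entropy of every (attribute index, value) pair in the training set.
    counts = defaultdict(Counter)
    for row, label in zip(X, y):
        for j, v in enumerate(row):
            counts[(j, v)][label] += 1
    table = {}
    for key, cls in counts.items():
        total = sum(cls.values())
        table[key] = -sum((c / total) * math.log2(c / total) for c in cls.values())
    return table

def entropy_distance(a, b, table, max_h):
    # Distance = average entropy of the attribute values the two samples share (lower = more similar).
    shared = [table[(j, v)] for j, (v, w) in enumerate(zip(a, b)) if v == w and (j, v) in table]
    return sum(shared) / len(shared) if shared else max_h  # assumption: no shared value -> maximal distance

def entropy_knn_predict(X_train, y_train, x, k=5):
    table = value_entropy(X_train, y_train)
    max_h = math.log2(len(set(y_train))) or 1.0  # upper bound on the class entropy
    nearest = sorted((entropy_distance(x, row, table, max_h), label)
                     for row, label in zip(X_train, y_train))[:k]
    by_class = defaultdict(list)
    for d, label in nearest:
        by_class[label].append(d)
    # Assumed voting rule: the class with more neighbors wins, ties broken by the smaller average distance.
    return max(by_class, key=lambda c: (len(by_class[c]), -sum(by_class[c]) / len(by_class[c])))

# Toy usage with two attributes and two classes.
X_train = [["a", "x"], ["a", "y"], ["b", "x"], ["b", "y"]]
y_train = ["p", "p", "e", "e"]
print(entropy_knn_predict(X_train, y_train, ["a", "x"], k=3))  # -> 'p'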

