Computer Engineering and Applications, 2012, Vol. 48, Issue (2): 142-144.

• Database, Signal and Information Processing •

Improved KNN text categorization

ZHONG Jiang, LIU Ronghui   

  1. College of Computer Science, Chongqing University, Chongqing 400044, China
  • Online: 2012-01-11 Published: 2012-01-11


Abstract: In text categorization, the very high dimensionality of the feature space and the imbalanced distribution of training samples degrade classification results. To address these problems, this paper proposes an improved KNN method. Latent semantic analysis (LSA) is first used to reduce the dimensionality of the text feature matrix, and an improved KNN classifier based on sample density then performs the categorization. Experimental results show that the proposed method effectively improves text categorization precision.
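The two steps described in the abstract, LSA-based dimensionality reduction followed by a density-adjusted KNN vote, can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation. LSA is done with a truncated SVD via NumPy's `numpy.linalg.svd`, and documents are folded into the latent space as Σ_k⁻¹ U_kᵀ d. The abstract does not give the density formula, so the weighting used here is hypothetical: each neighbour's cosine similarity is divided by that neighbour's local density, defined as its mean similarity to its `m` most similar training samples. The helper names `lsa_fit`, `lsa_project`, `unit_rows`, `sample_density` and `density_knn_predict`, and the toy data, are likewise illustrative.

```python
import numpy as np


def lsa_fit(term_doc: np.ndarray, k: int):
    """Truncated SVD of a term-by-document matrix (rows = terms, columns = docs)."""
    U, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k], s[:k]


def lsa_project(U_k: np.ndarray, s_k: np.ndarray, docs: np.ndarray) -> np.ndarray:
    """Fold documents (columns of `docs`) into the k-dim latent space.

    Uses d_hat = diag(1/s_k) @ U_k.T @ d and returns one row per document.
    """
    return (U_k.T @ docs).T / s_k


def unit_rows(X: np.ndarray) -> np.ndarray:
    """L2-normalise each row so that dot products are cosine similarities."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, 1e-12)


def sample_density(train: np.ndarray, m: int) -> np.ndarray:
    """Hypothetical local density: mean cosine similarity of each (unit-norm)
    training sample to its m most similar other training samples (m < n)."""
    sims = train @ train.T
    np.fill_diagonal(sims, -np.inf)  # exclude self-similarity
    top_m = np.sort(sims, axis=1)[:, -m:]
    return top_m.mean(axis=1)


def density_knn_predict(train, labels, query, k=3, m=2, eps=1e-6):
    """KNN vote in which each neighbour's cosine similarity is divided by its
    local density, so samples in crowded regions (typical of over-represented
    classes) count for less. The weighting is an illustrative assumption."""
    train = unit_rows(train)
    q = unit_rows(query[np.newaxis, :])[0]
    density = np.maximum(sample_density(train, m), eps)
    sims = train @ q
    scores = {}
    for i in np.argsort(sims)[::-1][:k]:  # indices of the k most similar samples
        scores[labels[i]] = scores.get(labels[i], 0.0) + sims[i] / density[i]
    return max(scores, key=scores.get)


# Toy, imbalanced corpus: 4 terms x 6 documents (4 "sport", 2 "tech").
A = np.array([[2, 1, 2, 3, 0, 0],
              [1, 2, 2, 1, 0, 0],
              [0, 0, 0, 0, 2, 1],
              [0, 0, 0, 0, 1, 2]], dtype=float)
labels = ["sport"] * 4 + ["tech"] * 2

U_k, s_k = lsa_fit(A, k=2)
train = lsa_project(U_k, s_k, A)  # shape (6, 2)
query = lsa_project(U_k, s_k, np.array([[0.0], [0.0], [2.0], [2.0]]))[0]
print(density_knn_predict(train, labels, query, k=3))  # prints: tech
```

In this toy corpus the two "tech" documents have a local density of about 0.5 against about 1.0 for the "sport" documents, so each "tech" neighbour's vote is scaled up. This is one simple way to counter the bias that a larger class exerts on an unweighted KNN vote.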

Key words: feature reduction, latent semantic analysis, K-Nearest Neighbor (KNN), text categorization
