Computer Engineering and Applications, 2012, Vol. 48, Issue (2): 142-144.

• Database, Signal and Information Processing •

Improved KNN text categorization

ZHONG Jiang, LIU Ronghui   

  1. College of Computer Science, Chongqing University, Chongqing 400044, China
  • Online: 2012-01-11 Published: 2012-01-11

Abstract: In text categorization, the very high dimensionality of the feature space and the imbalanced distribution of training samples degrade classification performance. To address these problems, this paper proposes an improved KNN method. Latent semantic analysis (LSA) is first used to reduce the dimensionality of the text feature matrix; an improved KNN classifier based on sample density is then used to categorize the texts. Experimental results show that the proposed method effectively improves text categorization precision.

Key words: feature reduction, latent semantic analysis, K-Nearest Neighbor (KNN), text categorization
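The two steps named in the abstract can be sketched as follows. This is a minimal Python/NumPy sketch under stated assumptions, not the authors' implementation. LSA is done as a truncated SVD of the term-document matrix, and documents are compared by cosine similarity in the k-dimensional latent space. The abstract does not define its density measure, so the "density-based" improvement is modelled here as a hypothetical weighting: each neighbour's vote is its cosine similarity divided by its local density (mean similarity to its m most similar same-class samples), so samples from densely populated regions count for less. The function names, the parameters `k` and `m`, and the toy data are all illustrative.

```python
import numpy as np


def lsa_fit(term_doc: np.ndarray, k: int) -> np.ndarray:
    """LSA step: truncated SVD of a (terms x documents) matrix, keeping the k leading left singular vectors."""
    u, _s, _vt = np.linalg.svd(term_doc, full_matrices=False)  # singular values are returned in descending order
    return u[:, :k]


def lsa_project(u_k: np.ndarray, docs: np.ndarray) -> np.ndarray:
    """Map document columns (terms x n) to n rows in the k-dim latent space, L2-normalized for cosine similarity."""
    z = (u_k.T @ docs).T
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    return z / np.where(norms == 0, 1.0, norms)


def local_density(z: np.ndarray, labels: list, m: int) -> np.ndarray:
    """Hypothetical density: mean cosine similarity of each sample to its m most similar same-class samples."""
    sims = z @ z.T
    dens = np.ones(len(z))
    for i in range(len(z)):
        same = [j for j in range(len(z)) if j != i and labels[j] == labels[i]]
        if same:
            top = np.sort(sims[i, same])[::-1][:m]
            dens[i] = max(float(top.mean()), 1e-3)  # floor avoids dividing by ~0 below
    return dens


def density_knn_predict(z_train, labels, dens, z_test, k):
    """Vote among the k most similar training samples; each vote is weighted by similarity / local density."""
    preds = []
    for x in z_test:
        sims = z_train @ x
        scores = {}
        for j in np.argsort(sims)[::-1][:k]:
            w = max(float(sims[j]), 0.0) / dens[j]
            scores[labels[j]] = scores.get(labels[j], 0.0) + w
        preds.append(max(scores, key=scores.get))
    return preds


# Toy data (illustrative). Rows: ball, goal, team, stock, market, price; columns: documents.
train = np.array([
    [2, 1, 0, 0, 0],
    [1, 2, 1, 0, 0],
    [1, 0, 2, 0, 0],
    [0, 0, 0, 2, 0],
    [0, 0, 0, 1, 2],
    [0, 0, 0, 0, 1],
], dtype=float)
labels = ["sports", "sports", "sports", "finance", "finance"]
test = np.array([[1, 0], [0, 0], [1, 0], [0, 1], [0, 0], [0, 1]], dtype=float)

u_k = lsa_fit(train, k=2)
z_train = lsa_project(u_k, train)
dens = local_density(z_train, labels, m=2)
print(density_knn_predict(z_train, labels, dens, lsa_project(u_k, test), k=3))
# → ['sports', 'finance']
```

Dividing by local density is one simple way to keep an over-represented class from dominating the k nearest neighbours; other density-based schemes (for example, pruning training samples in dense regions) would fit the abstract's description equally well.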