Computer Engineering and Applications, 2015, Vol. 51, Issue 2: 131-135.


Sample cutting and weighting method in text classification based on position

LIU Haifeng, LIU Shousheng, SU Zhan   

  1. Institute of Sciences, PLA University of Science and Technology, Nanjing 210007, China
  Online: 2015-01-15; Published: 2015-01-12

基于位置的文本分类样本剪裁及加权方法 (Position-based sample cutting and weighting method for text classification)

刘海峰 (LIU Haifeng), 刘守生 (LIU Shousheng), 苏展 (SU Zhan)

  1. 解放军理工大学 理学院 (Institute of Sciences, PLA University of Science and Technology), Nanjing 210007

Abstract: The k-nearest neighbor (kNN) method is widely used in text classification, and there is a practical need to improve its performance. This paper uses an improved clustering algorithm for sample cutting, which strengthens how well the training samples represent their categories. Based on the spatial position of each sample, it then weights the samples, first according to the intra-class distribution and then according to the inter-class distribution. This mitigates the tendency of large categories with high-density training samples to dominate kNN classification. Experimental results show that the improved text weighting method improves the classification efficiency of the classifier.

Key words: sample cutting, sample weighting, text clustering, k-nearest neighbor, text categorization
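The abstract does not specify the improved clustering algorithm or the rule used to cut samples. As a rough illustration only, the sketch below substitutes plain k-means (not the authors' improved variant). It clusters the training vectors of each class separately, keeps the `keep_per_cluster` samples nearest each cluster centroid, and discards the rest. The names `kmeans`, `cut_samples`, `k` and `keep_per_cluster` are hypothetical. So is the use of dense Euclidean vectors in place of a real text representation such as TF-IDF with cosine similarity.

```python
import math
import random
from collections import defaultdict


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means, an assumed stand-in for the paper's improved clustering."""
    rng = random.Random(seed)
    centroids = rng.sample(points, min(k, len(points)))
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid.
            idx = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[idx].append(p)
        # Recompute centroids; keep the old one if a cluster becomes empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters


def cut_samples(samples, k=2, keep_per_cluster=2):
    """Hypothetical cutting rule: per class, keep the samples nearest each centroid.

    samples: list of (vector, label) pairs. Returns a reduced list of pairs.
    """
    by_class = defaultdict(list)
    for vec, label in samples:
        by_class[label].append(vec)
    kept = []
    for label, vecs in by_class.items():
        centroids, clusters = kmeans(vecs, k)
        for centroid, members in zip(centroids, clusters):
            members.sort(key=lambda v: dist(v, centroid))
            kept.extend((v, label) for v in members[:keep_per_cluster])
    return kept
```

For example, `cut_samples(train, k=2, keep_per_cluster=1)` reduces every class to at most two representatives. Keeping the points nearest the centroids is one simple way to retain samples that are typical of their class. A fuller version might also drop very small clusters as noise.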

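Abstract: The k-nearest neighbor method is widely used in text classification, and optimizing its performance is a practical need. An improved clustering algorithm is used for sample cutting to improve the category representation capability of the training samples. According to the spatial positions of the samples, sample weighting is carried out first on the basis of the intra-class distribution and then on the basis of the inter-class distribution. This alleviates the dominance of large-category, high-density training samples in the k-nearest neighbor algorithm. Experimental results show that the proposed improved text weighting method improves the classification efficiency of the classifier.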

Key words: sample cutting, sample weighting, text clustering, k-nearest neighbor, text classification
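The abstract says that samples are weighted by their spatial position, first by the intra-class distribution and then by the inter-class distribution. The goal is to stop large, dense categories from dominating kNN votes, but the abstract gives no formulas. The sketch below uses two invented weights purely for illustration:

- **Intra-class weight:** the inverse of the number of same-class samples within a hypothetical `radius`.
- **Inter-class weight:** `d_other / (d_own + d_other)`, which compares the distance to the sample's own class centroid with the distance to the nearest other centroid.

Their product scales each neighbor's distance-weighted vote. The names `sample_weights`, `weighted_knn` and `radius`, and both formulas, are assumptions and do not come from the paper.

```python
import math
from collections import defaultdict


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sample_weights(samples, radius=0.5):
    """Hypothetical position-based weights for a list of (vector, label) pairs."""
    by_class = defaultdict(list)
    for vec, label in samples:
        by_class[label].append(vec)
    centroids = {label: centroid(vecs) for label, vecs in by_class.items()}

    weights = []
    for vec, label in samples:
        # Intra-class step: the more same-class samples within `radius`
        # (the sample itself included), the smaller the weight.
        crowd = sum(1 for v in by_class[label] if dist(v, vec) <= radius)
        w_intra = 1.0 / crowd

        # Inter-class step: the closer the sample lies to another class centroid
        # relative to its own, the smaller the weight (below 0.5 once it is
        # nearer another centroid than its own).
        d_own = dist(vec, centroids[label])
        others = [dist(vec, c) for lbl, c in centroids.items() if lbl != label]
        if not others:
            w_inter = 1.0
        else:
            d_other = min(others)
            total = d_own + d_other
            w_inter = d_other / total if total > 0 else 0.5
        weights.append(w_intra * w_inter)
    return weights


def weighted_knn(query, samples, weights, k=3):
    """Classify `query`: each of the k nearest samples votes weight / distance."""
    order = sorted(range(len(samples)), key=lambda i: dist(query, samples[i][0]))
    scores = defaultdict(float)
    for i in order[:k]:
        vec, label = samples[i]
        scores[label] += weights[i] / (dist(query, vec) + 1e-9)  # eps avoids /0
    return max(scores, key=scores.get)
```

As a toy example, take five tightly packed `"big"` samples around (0.05, 0.05) and two `"small"` samples at (1, 0) and (1, 0.2). A query at (0.55, 0.05) with `k=5` goes to `"big"` when every weight is 1. With the weights from `sample_weights(train, radius=0.5)` it goes to `"small"`, because each dense `"big"` sample has an intra-class weight of 1/5 against 1/2 for the `"small"` samples.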