Computer Engineering and Applications, 2012, Vol. 48, Issue (1): 166-169.

• Database, Signal and Information Processing •

Mixed method of feature reduction based on concept mapping in text classification

XIONG Zhongyang, FU Lingling, ZHANG Yufang

  1. College of Computer Science, Chongqing University, Chongqing 400030, China
  • Online: 2012-01-01 Published: 2012-01-01

Abstract: Reducing the dimensionality of high-dimensional feature sets is a major problem in text classification. Building on an analysis of existing feature reduction methods, a new two-stage reduction method based on HowNet is proposed. First, a traditional feature selection method extracts a candidate feature set. Second, HowNet is used to map the features in the candidate set to concepts, replacing a large number of scattered low-level original features with a small number of high-level concepts. The approach removes redundant features while limiting the loss of semantic information in the text. Experiments show that it effectively reduces the dimensionality of the feature space and improves text classification accuracy.

Key words: text classification, feature reduction, feature selection, concept mapping, HowNet
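
The two-stage procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: document frequency stands in for the unspecified traditional feature selection method, and `TOY_CONCEPTS` is a hypothetical hand-written word-to-concept table used in place of real HowNet lookups. The function names `select_candidates` and `map_to_concepts` are likewise invented for this example.

```python
from collections import Counter

# Hypothetical toy word-to-concept table; a stand-in for HowNet, not its API.
TOY_CONCEPTS = {
    "apple": "fruit",
    "banana": "fruit",
    "pear": "fruit",
    "car": "vehicle",
    "truck": "vehicle",
}


def select_candidates(docs, k):
    """Stage 1 (assumed criterion): keep the k terms with the highest
    document frequency."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    # Sort by descending frequency, then alphabetically for determinism.
    ranked = sorted(df, key=lambda t: (-df[t], t))
    return set(ranked[:k])


def map_to_concepts(docs, candidates, concept_of):
    """Stage 2: replace each candidate term with its concept and count
    concept occurrences per document. Terms without a concept entry are
    kept as themselves; non-candidate terms are dropped."""
    vectors = []
    for doc in docs:
        counts = Counter()
        for term in doc:
            if term in candidates:
                counts[concept_of.get(term, term)] += 1
        vectors.append(counts)
    return vectors


docs = [
    ["apple", "banana", "car", "zebra"],
    ["pear", "apple", "truck", "pear"],
    ["car", "truck", "banana", "pear"],
]

candidates = select_candidates(docs, k=5)
vectors = map_to_concepts(docs, candidates, TOY_CONCEPTS)
concepts = sorted({c for v in vectors for c in v})

print(len({t for d in docs for t in d}), len(candidates), len(concepts))
```

In this toy corpus the vocabulary shrinks from six original terms to five candidate terms after the first stage, and to two concepts (`fruit`, `vehicle`) after the second, which mirrors the idea of replacing many scattered low-level features with a few higher-level concepts.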