计算机工程与应用 (Computer Engineering and Applications), 2011, Vol. 47, Issue (30): 116-118.

• Database, Signal and Information Processing •

概念语义生成与文本特征选择研究

孙福振1,李贞双2   

  1. 山东理工大学 计算机科学与技术学院,山东 淄博 255049
  2. 南阳师范学院 计算机与信息技术系,河南 南阳 473061
  • 收稿日期:1900-01-01 修回日期:1900-01-01 出版日期:2011-10-21 发布日期:2011-10-21

Research on concept semantic space and text feature selection

SUN Fuzhen1, LI Zhenshuang2

  1. College of Computer Science and Technology, Shandong University of Technology, Zibo, Shandong 255049, China
  2. Department of Computer and Information Technology, Nanyang Normal University, Nanyang, Henan 473061, China
  • Received:1900-01-01 Revised:1900-01-01 Online:2011-10-21 Published:2011-10-21

摘要: 文本特征选择是文本分类和信息提取的关键技术。针对文本分类中特征向量的高维稀疏问题,提出了非负矩阵分解和概念语义空间结合的特征抽取方法。对特征矩阵分解算法加入非负限制,能够给出概念语义向量面向主题的解释,较好地体现文本的局部特征。采用非负矩阵分解对全局和局部语义空间进行降维处理,提高了特征提取效率,并对不同概念语义空间中的文本分类效果进行了比对分析。实验结果表明,基于非负矩阵分解的局部概念语义空间中文本分类精度较高。

关键词: 概念语义空间, 文本特征选择, 非负矩阵分解

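Abstract: Text feature selection is a key technology in text classification and information extraction. To address the high-dimensional, sparse feature vectors that arise in text classification, a feature extraction method combining non-negative matrix factorization (NMF) with concept semantic space is proposed. Adding a non-negativity constraint to the factorization of the feature matrix yields concept semantic vectors that admit a topic-oriented interpretation and better reflect the local characteristics of the text. NMF is used to reduce the dimensionality of both the global and the local semantic spaces, which improves the efficiency of feature extraction, and text classification performance in the different concept semantic spaces is compared. Experimental results show that classification accuracy is higher in the NMF-based local concept semantic space.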

Key words: concept semantic space, text feature selection, non-negative matrix factorization
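
The dimensionality reduction step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which NMF algorithm the authors used, so the multiplicative update rules of Lee and Seung (Frobenius-norm objective) are assumed here as one common choice. The function name `nmf`, the toy term-document matrix `V`, the rank `k`, and the iteration count are all hypothetical and are not taken from the paper.

```python
import numpy as np

def nmf(V, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (terms x docs) as W @ H.

    W (terms x k): k non-negative "concept" vectors over the vocabulary.
    H (k x docs):  each column is a document's k-dimensional representation.
    Assumption: Lee-Seung multiplicative updates for the Frobenius objective.
    """
    rng = np.random.default_rng(seed)
    n_terms, n_docs = V.shape
    # Strictly positive random initialisation keeps the updates well defined.
    W = rng.random((n_terms, k)) + eps
    H = rng.random((k, n_docs)) + eps
    for _ in range(n_iter):
        # Element-wise multiplicative updates preserve non-negativity.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy data: rows are terms, columns are documents.
V = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 4, 1],
    [0, 0, 1, 4],
], dtype=float)

W, H = nmf(V, k=2)
doc_features = H.T  # one reduced 2-dimensional feature vector per document
```

Because every entry of `W` and `H` stays non-negative, each column of `W` can be read as an additive group of terms, which is the kind of topic-oriented interpretation the abstract refers to. A classifier would then be trained on `doc_features` rather than on the original sparse columns of `V`.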