Computer Engineering and Applications ›› 2011, Vol. 47 ›› Issue (28): 139-142.

• Database, Signal and Information Processing •

Concept-features-based semantic text classification

LIN Wei, MENG Fanrong, WANG Zhixiao

  1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221008, China
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2011-10-01 Published: 2011-10-01

Abstract: Text classification is a key method for organizing and processing massive amounts of text. Most current text classification models describe text resources with keyword feature vectors, which makes the vectors high-dimensional and sparse. This paper introduces concept features of text resources, raising the description of a text resource from the keyword level to the concept level and improving its accuracy, and proposes a semantic text classification model based on concept features. Simulation results show that the model effectively overcomes the high dimensionality and sparsity of the feature vector space, ensures the orthogonality of the vector space, and performs well in both the efficiency and the accuracy of semantic text classification.

Key words: semantic text classification, concept features, ontology, Support Vector Machine (SVM)
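
The move from keyword-level to concept-level description can be illustrated with a minimal sketch. It assumes a toy ontology in which each keyword maps to exactly one concept; the mapping `KEYWORD_TO_CONCEPT`, the helper functions, and the two sample documents are hypothetical and are not taken from the paper. The paper's actual ontology, weighting scheme, and SVM classifier are not reproduced here.

```python
from collections import Counter

# Hypothetical toy ontology: each keyword maps to one concept.
# These entries are invented for illustration and are not from the paper.
KEYWORD_TO_CONCEPT = {
    "car": "vehicle",
    "automobile": "vehicle",
    "truck": "vehicle",
    "doctor": "medical_staff",
    "physician": "medical_staff",
    "nurse": "medical_staff",
}


def keyword_vector(tokens, vocabulary):
    """Count how often each vocabulary keyword occurs in the document."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def concept_vector(tokens, concepts):
    """Count occurrences per concept by mapping each known keyword to its concept."""
    counts = Counter(
        KEYWORD_TO_CONCEPT[t] for t in tokens if t in KEYWORD_TO_CONCEPT
    )
    return [counts[c] for c in concepts]


def dot(u, v):
    """Plain dot product, used here as an unnormalized similarity."""
    return sum(x * y for x, y in zip(u, v))


vocabulary = sorted(KEYWORD_TO_CONCEPT)                 # 6 keyword dimensions
concepts = sorted(set(KEYWORD_TO_CONCEPT.values()))     # 2 concept dimensions

# Two documents on the same topics that share no surface keywords.
doc_a = ["car", "truck", "doctor"]
doc_b = ["automobile", "physician", "physician"]

kw_a, kw_b = keyword_vector(doc_a, vocabulary), keyword_vector(doc_b, vocabulary)
cv_a, cv_b = concept_vector(doc_a, concepts), concept_vector(doc_b, concepts)

print(len(kw_a), len(cv_a))            # keyword vs. concept dimensionality
print(dot(kw_a, kw_b), dot(cv_a, cv_b))  # similarity at each level
```

In this toy setting the keyword vectors have six mostly zero dimensions and a dot product of zero, while the concept vectors have two dimensions and a nonzero dot product, which is the kind of effect the abstract attributes to concept features. Vectors built this way could then be passed to any classifier, such as an SVM.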