Computer Engineering and Applications ›› 2011, Vol. 47 ›› Issue (18): 160-162.

• Database, Signal and Information Processing •

Research on text commendatory-derogatory orientation discrimination

LI Yinhua1, WANG Suge2

  1. School of Applied Science, Taiyuan University of Science and Technology, Taiyuan 030024, China
  2. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2011-06-21 Published: 2011-06-21

Abstract: Under the vector space model of text representation, a feature weighting method based on Probabilistic Latent Semantic Analysis (PLSA) is proposed for text commendatory-derogatory orientation discrimination. In addition to the term frequency of a feature, the method takes into account the synonym and near-synonym information that PLSA provides implicitly. Features are selected with a method based on the Fisher discriminant criterion, and a Support Vector Machine (SVM) is used as the classifier. Experiments are conducted on a corpus of 2,739 Chinese review documents from the 2008 Chinese Opinion Analysis Evaluation (COAE2008). The results indicate that the proposed PLSA-based feature weighting method is effective for text commendatory-derogatory orientation discrimination.

Key words: text commendatory-derogatory orientation discrimination, Probabilistic Latent Semantic Analysis (PLSA), Fisher discriminant criterion, Support Vector Machine (SVM)
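
As an illustration of the Fisher-criterion feature selection mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the common two-class form of the Fisher score, (mean_pos - mean_neg)^2 / (var_pos + var_neg), computed per feature from its weights in commendatory and derogatory documents; the abstract does not state the exact criterion used in the paper. The function names `fisher_score` and `select_features`, the `eps` parameter, and the toy weights are hypothetical.

```python
from statistics import mean, pvariance


def fisher_score(pos_values, neg_values, eps=1e-12):
    """Two-class Fisher score of one feature (assumed form):
    between-class separation divided by within-class spread."""
    between = (mean(pos_values) - mean(neg_values)) ** 2
    within = pvariance(pos_values) + pvariance(neg_values)
    return between / (within + eps)  # eps avoids division by zero


def select_features(pos_docs, neg_docs, k):
    """Return the indices of the k features with the highest Fisher score.

    pos_docs / neg_docs: lists of equal-length feature-weight vectors.
    """
    n_features = len(pos_docs[0])
    scores = [
        fisher_score([d[j] for d in pos_docs], [d[j] for d in neg_docs])
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy data (hypothetical weights): feature 0 separates the classes well,
# feature 1 not at all, feature 2 only weakly.
pos = [[0.9, 0.5, 0.6], [0.8, 0.4, 0.4], [1.0, 0.6, 0.5]]
neg = [[0.1, 0.5, 0.4], [0.2, 0.6, 0.3], [0.0, 0.4, 0.5]]
selected = select_features(pos, neg, k=2)
print(selected)  # [0, 2]
```

In a pipeline like the one the abstract describes, the selected indices would restrict each document vector before it is passed to a classifier such as an SVM.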