Computer Engineering and Applications ›› 2015, Vol. 51 ›› Issue (21): 30-35.

• Theoretical Research and R&D Design •

Improved term weighting algorithm for text sentiment analysis

ZHENG Anyi   

  1. Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai 200433, China
  • Online: 2015-11-01 Published: 2015-11-16

Abstract: Term weighting for sentiment analysis generally involves two factors: the Importance of a Term in a Document (ITD) and the Importance of a Term for expressing Sentiment (ITS). An improved ITS algorithm is proposed by combining two state-of-the-art supervised term weighting schemes with high classification accuracy. The improved algorithm takes into account both a feature's document frequency within one class of documents (the number of documents in that set in which the feature occurs) and the proportion of the feature's total document frequency that this accounts for. Features that occur predominantly in one class, and in many of its documents, therefore receive higher ITS weights. Experimental results show that the proposed algorithm improves the accuracy of text sentiment classification.

Keywords: sentiment analysis, term weighting, document frequency, sentiment classification
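
The two quantities named in the abstract, a feature's document frequency within one class and that frequency's share of the feature's total document frequency, can be illustrated with a minimal Python sketch. The abstract does not give the formula that combines them into an ITS weight, so none is attempted here; the function name `class_document_stats`, the toy corpus, and the labels are all hypothetical and serve only to show how the two inputs are counted.

```python
from collections import Counter


def class_document_stats(docs, labels, target):
    """Return {term: (df_in_class, share_of_total_df)} for the target class.

    Hypothetical helper: it only computes the two inputs described in the
    abstract, not the paper's ITS weight.
    """
    df_total = Counter()  # documents containing the term, over all classes
    df_class = Counter()  # documents of the target class containing the term
    for tokens, label in zip(docs, labels):
        for term in set(tokens):  # count each term once per document
            df_total[term] += 1
            if label == target:
                df_class[term] += 1
    return {t: (df_class[t], df_class[t] / df_total[t]) for t in df_total}


# Toy tokenized corpus (invented for illustration only).
docs = [
    ["great", "plot", "great", "cast"],
    ["great", "fun"],
    ["dull", "plot"],
    ["dull", "cast", "boring"],
]
labels = ["pos", "pos", "neg", "neg"]

stats = class_document_stats(docs, labels, "pos")
for term in sorted(stats):
    df_c, share = stats[term]
    print(term, df_c, round(share, 2))
```

In this toy corpus, "great" and "fun" both occur only in positive documents, so their shares are equal, but "great" appears in two positive documents and "fun" in one. A scheme that looks at the share alone cannot separate them, which is presumably why the abstract stresses using both quantities together.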