Computer Engineering and Applications ›› 2017, Vol. 53 ›› Issue (22): 121-125. DOI: 10.3778/j.issn.1002-8331.1605-0342

• Pattern Recognition and Artificial Intelligence •

Feature weighting method based on part of speech embedding for sentiment classification

YU Haiyan, LU Huijuan, ZHENG Wenbin

  1. College of Information Engineering, China Jiliang University, Hangzhou 310018, China
  • Online: 2017-11-15 Published: 2017-11-29

Abstract: Traditional feature representations for text sentiment classification usually ignore linguistic knowledge. This paper proposes a feature weighting method based on part of speech embedding: a feature embedding schema is constructed so that the contributions of four parts of speech (noun, verb, adjective, and adverb) to sentiment classification are embedded into the traditional TF-IDF (Term Frequency-Inverse Document Frequency) weights. The contribution of each part of speech is obtained by particle swarm optimization. A support vector machine is used for classification, and the experiments compare the embedding of different kinds of knowledge: part of speech, sentiment words, and the combination of the two. The results show that the part of speech embedding method achieves the best classification performance and can significantly improve the accuracy of Chinese text sentiment classification.

Key words: part of speech embedding, feature weighting, sentiment classification, particle swarm optimization
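
The abstract does not give the embedding formula, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes the part of speech contribution acts as a multiplicative factor on a word's TF-IDF weight, and that words with any other tag keep a factor of 1.0. The tag codes `n`, `v`, `a`, `d`, the function names, the search range [0, 2], and the PSO coefficients are all hypothetical. `toy_fitness` is a placeholder with a known peak; in the setting described above, the fitness would instead be the classification accuracy of a support vector machine trained on the weighted features.

```python
import math
import random
from collections import Counter

POS_TAGS = ("n", "v", "a", "d")  # noun, verb, adjective, adverb (hypothetical tag codes)

def tfidf(docs):
    """docs: list of documents, each a list of (word, pos) pairs."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update({w for w, _ in doc})  # document frequency of each word
    out = []
    for doc in docs:
        tf = Counter(w for w, _ in doc)
        out.append({w: (c / len(doc)) * math.log(n_docs / df[w])
                    for w, c in tf.items()})
    return out

def pos_weighted_tfidf(docs, contribution):
    """Assumed embedding: scale each TF-IDF weight by its POS contribution."""
    weighted = []
    for doc, weights in zip(docs, tfidf(docs)):
        pos_of = dict(doc)  # word -> POS tag (last occurrence wins)
        weighted.append({w: x * contribution.get(pos_of[w], 1.0)
                         for w, x in weights.items()})
    return weighted

def pso(fitness, dim, n_particles=10, iters=30, seed=0, lo=0.0, hi=2.0):
    """Basic particle swarm optimization that maximizes `fitness`."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # personal bests
    best_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: best_fit[i])
    g_pos, g_fit = best_pos[g][:], best_fit[g]  # global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (best_pos[i][d] - pos[i][d])
                             + 1.5 * r2 * (g_pos[d] - pos[i][d]))
                # Keep each contribution inside the search range.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            f = fitness(pos[i])
            if f > best_fit[i]:
                best_fit[i], best_pos[i] = f, pos[i][:]
                if f > g_fit:
                    g_fit, g_pos = f, pos[i][:]
    return g_pos, g_fit

def toy_fitness(x, target=(1.0, 0.8, 1.6, 1.2)):
    """Placeholder objective; the target values are made up for illustration."""
    return -sum((a - b) ** 2 for a, b in zip(x, target))

docs = [[("good", "a"), ("movie", "n")], [("bad", "a"), ("movie", "n")]]
best, score = pso(toy_fitness, dim=len(POS_TAGS))
contribution = dict(zip(POS_TAGS, best))
features = pos_weighted_tfidf(docs, contribution)
```

Each entry of `features` is a sparse word-to-weight mapping for one document, which would then be vectorized and passed to the classifier. Swapping `toy_fitness` for a function that trains and scores a classifier is the step that ties the search to actual sentiment classification performance.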