Computer Engineering and Applications (计算机工程与应用) ›› 2017, Vol. 53 ›› Issue (21): 77-84. DOI: 10.3778/j.issn.1002-8331.1702-0292

• Big Data and Cloud Computing •

基于多文本特征融合的中文微博的立场检测

奠雨洁,金  琴,吴慧敏   

  1. School of Information, Renmin University of China, Beijing 100872
  • Publication date: 2017-11-01 Release date: 2017-11-15

Stance detection in Chinese microblogs via fusing multiple text features

DIAN Yujie, JIN Qin, WU Huimin   

  1. School of Information, Renmin University of China, Beijing 100872, China
  • Online:2017-11-01 Published:2017-11-15

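Abstract (translated from the Chinese version): Stance detection in microblogs determines whether a microblog author's attitude toward a given topic is supportive, opposed, or neutral. Building on a supervised classification framework, this paper proposes a stance detection method for Chinese microblogs that fuses multiple text features. It first explores features based on word-frequency statistics (Bag-of-Words (BoW) features, BoW features based on a synonym dictionary, and features that consider the co-occurrence of words with stance labels) and deep text features (word vectors and character vectors). Support vector machines, random forests, and gradient boosting decision trees are then used to classify stance from these features. Finally, the classifiers for all features are combined by late fusion. Experiments show that the proposed features improve stance detection results on microblogs across different topics, and that the deep text features and the word-frequency features capture different information in the text and are complementary for stance detection. A microblog stance detection system based on this method achieved the best result in the Chinese microblog stance detection evaluation task of the 2016 Conference on Natural Language Processing and Chinese Computing (NLPCC 2016).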

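Key words (translated from the Chinese version): stance detection, sentiment analysis, text feature representation, microblog, text classification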
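The word-frequency features named in the abstract can be illustrated with a minimal sketch. It assumes the microblog text has already been segmented into tokens. The `SYNONYMS` map, the vocabularies, and the function names `bow` and `synonym_bow` are hypothetical stand-ins; the synonym dictionary, tokenizer, and weighting actually used in the paper are not given on this page.

```python
from collections import Counter

# Hypothetical synonym dictionary: maps a word to a shared group label.
# The entries are illustrative only and are not taken from the paper.
SYNONYMS = {
    "支持": "SUPPORT", "赞成": "SUPPORT",
    "反对": "OPPOSE", "抵制": "OPPOSE",
}

def bow(tokens, vocabulary):
    """Plain bag-of-words: count of each vocabulary entry among the tokens."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

def synonym_bow(tokens, vocabulary, synonyms=SYNONYMS):
    """Bag-of-words after mapping each token to its synonym group, if any."""
    merged = [synonyms.get(token, token) for token in tokens]
    return bow(merged, vocabulary)

# A made-up, pre-segmented example sentence.
tokens = ["我", "赞成", "这个", "政策", "支持"]
plain_vocab = ["支持", "赞成", "反对", "政策"]
group_vocab = ["SUPPORT", "OPPOSE", "政策"]

plain_vector = bow(tokens, plain_vocab)          # synonyms counted separately
group_vector = synonym_bow(tokens, group_vocab)  # synonyms share one dimension
```

In this toy example the two synonyms fall into separate dimensions of the plain vector but are pooled into a single dimension of the synonym-based vector, which is the kind of sparsity reduction a synonym dictionary is typically used for.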

Abstract: Stance detection aims to automatically determine whether the author of a text is in favor of a given target, against it, or neither. This paper presents a stance detection system based on multiple text feature representations. First, five feature representations are explored: statistics-based features (BoW, synonym-based BoW, sVariance) and deep text features (word vectors and character vectors). Support Vector Machine (SVM), Random Forest, and Gradient Boosting Decision Tree (GBDT) are then applied as classifiers. Finally, late fusion is conducted to combine the different feature representations. Experimental results show that the proposed feature representations achieve significant improvement over the traditional BoW feature. Moreover, statistics-based features and deep features provide complementary information for stance detection, which leads to the winning system in the Chinese Microblog Stance Detection Evaluation at Natural Language Processing and Chinese Computing (NLPCC 2016).

Key words: stance detection, sentiment analysis, text feature representations, Chinese microblogs, text classification
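The late-fusion step described in the abstract can be sketched as follows. This is a minimal illustration, assuming each per-feature classifier outputs a probability for every stance class and that fusion is a weighted average of those outputs. The fusion rule the paper actually uses is not stated on this page, and the label names, the function `late_fusion`, and the probability values are all made up for the example.

```python
def late_fusion(probabilities, weights=None):
    """Weighted average of per-classifier class-probability vectors."""
    if weights is None:
        weights = [1.0] * len(probabilities)  # equal weight per classifier
    total = sum(weights)
    n_classes = len(probabilities[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]

# Assumed label names for the three stances (favor, against, neither).
LABELS = ["FAVOR", "AGAINST", "NONE"]

# Made-up outputs of three hypothetical per-feature classifiers.
scores = [
    [0.50, 0.30, 0.20],  # e.g. a classifier trained on BoW features
    [0.20, 0.60, 0.20],  # e.g. a classifier trained on word vectors
    [0.30, 0.50, 0.20],  # e.g. a classifier trained on character vectors
]

fused = late_fusion(scores)
predicted = LABELS[fused.index(max(fused))]  # class with highest fused score
```

Because fusion happens on classifier outputs rather than on concatenated features, each feature representation can be paired with whichever classifier suits it, and the weights can be tuned on held-out data.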