Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (13): 27-33. DOI: 10.3778/j.issn.1002-8331.1804-0208

• Hot Topics and Reviews •

Sentiment analysis method based on pre-training Convolutional Neural Networks by distant supervision

ZHANG Yue¹,², XIA Hongbin¹,²

  1. School of Digital Media, Jiangnan University, Wuxi, Jiangsu 214122, China
  2. Jiangsu Key Laboratory of Media Design and Software Technology, Wuxi, Jiangsu 214122, China
  • Online: 2018-07-01 Published: 2018-07-17

Abstract: Traditional sentiment analysis research is mostly based on machine learning algorithms, which rely on a large number of hand-crafted features and on domain knowledge. This paper uses a convolutional neural network (CNN) to learn feature representations of text automatically and then to determine its sentiment polarity. To address the shortage of supervised training samples in sentiment analysis, large-scale distant supervision data is used to train the CNN. A “pre-train-fine-tune” strategy is also introduced: the CNN is first pre-trained on the distant supervision dataset and then fine-tuned on the supervised dataset, which counteracts the noise in the distant supervision data. Experiments on the SemEval-2013 Twitter sentiment analysis dataset show that including distant supervision data in training effectively strengthens the CNN's ability to learn sentiment semantics and thereby improves the model's accuracy.

Key words: sentiment analysis, distant supervision, pre-train-fine-tune, convolutional neural network (CNN)
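The two-stage schedule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a bag-of-words logistic regression stands in for the CNN so the example stays short and dependency-free, and the emoticon rule used to produce noisy labels is an assumption (the abstract does not say how the distant supervision labels are obtained). All names, such as `distant_label` and `train`, and all data are hypothetical.

```python
import math

# Assumed emoticon lists; the abstract does not specify the weak-labelling rule.
POSITIVE_MARKS = (":)", ":-)", ":D")
NEGATIVE_MARKS = (":(", ":-(")


def distant_label(tweet):
    """Return (text_without_emoticons, noisy_label), or None when the
    tweet carries no emoticon or carries conflicting ones."""
    has_pos = any(mark in tweet for mark in POSITIVE_MARKS)
    has_neg = any(mark in tweet for mark in NEGATIVE_MARKS)
    if has_pos == has_neg:
        return None
    text = tweet
    for mark in POSITIVE_MARKS + NEGATIVE_MARKS:
        text = text.replace(mark, " ")  # hide the label source from the model
    return text, (1 if has_pos else 0)


def predict_proba(weights, text):
    """Probability that `text` is positive under the stand-in model."""
    z = weights.get("<bias>", 0.0)
    z += sum(weights.get(token, 0.0) for token in text.lower().split())
    return 1.0 / (1.0 + math.exp(-z))


def train(weights, examples, epochs, lr):
    """Plain SGD on log loss. Updates `weights` in place and returns it,
    so a second call continues from the parameters of the first."""
    for _ in range(epochs):
        for text, label in examples:
            error = predict_proba(weights, text) - label
            weights["<bias>"] = weights.get("<bias>", 0.0) - lr * error
            for token in text.lower().split():
                weights[token] = weights.get(token, 0.0) - lr * error
    return weights


# Toy data, invented for illustration only.
raw_tweets = [
    "love this phone :)",
    "great day with friends :D",
    "awful service today :(",
    "hate waiting in line :-(",
    "meeting moved to monday",        # no emoticon: skipped
    "so happy :) but also tired :(",  # conflicting emoticons: skipped
]
weak_data = [pair for pair in map(distant_label, raw_tweets) if pair is not None]

labeled_data = [
    ("the battery is great", 1),
    ("terrible battery and awful screen", 0),
]

# Stage 1: pre-train on the large, noisy, automatically labelled set.
weights = train({}, weak_data, epochs=20, lr=0.5)
# Stage 2: fine-tune the same parameters on the small hand-labelled set,
# using a smaller step so the pre-trained weights are adjusted, not erased.
weights = train(weights, labeled_data, epochs=20, lr=0.1)

print(predict_proba(weights, "great phone"))
print(predict_proba(weights, "awful battery"))
```

The point of the sketch is the hand-off between the two `train` calls: the fine-tuning stage starts from the pre-trained parameters instead of a fresh initialization, which is how the small supervised set can correct noise picked up from the weak labels. The smaller fine-tuning learning rate is a common choice, not something the abstract states.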