Computer Engineering and Applications, 2020, Vol. 56, Issue (7): 205-209. DOI: 10.3778/j.issn.1002-8331.1812-0045

• Pattern Recognition and Artificial Intelligence •

textSE-ResNeXt Integration Model for Text Sentiment Classification Tasks

KANG Yan, LI Hao, LIANG Wentao, NING Haoyu, HUO Wen   

  1. School of Software, Yunnan University, Kunming 650500, China
  • Online:2020-04-01 Published:2020-03-28

Abstract:

Deep learning methods for text sentiment classification typically use a single form of text representation and struggle to exploit the fine-grained features within a corpus. To address this, a new textSE-ResNeXt integration (ensemble) model is proposed that takes the different characteristics of Chinese and English corpora into account and extracts their features separately. Explicit discourse relations in each text are analyzed with the help of the PDTB (Penn Discourse Treebank) corpus, so that the part of the text carrying the main sentiment can be extracted. The texts are then divided by sentiment intensity using Chinese and English sentiment lexicons, yielding sub-datasets with different levels of sentiment intensity. The textSE-ResNeXt network adopts a dynamic convolution kernel strategy to extract text features more effectively, and it combines SENet (squeeze-and-excitation networks) with ResNeXt to extract and classify deep text features. Finally, the textSE-ResNeXt models trained on the subsets of different sentiment intensity are combined by voting to further improve classification performance. Experiments on a Chinese hotel-review corpus and six common English classification datasets demonstrate the effectiveness of the model.

Key words: text sentiment classification, textSE-ResNeXt, feature division, integrated model
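The abstract uses explicit discourse relations, as annotated in the PDTB, to cut out the part of a text that carries its main sentiment, but it does not spell out the rules. The following is a minimal sketch under stated assumptions: the connective list `CONTRAST_CONNECTIVES` and the function `main_sentiment_span` are hypothetical, and the rule "keep the argument after the last contrast connective" is a common heuristic, not necessarily the authors' procedure. It handles English only; Chinese connectives such as 但是 would need word segmentation rather than `\b` word boundaries.

```python
import re

# Hypothetical subset of PDTB explicit connectives that usually signal a
# Comparison (contrast) relation; the paper's actual list is not given.
CONTRAST_CONNECTIVES = ("but", "however", "yet", "nevertheless")


def main_sentiment_span(text: str) -> str:
    """Return the part of `text` assumed to carry the main sentiment.

    Assumed heuristic: if a contrast connective occurs, keep only the text
    after the last such connective; otherwise return the whole text.
    """
    pattern = r"\b(" + "|".join(CONTRAST_CONNECTIVES) + r")\b"
    matches = list(re.finditer(pattern, text, flags=re.IGNORECASE))
    if not matches:
        return text.strip()
    tail = text[matches[-1].end():]
    # Remove spaces and punctuation left directly after the connective.
    return tail.strip(" ,;:").strip()


print(main_sentiment_span("The room was small, but the staff were very friendly."))
# the staff were very friendly.
```

String matching is crude: the "yet" in "not yet" also triggers the rule, so a real system would rely on a discourse parser trained on PDTB annotations instead.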
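Partitioning a corpus into sub-datasets by sentiment intensity with a lexicon can be sketched as follows. Everything here is an assumption for illustration: `TOY_LEXICON` stands in for the Chinese and English sentiment lexicons the paper uses, intensity is taken as the sum of absolute word scores, and the bucket names and `thresholds` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy lexicon: word -> signed sentiment strength.
# A real pipeline would load a full Chinese or English sentiment lexicon.
TOY_LEXICON = {"good": 1.0, "great": 2.0, "excellent": 3.0,
               "bad": -1.0, "terrible": -3.0, "dirty": -2.0}


def intensity(text, lexicon=TOY_LEXICON):
    """Sum of absolute lexicon scores of the whitespace tokens in `text`."""
    tokens = text.lower().split()
    return sum(abs(lexicon.get(tok.strip(".,!?"), 0.0)) for tok in tokens)


def partition_by_intensity(samples, thresholds=(1.0, 3.0)):
    """Split (text, label) pairs into 'weak' / 'medium' / 'strong' subsets.

    `thresholds` are illustrative cut points, not values from the paper.
    """
    low, high = thresholds
    subsets = defaultdict(list)
    for text, label in samples:
        score = intensity(text)
        if score <= low:
            subsets["weak"].append((text, label))
        elif score <= high:
            subsets["medium"].append((text, label))
        else:
            subsets["strong"].append((text, label))
    return dict(subsets)


samples = [("Good room.", 1), ("Great food, excellent view!", 1),
           ("Bad service, great pool.", 0), ("Terrible and dirty.", 0)]
subsets = partition_by_intensity(samples)
print({k: len(v) for k, v in subsets.items()})
# {'weak': 1, 'strong': 2, 'medium': 1}
```

Using absolute scores makes the partition depend on how strongly a text is worded rather than on its polarity, so positive and negative labels can both appear in every subset.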
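For reference, the two building blocks the model combines can be written in their standard forms from the ResNeXt and SENet papers, adapted here to a text feature map with $K$ channels over $L$ token positions. The 1-D adaptation, and placing the SE recalibration on the aggregated branch before the residual addition (as in SE-ResNeXt), are assumptions about textSE-ResNeXt rather than details stated in the abstract. $C$ is the cardinality (number of parallel branches $\mathcal{T}_i$), $r$ the reduction ratio, $\delta$ the ReLU and $\sigma$ the sigmoid.

```latex
\begin{aligned}
\mathbf{u} &= \sum_{i=1}^{C} \mathcal{T}_i(\mathbf{x}), \qquad \mathbf{u} \in \mathbb{R}^{K \times L},\\
z_k &= \frac{1}{L}\sum_{t=1}^{L} u_k(t), \qquad k = 1,\dots,K,\\
\mathbf{s} &= \sigma\!\left(\mathbf{W}_2\,\delta(\mathbf{W}_1 \mathbf{z})\right), \qquad
\mathbf{W}_1 \in \mathbb{R}^{\frac{K}{r}\times K},\ \mathbf{W}_2 \in \mathbb{R}^{K\times\frac{K}{r}},\\
\tilde{u}_k &= s_k\, u_k, \qquad k = 1,\dots,K,\\
\mathbf{y} &= \mathbf{x} + \tilde{\mathbf{u}}.
\end{aligned}
```

In implementations a ReLU is normally applied after the residual addition.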
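The squeeze-and-excitation step that SENet contributes can be traced on a tiny feature map in plain Python. This is a minimal sketch with hypothetical helper names (`squeeze_excite`, `matvec`) and hand-set weights; in the actual network `W1` and `W2` are learned, and the feature map would come from convolutional branches rather than being written by hand.

```python
import math


def relu(v):
    return [max(0.0, a) for a in v]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def squeeze_excite(U, W1, W2):
    """Recalibrate the channels of a text feature map U.

    U  : K channels, each a list of L activations (one per token position).
    W1 : (K/r) x K reduction weights; W2 : K x (K/r) expansion weights.
    Returns the rescaled feature map and the per-channel weights.
    """
    # Squeeze: average each channel over the sequence length.
    z = [sum(channel) / len(channel) for channel in U]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives one weight per channel.
    s = [sigmoid(a) for a in matvec(W2, relu(matvec(W1, z)))]
    # Scale: multiply every activation of channel k by s[k].
    return [[s_k * a for a in channel] for s_k, channel in zip(s, U)], s


U = [[1.0, 3.0], [0.0, 0.0]]   # K = 2 channels, L = 2 positions
W1 = [[1.0, 1.0]]              # r = 2, so K/r = 1
W2 = [[1.0], [-1.0]]
scaled, weights = squeeze_excite(U, W1, W2)
print([round(w, 4) for w in weights])  # [0.8808, 0.1192]
```

The channel with the larger average activation is amplified relative to the other; that the two weights sum to 1 here is a property of these toy weights, not of SE blocks in general.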
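The final step, in which textSE-ResNeXt models trained on the sub-datasets of different sentiment intensity are combined, can be sketched as hard majority voting. The abstract says only "voting", so voting over predicted labels and the tie-breaking rule are assumptions, and `majority_vote` is a hypothetical helper.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label predictions by hard (majority) voting.

    predictions_per_model[m][i] is model m's predicted label for sample i.
    """
    n_samples = len(predictions_per_model[0])
    if any(len(p) != n_samples for p in predictions_per_model):
        raise ValueError("every model must predict the same number of samples")
    combined = []
    for i in range(n_samples):
        votes = Counter(preds[i] for preds in predictions_per_model)
        # most_common orders equal counts by first occurrence, so ties go
        # to the label predicted by the earliest model in the list.
        combined.append(votes.most_common(1)[0][0])
    return combined


# Hypothetical predictions from three sub-models on four reviews (1 = positive).
preds = [[1, 0, 1, 1],
         [1, 1, 0, 1],
         [0, 0, 0, 1]]
print(majority_vote(preds))  # [1, 0, 0, 1]
```

Soft voting, which averages predicted class probabilities instead of counting labels, is a common alternative when the sub-models output calibrated scores.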