Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (24): 180-188. DOI: 10.3778/j.issn.1002-8331.2106-0227

• Pattern Recognition and Artificial Intelligence •

Sentiment Representation Learning for Non-IID Document

LI Qian, GUO Hongyu, ZHENG Yangfei, LIU Yulong, LI Shanhai, WU Yanxiong

  1. The 15th Research Institute, China Electronics Technology Group Corporation, Beijing 100083, China
  2. Information Centre, All-China Federation of Industry and Commerce, Beijing 100035, China
  • Online: 2022-12-15 Published: 2022-12-15

Abstract: Sentiment analysis of non-independent and identically distributed (non-IID) documents is challenging because such documents are complex texts in which words and sentences are coupled with one another and the same word or sentence can carry different meanings in different contexts. Few existing methods capture these non-IID characteristics comprehensively for sentiment analysis. The proposed non-IID document representation learning method for sentiment analysis models the coupling relations and heterogeneous meanings that exist hierarchically in a document and embeds these characteristics, which determine sentiment polarity, in the document's vector representation. The method is implemented by a multi-scale, hierarchical deep neural network with an attention mechanism. The network captures word and sentence couplings with a multi-scale convolutional-recurrent structure and uses the attention mechanism to resolve the heterogeneous meanings of words and sentences. To avoid overfitting and strengthen the sentiment-related information in the representation, the network also hierarchically fuses the implicit features learned by the network with explicit features constructed from sentiment prior knowledge. Extensive experiments show that the method significantly improves sentiment analysis performance.

Key words: non-independent and identically distributed (non-IID) document, textual data representation, sentiment analysis, deep learning
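
The pipeline summarized in the abstract (multi-scale convolution, recurrence, attention, and fusion of implicit with explicit features) can be illustrated with a small NumPy sketch at the sentence level. This is not the authors' implementation: the layer sizes, the kernel widths (1, 2, 3), the plain tanh recurrence, the toy positive/negative word lists, and every function and parameter name below are assumptions chosen for illustration, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def multi_scale_conv(x, kernels):
    """Apply 1-D convolutions of several widths over a (T, d) sequence.

    Each kernel has shape (width, d, h). The sequence is zero-padded at the
    end so every scale yields T positions; the per-scale outputs are then
    concatenated along the feature axis.
    """
    T, d = x.shape
    outputs = []
    for w in kernels:
        width = w.shape[0]
        padded = np.vstack([x, np.zeros((width - 1, d))])
        # Window t covers rows t .. t + width - 1 of the padded sequence.
        windows = np.stack([padded[t:t + width] for t in range(T)])
        outputs.append(np.tanh(np.einsum("twd,wdh->th", windows, w)))
    return np.concatenate(outputs, axis=1)  # (T, h * number of scales)

def simple_rnn(x, w_in, w_rec):
    """Plain tanh recurrence; returns the hidden state at every time step."""
    h = np.zeros(w_rec.shape[0])
    states = []
    for x_t in x:
        h = np.tanh(x_t @ w_in + h @ w_rec)
        states.append(h)
    return np.stack(states)  # (T, hidden)

def attention_pool(states, context):
    """Softmax attention over time steps, scored against a context vector."""
    scores = states @ context
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ states, weights

def explicit_features(tokens, positive, negative):
    """Hand-built sentiment priors: fraction of positive and negative words."""
    n = max(len(tokens), 1)
    pos = sum(t in positive for t in tokens) / n
    neg = sum(t in negative for t in tokens) / n
    return np.array([pos, neg])

def encode_sentence(tokens, embed, params, positive, negative):
    """Fuse the learned (implicit) vector with the lexicon (explicit) one."""
    x = np.stack([embed[t] for t in tokens])
    conv = multi_scale_conv(x, params["kernels"])
    states = simple_rnn(conv, params["w_in"], params["w_rec"])
    implicit, weights = attention_pool(states, params["context"])
    explicit = explicit_features(tokens, positive, negative)
    return np.concatenate([implicit, explicit]), weights

# Toy setup with hypothetical sizes and random, untrained weights.
d, h, hidden = 8, 4, 6
tokens = ["the", "plot", "is", "not", "bad"]
embed = {t: rng.normal(size=d) for t in tokens}
params = {
    "kernels": [rng.normal(scale=0.1, size=(k, d, h)) for k in (1, 2, 3)],
    "w_in": rng.normal(scale=0.1, size=(3 * h, hidden)),
    "w_rec": rng.normal(scale=0.1, size=(hidden, hidden)),
    "context": rng.normal(size=hidden),
}
vector, weights = encode_sentence(tokens, embed, params, {"good"}, {"bad"})
print(vector.shape, weights.round(3))
```

A hierarchical variant would presumably apply the same kind of encoder a second time over the resulting sentence vectors to obtain a document vector; that step is omitted here to keep the sketch short.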