Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (9): 142-147. DOI: 10.3778/j.issn.1002-8331.1901-0050

• Pattern Recognition and Artificial Intelligence •

Improved CBOW Emotional Information Acquisition Research

CAO Junbo (曹军博), YE Xia (叶霞), XU Feixiang (许飞翔), YIN Liedong (尹列东)

  1. Academy of Combat Support, Rocket Force University of Engineering, Xi’an 710025, China
  • Publication date: 2020-05-01  Release date: 2020-04-29

Abstract:

In the era of big data, the sentiment orientation of text is of great significance for mining its potential value. Manual methods can hardly mine the potential value of online review text effectively; with the rapid development of computer technology, this problem can be addressed effectively. In text sentiment analysis, acquiring the sentiment information of words is crucial. Word vector methods generally model only the syntax and semantics of words and ignore their sentiment information, which limits their usefulness for sentiment analysis. In the proposed method, a weighting matrix is obtained with the TF-IDF model and a stop word list is constructed; a Huffman tree is generated from the weighting matrix and used as the input to an improved CBOW algorithm; and a sentiment lexicon is introduced to generate sentiment labels that assist word vector training, so that the word vectors carry sentiment information. Experimental results show that the word vectors obtained from review text by the proposed method express sentiment information well, and that the sentiment classification results are better than those of traditional models. The model can therefore effectively improve sentiment classification of review text.

Key words: word vector, Continuous Bag-of-Words (CBOW) model, Term Frequency-Inverse Document Frequency (TF-IDF) model, sentiment analysis
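
The pipeline outlined in the abstract (TF-IDF weights, a Huffman tree built from those weights, and lexicon-derived sentiment labels) can be illustrated with a minimal Python sketch. The abstract does not give the exact construction, so several details here are assumptions made only for illustration: each word's weight is taken as its TF-IDF summed over all documents, the IDF uses a smoothed form, and the sentiment lexicon is a plain word-to-polarity mapping. The function names `tfidf_weights`, `huffman_codes`, and `sentiment_label` are hypothetical, and the sketch stops before the CBOW training step, whose modified objective is not described in the abstract.

```python
import heapq
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one weight per word: its TF-IDF summed over all documents.

    Assumption: the paper's weighting matrix is reduced to a single
    corpus-level weight per word; the abstract does not specify this.
    """
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    weights = Counter()
    for doc in docs:
        counts = Counter(doc)
        for word, count in counts.items():
            tf = count / len(doc)
            # Smoothed IDF (an assumed variant, not taken from the paper).
            idf = math.log((1 + n_docs) / (1 + doc_freq[word])) + 1
            weights[word] += tf * idf
    return dict(weights)


def huffman_codes(weights):
    """Build a Huffman tree over the weights; return each word's binary path."""
    if not weights:
        return {}
    if len(weights) == 1:
        return {word: "0" for word in weights}
    # Heap entries: (weight, unique tie-breaker, {word: code so far}).
    heap = [(w, i, {word: ""})
            for i, (word, w) in enumerate(sorted(weights.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merge the two lightest subtrees, prefixing one branch bit to each.
        merged = {word: "0" + code for word, code in left.items()}
        merged.update({word: "1" + code for word, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def sentiment_label(word, lexicon):
    """Look up a polarity label in the lexicon; 0 means neutral or unknown."""
    return lexicon.get(word, 0)


# Toy review corpus and lexicon, invented for this example.
docs = [["good", "phone", "good", "screen"],
        ["bad", "battery", "phone"],
        ["good", "battery"]]
lexicon = {"good": 1, "bad": -1}

weights = tfidf_weights(docs)
codes = huffman_codes(weights)
labels = {word: sentiment_label(word, lexicon) for word in weights}
```

In hierarchical-softmax CBOW implementations the Huffman tree is usually built from raw word frequencies, so that frequent words get short paths. Building it from TF-IDF weights instead, as the abstract describes, gives short paths to words with high weight.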