Computer Engineering and Applications, 2024, Vol. 60, Issue (20): 153-159. DOI: 10.3778/j.issn.1002-8331.2307-0015

• Pattern Recognition and Artificial Intelligence •

Research on Chinese Irony Recognition by Integrating BERT and Bidirectional Long Short-Term Memory Networks

WANG Xuyang (王旭阳), QI Nan (戚楠), WEI Shenyou (魏申酉)

  1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
  • Online: 2024-10-15 Published: 2024-10-15

Abstract: When commenting on trending microblog topics, users often employ rhetorical devices such as irony and sarcasm. These devices carry a sentiment polarity of their own and can therefore distort sentiment analysis results. This paper addresses irony recognition in Chinese microblog comments. It constructs a three-class dataset covering irony, sarcasm, and non-ironic text, and proposes BERT_BiLSTM, a model based on bidirectional encoder representations from Transformers (BERT) and a bidirectional long short-term memory network (BiLSTM). The model uses BERT to generate dynamic character vectors that encode contextual information, feeds them into the BiLSTM to extract deep ironic features of the text, and passes the output of a fully connected layer to softmax to classify the text. Experimental results show that, on both the binary and the three-class datasets, the proposed BERT_BiLSTM model achieves markedly higher accuracy and F1 than existing mainstream models.
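The pipeline described in the abstract can be written down as a short set of equations. This is only a sketch under stated assumptions: the symbols ($n$ for the number of input characters, $d$ for the BERT hidden size, $k$ for the LSTM hidden size, $C$ for the number of classes, and $W$, $b$ for the fully connected layer) are introduced here for illustration, and pooling by concatenating the two final LSTM states is an assumption, since the abstract does not say how the BiLSTM output is reduced to a single vector.

```latex
\begin{align}
  % BERT maps the n input characters to contextual (dynamic) character vectors
  (e_1, \dots, e_n) &= \mathrm{BERT}(x_1, \dots, x_n), \qquad e_t \in \mathbb{R}^{d} \\
  % the BiLSTM reads the sequence in both directions
  \overrightarrow{h}_t &= \overrightarrow{\mathrm{LSTM}}\bigl(e_t, \overrightarrow{h}_{t-1}\bigr), \qquad
  \overleftarrow{h}_t = \overleftarrow{\mathrm{LSTM}}\bigl(e_t, \overleftarrow{h}_{t+1}\bigr) \\
  % assumed pooling: concatenate the last forward and last backward states
  h &= \bigl[\overrightarrow{h}_n \,;\, \overleftarrow{h}_1\bigr] \in \mathbb{R}^{2k} \\
  % fully connected layer followed by softmax over the C classes
  \hat{y} &= \mathrm{softmax}(W h + b), \qquad W \in \mathbb{R}^{C \times 2k},\; b \in \mathbb{R}^{C}
\end{align}
```

Under this reading, $C = 2$ corresponds to the binary dataset and $C = 3$ to the irony / sarcasm / non-irony dataset.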

Key words: irony recognition, bidirectional encoder representations from Transformers (BERT), feature extraction, bidirectional long short-term memory network (BiLSTM)
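The abstract reports results in terms of accuracy and F1 on a three-class task. As a minimal sketch of how those two metrics can be computed, the following uses only the Python standard library. The label strings, the function names, and the choice of macro averaging for F1 are assumptions made for illustration; the abstract does not state which F1 averaging the paper uses.

```python
# Hypothetical label names for the three classes named in the abstract.
LABELS = ["irony", "sarcasm", "non-irony"]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def class_f1(y_true, y_pred, label):
    """F1 for one class, treating that class as positive and the rest as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        # Precision or recall is zero (or undefined), so F1 is reported as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of the per-class F1 scores (macro averaging, an assumption)."""
    return sum(class_f1(y_true, y_pred, label) for label in labels) / len(labels)


if __name__ == "__main__":
    # Toy gold labels and predictions, invented purely for demonstration.
    gold = ["irony", "sarcasm", "non-irony", "irony", "non-irony", "sarcasm"]
    pred = ["irony", "non-irony", "non-irony", "sarcasm", "non-irony", "sarcasm"]
    print(f"accuracy = {accuracy(gold, pred):.4f}")
    print(f"macro F1 = {macro_f1(gold, pred):.4f}")
```

Macro averaging weights the three classes equally, which matters when ironic and sarcastic comments are rarer than non-ironic ones; a micro or weighted average would give different numbers on an imbalanced dataset.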