Risk Identification Method for News Public Opinion Driven by Prompt Learning

doi:10.3778/j.issn.1002-8331.2208-0295

Abstract

Abstract: Identifying a company's risks from news reports makes it possible to quickly determine which risk categories the company is exposed to, helping enterprises respond in a timely manner. News public opinion risk identification is generally framed as a multi-class classification task over risk labels. Deep learning methods represented by BERT follow a pre-training + fine-tuning paradigm and perform well on text classification tasks. However, labeled data in the news public opinion domain is scarce, which makes this a few-shot learning problem. The new paradigm represented by prompt learning offers a way to improve few-shot classification performance, and existing studies have shown that it outperforms pre-training + fine-tuning on many tasks. Inspired by this work, this paper proposes a news public opinion risk identification method based on prompt learning. Building on the BERT pre-trained model, it designs a risk prompt template for news public opinion; after training with the MLM (masked language model) objective, the predicted label word is mapped to an existing risk label through answer engineering. Experimental results show that, on few-shot subsets of the news public opinion dataset of different sizes, training with prompt learning consistently outperforms training with fine-tuning.

Key words: risk label, multi-class classification, pre-trained model, prompt learning
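
The pipeline summarized in the abstract (prompt template, masked-token prediction, answer engineering) can be illustrated with a minimal sketch. The abstract does not give the paper's template wording, label words, or risk labels, so `TEMPLATE`, `VERBALIZER`, and the toy scores below are hypothetical placeholders; in a real system the scores would come from a BERT masked language model evaluated at the `[MASK]` position.

```python
from typing import Dict, List

MASK = "[MASK]"

# Hypothetical prompt template (assumption, not the paper's wording).
TEMPLATE = "{text} This news involves {mask} risk."

# Hypothetical answer mapping: each risk label is tied to a few label words
# that the masked language model may predict at the [MASK] position.
VERBALIZER: Dict[str, List[str]] = {
    "legal risk": ["legal", "lawsuit"],
    "financial risk": ["financial", "debt"],
    "operational risk": ["operational", "production"],
}


def build_prompt(text: str) -> str:
    """Wrap a news text in the cloze-style template with one mask slot."""
    return TEMPLATE.format(text=text, mask=MASK)


def map_to_risk_label(mask_scores: Dict[str, float]) -> str:
    """Answer engineering: average the label-word scores for each risk
    label and return the label with the highest average."""
    def label_score(words: List[str]) -> float:
        return sum(mask_scores.get(w, 0.0) for w in words) / len(words)

    return max(VERBALIZER, key=lambda label: label_score(VERBALIZER[label]))


# Toy scores standing in for MLM probabilities at the mask position.
toy_scores = {
    "legal": 0.05, "lawsuit": 0.10,
    "financial": 0.40, "debt": 0.30,
    "operational": 0.02,
}

prompt = build_prompt("Company A failed to repay its bonds on time.")
predicted = map_to_risk_label(toy_scores)
print(prompt)
print(predicted)  # financial risk
```

Averaging over several label words per risk label is only one possible design; a one-to-one mapping between label words and risk labels would work the same way with single-element lists.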

ZENG Huiling, LI Lin, LYU Siyang, HE Zheng. Risk Identification Method for News Public Opinion Driven by Prompt Learning[J]. Computer Engineering and Applications, 2024, 60(1): 182-188.
