Computer Engineering and Applications, 2024, Vol. 60, Issue (1): 182-188. DOI: 10.3778/j.issn.1002-8331.2208-0295

• Pattern Recognition and Artificial Intelligence •

Risk Identification Method for News Public Opinion Driven by Prompt Learning

ZENG Huiling, LI Lin, LYU Siyang, HE Zheng   

  1. School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430070, China
  2. School of Economics, Wuhan University of Technology, Wuhan 430070, China
  3. Deloitte Consulting (Shanghai) Co., Ltd., Shanghai 510623, China
  • Online: 2024-01-01 Published: 2024-01-01

Abstract: Identifying a company's risks from news reports makes it possible to quickly pinpoint the risk categories the company is exposed to, and thus helps it respond in time. News public opinion risk identification is generally framed as a multi-class classification task over risk labels. Deep learning methods represented by BERT follow the pre-training + fine-tuning paradigm and perform strongly on text classification. However, labeled data in the news public opinion domain are scarce, which turns the task into a few-shot (small-sample) learning problem. The new paradigm represented by prompt learning offers another way to improve few-shot classification, and existing studies have shown that it outperforms pre-training + fine-tuning on many tasks. Inspired by this work, this paper proposes a prompt-learning-based method for news public opinion risk identification. On top of the pre-trained BERT model, it designs news public opinion risk prompt templates following the idea of prompt learning; after training as an MLM (masked language model), the predicted answers are mapped to the existing risk labels through answer engineering. Experimental results show that, on few-shot training sets of different sizes drawn from the news public opinion dataset, prompt-based training consistently outperforms fine-tuning.

Key words: risk label, multi-class classification, pretrained model, prompt learning
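The abstract describes the prompt-learning pipeline only at a high level: wrap the news text in a template containing a [MASK] slot, let the MLM predict an answer there, and map that answer to a risk label through answer engineering. The minimal Python sketch below illustrates this flow under loudly stated assumptions. The template wording, the risk labels, the label words in `VERBALIZER`, and the max aggregation over label words are all hypothetical (the paper works with Chinese text and its actual templates are not given here). `toy_scorer` is a keyword-counting stand-in for the scores a BERT masked-language-model head would assign to words at the [MASK] position. It is not the authors' implementation.

```python
import math
from typing import Callable, Dict, List

# Hypothetical prompt template; the paper's actual wording is not given in the abstract.
TEMPLATE = "{text} This news concerns [MASK] risk."

# Hypothetical verbalizer ("answer engineering"): each risk label is mapped to
# label words the MLM might predict at [MASK]. Labels and words are illustrative only.
VERBALIZER: Dict[str, List[str]] = {
    "financial": ["financial", "debt"],
    "legal": ["legal", "lawsuit"],
    "operational": ["operational", "production"],
}

# A scorer takes the filled-in prompt and returns a score per candidate word at [MASK].
MaskScorer = Callable[[str], Dict[str, float]]


def build_prompt(text: str, template: str = TEMPLATE) -> str:
    """Wrap a news item in the template so that classification becomes mask filling."""
    return template.format(text=text)


def label_scores(text: str, scorer: MaskScorer,
                 verbalizer: Dict[str, List[str]] = VERBALIZER) -> Dict[str, float]:
    """Map word scores at [MASK] to risk-label scores (best-scoring word per label)."""
    word_scores = scorer(build_prompt(text))
    return {
        label: max(word_scores.get(w, float("-inf")) for w in words)
        for label, words in verbalizer.items()
    }


def label_probs(scores: Dict[str, float]) -> Dict[str, float]:
    """Numerically stable softmax over labels (assumes at least one finite score)."""
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}


def prompt_loss(text: str, gold: str, scorer: MaskScorer,
                verbalizer: Dict[str, List[str]] = VERBALIZER) -> float:
    """Cross-entropy of the gold risk label: the quantity minimized on the few labeled examples."""
    probs = label_probs(label_scores(text, scorer, verbalizer))
    return -math.log(probs[gold])


def classify(text: str, scorer: MaskScorer,
             verbalizer: Dict[str, List[str]] = VERBALIZER) -> str:
    """Predict the risk label whose label words score highest at [MASK]."""
    scores = label_scores(text, scorer, verbalizer)
    return max(scores, key=scores.get)


def toy_scorer(prompt: str) -> Dict[str, float]:
    """Hypothetical stand-in for a BERT MLM head: scores each label word by its count in the prompt."""
    vocab = {w for words in VERBALIZER.values() for w in words}
    tokens = prompt.lower().replace(".", " ").split()
    return {w: float(tokens.count(w)) for w in vocab}


if __name__ == "__main__":
    news = "The company faces a lawsuit over unpaid debt and a second lawsuit from suppliers."
    print(classify(news, toy_scorer))  # legal
```

Replacing `toy_scorer` with real MLM scores at the [MASK] token leaves the rest of the sketch unchanged. The design point this illustrates is that prompt-based training reuses the pre-trained MLM head and only needs a template plus a label-word mapping, whereas standard fine-tuning adds a new, randomly initialized classification head on the [CLS] representation, which is harder to train from few labeled examples.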