Computer Engineering and Applications, 2022, Vol. 58, Issue 14: 27-39. DOI: 10.3778/j.issn.1002-8331.2111-0580

• Research Hotspots and Reviews •

Review of Keyphrase Generation Based on Deep Learning

YU Qiang, LIN Min, LI Yanling   

  1. College of Computer Science and Technology, Inner Mongolia Normal University, Hohhot 010022, China
  • Online: 2022-07-15 Published: 2022-07-15


Abstract: Keyphrase generation is a classic yet challenging task in natural language processing: it requires automatically generating a set of representative, characteristic phrases from a document. Sequence-to-sequence models based on deep learning have achieved remarkable results on this task and overcome a serious shortcoming of earlier keyphrase extraction methods, namely that they cannot produce keyphrases that do not appear in the original text. Because its results better match practical needs, keyphrase generation has gradually surpassed earlier extraction methods and become the mainstream approach to keyphrase extraction tasks. This article first introduces the development of keyphrase extraction and the main datasets for keyphrase generation, then classifies and reviews keyphrase generation methods whose basic design is a sequence-to-sequence model, analyzing their principles, advantages, and disadvantages. Finally, it summarizes the evaluation methods for keyphrase generation and discusses future research directions.

Key words: keyphrase generation, deep neural network, Seq2Seq, attention mechanism
