Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (20): 165-173. DOI: 10.3778/j.issn.1002-8331.2103-0328

• Pattern Recognition and Artificial Intelligence •

主题感知的长文本自动摘要算法

杨涛,解庆,刘永坚,刘平峰   

Research on Topic-Aware Long Text Summarization Algorithm

YANG Tao, XIE Qing, LIU Yongjian, LIU Pingfeng   

  1. School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, China
  2. School of Economics, Wuhan University of Technology, Wuhan 430070, China
  • Online: 2022-10-15 Published: 2022-10-15

Abstract: Summarizing long texts remains a difficult problem in automatic summarization. Existing methods suffer from low accuracy and redundancy when processing long texts. Given the strong performance of topic models in multi-document summarization, this paper introduces a topic model into the long text summarization task. In addition, neither a purely extractive nor a purely abstractive method can, on its own, cope with the complexity of long texts. Combining the two, the paper proposes a topic-aware hybrid summarization model for long texts that integrates extractive and abstractive methods. The model's effectiveness is verified on the TTNews and CNN/Daily Mail datasets: its ROUGE scores are 1 to 2 percentage points higher than those of comparable models, and it generates more readable summaries.

Key words: topic model, long text summarization, hybrid model, pointer network
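
Note on the metric: the abstract reports gains in ROUGE, which scores a generated summary by its n-gram overlap with a reference summary. The following is a minimal sketch of ROUGE-N, not the authors' evaluation code. The function names `ngrams` and `rouge_n` are hypothetical, and whitespace tokenization is an assumption made for illustration; standard ROUGE toolkits add steps such as stemming, and Chinese text such as TTNews would need character-level or word-segmented tokens.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return ROUGE-N precision, recall, and F1 for two strings.

    Assumption: tokens are split on whitespace; no stemming or
    stop-word removal is applied.
    """
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    # Clipped overlap: each n-gram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    scores = rouge_n("the cat sat on the mat", "the cat lay on the mat", n=1)
    print(scores)
```

Under this definition, a gain of "1 to 2 percentage points" corresponds to an absolute increase of 0.01 to 0.02 in the score.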