Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (15): 200-206.DOI: 10.3778/j.issn.1002-8331.2009-0085
HE Junmin, LU Menghua, MENG Kui
赫俊民,鲁梦华,孟魁
Abstract:
Text summarization extracts the important information from a text and presents it coherently, helping readers obtain information quickly. In Chinese single-document summarization, supervised summarization models remain immature because reliable datasets are lacking. This paper constructs CDESD (Chinese Document-level Extractive Summarization Dataset), a Chinese document-level summarization corpus of more than 200,000 articles, and proposes DSum-SSE (Document Summarization with SPA Sentence Embedding), a supervised document-level extractive summarization model. The model is built on a neural network framework. It first uses a sequence-to-sequence framework that combines Pointer and attention mechanisms to solve the sentence-level abstractive (generative) summarization problem, which yields a representation vector reflecting the core meaning of each sentence. On this basis, it introduces an extreme Pointer mechanism to complete the supervised document-level extractive summarization algorithm. Experiments show that DSum-SSE provides higher-quality summaries than TextRank, a popular unsupervised document-level extractive summarization algorithm. The corpus CDESD and the model DSum-SSE respectively supplement the data and the models available for Chinese document-level summarization.
Key words: document-level summarization, extractive summary, sequence-to-sequence, attention mechanism, Pointer
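The extraction step named in the abstract, pointing at sentences on the basis of their embedding vectors, can be illustrated with a minimal sketch. This is not the authors' DSum-SSE implementation: the abstract does not specify the scoring function, the decoder, or the training procedure. The function name `pointer_select`, the dot-product scoring, and the fixed `query` vector are all assumptions made for illustration; a trained pointer network would normally update its decoder state after each selection.

```python
import math


def softmax(scores):
    """Numerically stable softmax; -inf scores receive probability 0."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pointer_select(sentence_vecs, query, k):
    """Greedily point at k sentences using attention over sentence vectors.

    Hypothetical helper: dot-product attention with a fixed query is an
    assumption, not the scoring used in the paper.
    """
    selected = []
    for _ in range(min(k, len(sentence_vecs))):
        # Attention score per sentence; already chosen sentences are masked out.
        scores = [
            float("-inf") if i in selected
            else sum(a * b for a, b in zip(vec, query))
            for i, vec in enumerate(sentence_vecs)
        ]
        probs = softmax(scores)
        # The pointer copies the index with the highest attention weight.
        best = max(range(len(probs)), key=probs.__getitem__)
        selected.append(best)
    # Return indices in document order so the summary reads naturally.
    return sorted(selected)


# Toy 2-dimensional "sentence embeddings" (made-up values).
vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.0, 0.0]]
summary_indices = pointer_select(vecs, query=[1.0, 0.0], k=2)
```

In this toy input the first and third vectors align most closely with the query, so those two sentence indices are selected.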
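Abstract (Chinese version):
To address the lack of reliable datasets and the immaturity of supervised summarization models in Chinese document summarization, a Chinese document-level summarization corpus of more than 200,000 articles (Chinese Document-level Extractive Summarization Dataset, CDESD) is constructed, and a supervised document-level extractive summarization model (Document Summarization with SPA Sentence Embedding, DSum-SSE) is proposed. The model is built on a neural network framework and uses an end-to-end framework combining Pointer and attention mechanisms to solve the sentence-level abstractive summarization problem, obtaining a representation vector that reflects the core meaning of each sentence. An extreme Pointer mechanism is then introduced on this basis to complete the document-level extractive summarization algorithm. Experiments show that, compared with the unsupervised single-document summarization algorithm TextRank, DSum-SSE is able to provide higher-quality summaries. CDESD and DSum-SSE respectively supplement the corpus data and the models available for Chinese document-level summarization.
Key words: document-level text summarization, extractive summarization, end-to-end framework, attention mechanism, Pointer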
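For context on the TextRank baseline mentioned in the abstract, the following is a rough sketch of graph-based sentence ranking. It is a simplification under stated assumptions: sentences are tokenized by whitespace (Chinese text would first need a word segmenter), similarity is Jaccard word overlap rather than the length-normalized overlap of the original TextRank formulation, and the function name `textrank` and its parameters are illustrative, not taken from the paper.

```python
def textrank(sentences, top_k=2, damping=0.85, iterations=50):
    """Rank sentences with a PageRank-style iteration over a similarity graph.

    Simplified sketch: Jaccard overlap on whitespace tokens is an assumption.
    """
    n = len(sentences)
    if n == 0:
        return []
    tokens = [set(s.lower().split()) for s in sentences]

    # Build the weighted similarity graph between distinct sentences.
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            union = tokens[i] | tokens[j]
            if i != j and union:
                sim[i][j] = len(tokens[i] & tokens[j]) / len(union)

    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to edge weight.
            rank = sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores

    # Keep the top_k sentences, restored to document order.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(top)]


docs = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "stock prices fell sharply today",
]
baseline_summary = textrank(docs, top_k=2)
```

Because the third sentence shares no words with the others, it receives only the baseline score, and the two overlapping sentences are returned.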
HE Junmin, LU Menghua, MENG Kui. Chinese Document-Level Summary Model — DSum-SSE[J]. Computer Engineering and Applications, 2021, 57(15): 200-206.
赫俊民,鲁梦华,孟魁. 中文单文档摘要模型DSum-SSE[J]. 计算机工程与应用, 2021, 57(15): 200-206.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2009-0085
http://cea.ceaj.org/EN/Y2021/V57/I15/200