Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (15): 184-190. DOI: 10.3778/j.issn.1002-8331.2104-0382

• Pattern Recognition and Artificial Intelligence •

Automatic Text Summarization Technology Based on ALBERT-UniLM Model

SUN Baoshan, TAN Hao   

  1. School of Computer Science and Technology, Tiangong University, Tianjin 300387, China
  2. Tianjin Key Laboratory of Autonomous Intelligence Technology and Systems, Tiangong University, Tianjin 300387, China
  • Online: 2022-08-01    Published: 2022-08-01

Abstract: To address the problems that abstractive summarization models understand the source text insufficiently and tend to generate repetitive text, this paper combines the dynamic word-vector model ALBERT with the unified pre-trained language model UniLM to construct an ALBERT-UniLM summarization model. The model first replaces the conventional BERT baseline with the pre-trained ALBERT model for feature extraction, obtaining dynamic word vectors. A UniLM language model augmented with a pointer network is then fine-tuned on the downstream generation task, combined with a coverage mechanism to reduce repetitive content and produce the summary text. Experiments use ROUGE as the evaluation metric and are conducted on the single-document Chinese news summarization dataset of the 2018 CCF International Conference on Natural Language Processing and Chinese Computing (NLPCC2018). Compared with the BERT baseline, the ALBERT-UniLM model improves Rouge-1, Rouge-2 and Rouge-L by 1.57%, 1.37% and 1.60%, respectively. The results show that the proposed ALBERT-UniLM model clearly outperforms the other baseline models on text summarization tasks and can effectively improve the quality of generated summaries.
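
The abstract credits two mechanisms for curbing repetition: a pointer (copy) distribution over source tokens and a coverage penalty. As an illustration only, the PyTorch sketch below shows how these pieces are commonly realised in the style of See et al.'s pointer-generator network; the function names, tensor shapes and the toy driver at the bottom are assumptions for this sketch, not the authors' released code.

    import torch
    import torch.nn.functional as F

    def attention_with_coverage(dec_state, enc_states, coverage, W_h, W_s, w_c, v):
        # Coverage-aware attention scores: e_i = v^T tanh(W_h h_i + W_s s_t + w_c c_i),
        # where h_i are encoder states (in this paper they would come from ALBERT)
        # and c is the running sum of all previous attention distributions.
        feats = enc_states @ W_h + dec_state @ W_s + coverage.unsqueeze(1) * w_c
        attn = F.softmax(v @ torch.tanh(feats).T, dim=-1)          # (src_len,)
        # Coverage loss sum_i min(a_i, c_i) penalises re-attending to source
        # tokens that earlier steps already covered, discouraging repetition.
        cov_loss = torch.minimum(attn, coverage).sum()
        return attn, coverage + attn, cov_loss

    def pointer_mixture(p_vocab, attn, src_ids, p_gen):
        # Final distribution P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum_{i: x_i = w} a_i:
        # the decoder either generates from the vocabulary or copies a source token.
        return (p_gen * p_vocab).scatter_add(0, src_ids, (1.0 - p_gen) * attn)

    # Toy shapes only, to show that the pieces compose:
    H, L, V = 8, 5, 30
    W_h, W_s, w_c, v = torch.randn(H, H), torch.randn(H, H), torch.randn(H), torch.randn(H)
    enc_states = torch.randn(L, H)          # stand-in for ALBERT encoder outputs
    attn, cov, cov_loss = attention_with_coverage(
        torch.randn(H), enc_states, torch.zeros(L), W_h, W_s, w_c, v)
    p_w = pointer_mixture(F.softmax(torch.randn(V), dim=-1), attn,
                          torch.randint(0, V, (L,)), torch.sigmoid(torch.randn(())))

In training, cov_loss is typically added to the generation loss with a tunable weight; the abstract does not state the weighting used here.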

Key words: natural language processing, pre-trained language model, ALBERT model, UniLM model, abstractive summarization
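
For reference, the Rouge-1 and Rouge-2 figures above are n-gram overlap F1 scores (Rouge-L uses the longest common subsequence instead). The minimal, self-contained Python sketch below computes ROUGE-N over character n-grams, a common choice for Chinese text; the paper's exact tokenisation and evaluation toolkit are not stated, so character-level units are an assumption here.

    from collections import Counter

    def rouge_n_f1(candidate: str, reference: str, n: int = 1) -> float:
        # ROUGE-N F1 over character n-grams (assumption: character-level units).
        cand = Counter(candidate[i:i + n] for i in range(len(candidate) - n + 1))
        ref = Counter(reference[i:i + n] for i in range(len(reference) - n + 1))
        overlap = sum((cand & ref).values())     # clipped n-gram matches
        if overlap == 0:
            return 0.0
        precision = overlap / sum(cand.values())
        recall = overlap / sum(ref.values())
        return 2 * precision * recall / (precision + recall)

    # Example: character-level Rouge-2 between a candidate and a reference summary.
    print(round(rouge_n_f1("今天天气很好", "今天天气不错", n=2), 4))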