English Caption Generation Model Fused with Attention Mechanism of Spatial Position

doi:10.3778/j.issn.1002-8331.2111-0300

Abstract

Abstract: To generate fluent, coherent, and informative captions that describe specific information in charts, this paper proposes the Transformer Chart to Text (TransChartText) model, built on the Transformer architecture. A dataset of chart-based caption descriptions is constructed by screening scientific research papers and news article websites; its English captions cover a wide range of data categories and logical reasoning. Data variables are introduced to replace the data values in the charts, which improves the model's content selection and helps it generate coherent captions. To strengthen the model's ability to learn positional relations between words and to reduce the frequency of word-order errors, spatial position embedding is introduced into the encoder and beam search into the decoder. Experimental results show that TransChartText achieves better scores on content selection (CS), content ordering (CO), ROUGE, and BLEU, and generates high-quality chart-based English captions.

Key words: language model, generative caption, Transformer, attention mechanism, beam search
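
The abstract names beam search as the decoding strategy but gives no details. The following is a minimal, generic sketch of beam search in Python, not the authors' implementation. The function `beam_search`, the scoring callback `next_log_probs`, and the toy distribution `toy_model` are all hypothetical names introduced for illustration; a real decoder would supply token probabilities from the trained model, and would typically add length normalization, which is omitted here.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[Tuple[str, ...], float]  # (token sequence, total log-probability)


def beam_search(
    next_log_probs: Callable[[Tuple[str, ...]], Dict[str, float]],
    beam_width: int = 3,
    max_len: int = 10,
    eos: str = "<eos>",
) -> List[Hypothesis]:
    """Return hypotheses sorted by total log-probability, best first."""
    beams: List[Hypothesis] = [((), 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates: List[Hypothesis] = []
        for prefix, score in beams:
            for token, logp in next_log_probs(prefix).items():
                candidates.append((prefix + (token,), score + logp))
        # Keep only the beam_width highest-scoring expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            if seq[-1] == eos:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_len without an end token
    return sorted(finished, key=lambda c: c[1], reverse=True)


def toy_model(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Hypothetical next-token distribution (assumption, for illustration only)."""
    table = {
        (): {"the": 0.6, "sales": 0.4},
        ("the",): {"chart": 0.5, "figure": 0.5},
        ("sales",): {"rose": 0.9, "fell": 0.1},
    }
    probs = table.get(prefix, {"<eos>": 1.0})
    return {token: math.log(p) for token, p in probs.items()}


greedy_seq, greedy_logp = beam_search(toy_model, beam_width=1)[0]
beam_seq, beam_logp = beam_search(toy_model, beam_width=2)[0]
print(" ".join(greedy_seq), round(math.exp(greedy_logp), 2))
print(" ".join(beam_seq), round(math.exp(beam_logp), 2))
```

In this toy distribution, a beam width of 1 (greedy decoding) commits to the locally most likely first token and ends with a sequence of probability 0.30, while a beam width of 2 keeps the second candidate alive and finds a sequence of probability 0.36. This illustrates why keeping several partial hypotheses can yield a better overall word sequence than choosing the best token at each step.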

WANG Qin, WANG Xin, YAN Jingke, ZHONG Meiling, ZENG Jing. English Caption Generation Model Fused with Attention Mechanism of Spatial Position[J]. Computer Engineering and Applications, 2022, 58(12): 139-148.
