Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (12): 139-148. DOI: 10.3778/j.issn.1002-8331.2111-0300

• Pattern Recognition and Artificial Intelligence •

English Caption Generation Model Fused with Attention Mechanism of Spatial Position

WANG Qin, WANG Xin, YAN Jingke, ZHONG Meiling, ZENG Jing   

  1. Basic Teaching Department, Guilin University of Electronic Technology, Beihai, Guangxi 536000, China
  2. School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  3. School of Marine Engineering, Guilin University of Electronic Technology, Beihai, Guangxi 536000, China
  4. College of Computer Engineering, Guilin University of Electronic Technology, Beihai, Guangxi 536000, China
  5. School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610000, China
  • Online:2022-06-15 Published:2022-06-15




Abstract: To enable a caption generation model to produce fluent, coherent and informative captions for charts, the Transformer Chart to Text (TransChartText) model is proposed on the basis of the Transformer architecture. A chart caption data set is built by screening scientific research papers and news article websites; its English captions cover a wide range of data categories and logical reasoning. Data variables are introduced to replace the data values in the charts, which improves the model's content selection and helps it generate coherent captions. To strengthen the model's ability to learn the positional relations between words and to reduce the frequency of word-order errors, spatial position embedding is introduced into the encoder and the beam search algorithm into the decoder. Experimental results show that TransChartText achieves better scores on content selection (CS), content ordering (CO), ROUGE and BLEU, and generates high-quality English captions for charts.

Key words: language model, generative caption, Transformer, attention mechanism, beam search
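
The abstract names beam search as the decoding strategy but gives no details of its use in TransChartText. As a rough illustration only, the following is a minimal sketch of generic beam search over a toy next-token distribution. The function names, the toy probability table and the default parameters are assumptions made for this example and are not taken from the paper.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical type: maps a token prefix to log-probabilities of the next token.
NextLogProbs = Callable[[List[str]], Dict[str, float]]


def beam_search(next_log_probs: NextLogProbs, beam_width: int = 3,
                max_len: int = 10, eos: str = "<eos>") -> List[str]:
    """Return the highest-scoring sequence found with the given beam width."""
    beams: List[Tuple[float, List[str]]] = [(0.0, [])]  # (log-prob, tokens)
    finished: List[Tuple[float, List[str]]] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = [
            (score + lp, seq + [tok])
            for score, seq in beams
            for tok, lp in next_log_probs(seq).items()
        ]
        if not candidates:
            break
        # Keep only the beam_width best expansions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_width]:
            (finished if seq[-1] == eos else beams).append((score, seq))
        if not beams:
            break
    # Prefer completed hypotheses; fall back to unfinished ones.
    pool = finished or beams
    return max(pool, key=lambda c: c[0])[1]


def toy_next_log_probs(prefix: List[str]) -> Dict[str, float]:
    """Toy stand-in for a decoder (assumption, not the paper's model)."""
    table = {
        (): {"sales": 0.6, "the": 0.4},
        ("sales",): {"rose": 0.55, "fell": 0.45},
        ("the",): {"chart": 0.9, "graph": 0.1},
    }
    probs = table.get(tuple(prefix), {"<eos>": 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}


greedy = beam_search(toy_next_log_probs, beam_width=1)
wider = beam_search(toy_next_log_probs, beam_width=2)
```

In this toy table, a beam width of 1 (greedy decoding) commits to "sales" and ends with a sequence of probability 0.33, while a beam width of 2 keeps "the" alive long enough to find "the chart" with probability 0.36. This is the general reason a wider beam can reduce poor early word choices; how the paper configures its beam is not stated in the abstract.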
