Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (20): 117-123.DOI: 10.3778/j.issn.1002-8331.2102-0280

• Pattern Recognition and Artificial Intelligence •

Cross-Modal Target Instance Segmentation Method Based on DMN

XIONG Junyao, SONG Zhenfeng, WANG Rong   

  1. School of Information Technology and Network Security, People’s Public Security University of China, Beijing 100038, China    
  • Online: 2022-10-15 Published: 2022-10-15




Abstract: This paper proposes a cross-modal target instance segmentation method based on DMN that segments the object described by a natural language expression from an image. First, the CBAM attention mechanism is introduced into the visual feature extraction network DPN92 so that the network attends to useful information along both the spatial and channel dimensions. Second, the BN layer is replaced with a joint normalization that combines BN and FRN, which reduces the influence of batch size and channel count on the performance of the feature extraction network and improves its generalization ability. Finally, the proposed scheme is evaluated on three common datasets: ReferIt, GRef and UNC. The results show that the improved model, with the CBAM attention mechanism and joint normalization, raises the mIoU metric by 1.85 and 0.52 percentage points on ReferIt and GRef respectively, and by 1.98, 2.22 and 2.75 percentage points on the three validation sets of UNC, indicating that the improved model outperforms the existing model in prediction accuracy.

Key words: cross-modal, natural language processing, target instance segmentation, attention mechanism, joint normalization
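
For readers unfamiliar with the metric, the following is a minimal sketch of how a mean IoU score over binary segmentation masks can be computed. It assumes mIoU is the average of per-sample intersection-over-union values, which is one common definition; the abstract does not state the exact formula used in the paper. The function names `iou` and `mean_iou` and the toy masks are hypothetical and chosen only for illustration.

```python
def iou(pred, target):
    """Intersection over union of two binary masks (nested lists of 0/1)."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def mean_iou(preds, targets):
    """Average of per-sample IoU values (assumed definition of mIoU)."""
    scores = [iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


# Toy 2x2 masks, illustration only (not data from the paper).
preds = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
targets = [[[1, 0], [0, 0]], [[1, 0], [0, 0]]]
print(mean_iou(preds, targets))  # 0.75
```

Under this definition, a gain of 1.85 percentage points corresponds to the averaged score rising by 0.0185 in absolute terms.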
