Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (7): 222-231.DOI: 10.3778/j.issn.1002-8331.2111-0518

• Graphics and Image Processing •

Dual-Modal Feature Fusion for RGB-D Semantic Segmentation

LUO Penlin, FANG Yanhong, LI Xin, LI Xue   

  1. School of Information Engineering, Southwest University of Science and Technology, Mianyang, Sichuan 621010, China
  2. Robot Technology Used for Special Environment Key Laboratory of Sichuan Province, Southwest University of Science and Technology, Mianyang, Sichuan 621010, China
  • Online: 2023-04-01  Published: 2023-04-01




Abstract: In complex indoor scenes, existing RGB semantic segmentation networks are susceptible to factors such as color and lighting, and existing RGB-D networks struggle to fuse dual-modal features effectively. To address these issues, this paper proposes an attention mechanism bimodal fusion network (AMBFNet) with an encoder-decoder structure. First, a dual-modal feature fusion structure (AMBF) is built to reasonably allocate the position and channel information of the features at each stage of the encoding branch. Then, a dual-attention-aware context (DA-context) module is designed to merge context information. Finally, the decoder fuses the multi-scale feature maps across layers to reduce inter-class misrecognition and the loss of small-scale targets in the prediction results. Test results on two public datasets, SUN RGB-D and NYU Depth v2 (NYUDv2), show that compared with state-of-the-art RGB-D semantic segmentation networks such as RedNet, ACNet and ESANet, the proposed network achieves better segmentation performance under the same hardware conditions, reaching MIoU scores of 47.9% and 50.0%, respectively.
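The abstract describes attention-based fusion that re-weights the position and channel information of each encoder branch before merging RGB and depth features. The paper's exact module design is not given here, so the following is only a minimal NumPy sketch of the general idea of channel-attention-gated dual-modal fusion; the function names (`channel_attention`, `fuse_rgb_depth`) and the simple global-pooling-plus-sigmoid gate are illustrative assumptions, not AMBFNet's actual layers.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    """Re-weight a (C, H, W) feature map by per-channel attention.

    Global average pooling over the spatial dimensions produces one
    score per channel; a sigmoid squashes it into (0, 1) as a gate.
    """
    w = sigmoid(feat.mean(axis=(1, 2)))      # shape: (C,)
    return feat * w[:, None, None]           # broadcast gate over H, W

def fuse_rgb_depth(rgb_feat, depth_feat):
    """Fuse two same-shaped modality features by gated summation.

    Each modality is first scaled by its own channel attention, so
    channels that carry more signal in one modality dominate the sum.
    """
    return channel_attention(rgb_feat) + channel_attention(depth_feat)

# Toy example: C=4 channels at an 8x8 spatial resolution
rng = np.random.default_rng(0)
rgb = rng.standard_normal((4, 8, 8))
depth = rng.standard_normal((4, 8, 8))
fused = fuse_rgb_depth(rgb, depth)
print(fused.shape)  # (4, 8, 8)
```

In a full network this fusion would be applied at each encoder stage, and the gates would be learned (e.g. small convolutions or fully connected layers) rather than the fixed pooling-sigmoid used in this sketch.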

Key words: attention mechanism, dual-modal feature fusion, dual-attention-aware context, RGB-D semantic segmentation
