Computer Engineering and Applications, 2021, Vol. 57, Issue (16): 197-202. DOI: 10.3778/j.issn.1002-8331.2009-0295

• Graphics and Image Processing •

Improved Semantic Segmentation Network for Indoor Scenes

HE Zhaomeng, KONG Guangqian, WU Yun

  1. School of Computer Science and Technology, Guizhou University, Guiyang 550025, China
  • Online: 2021-08-15 Published: 2021-08-16

Abstract:

Current indoor scene semantic segmentation networks do not fuse the RGB information and depth information of an image well. To address this problem, an improved indoor scene semantic segmentation network is proposed. So that the network can selectively fuse the depth features and RGB features of an image, a feature fusion module is designed around the idea of the attention mechanism. The module adjusts the network parameters through learning, according to the characteristics of the depth feature map and the RGB feature map, and so fuses the depth features and RGB features more effectively. In addition, multi-scale joint training is used to accelerate network convergence and improve segmentation accuracy. Experiments on the SUNRGB-D and NYUDV2 datasets show that the proposed network achieves higher segmentation accuracy than current mainstream semantic segmentation networks such as the RGB-D fully convolutional neural network with a depth-sensitive fully-connected conditional random field (DFCN-DCRF), the depth-aware convolutional neural network (Depth-aware CNN), and the multi-path refinement network (RefineNet), with the mean intersection over union (mIoU) reaching 46.6% and 48.0% on the two datasets, respectively.
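The abstract reports results as mean intersection over union (mIoU) but does not spell out how it is computed. The sketch below shows one common definition, averaging per-class IoU over the classes that appear in either the prediction or the ground truth. The function name `mean_iou`, the `ignore_label` parameter, and the toy labels are illustrative assumptions, not taken from the paper, and the authors' evaluation protocol may differ in detail (for example, in how unlabeled pixels are handled).

```python
def mean_iou(pred, target, num_classes, ignore_label=None):
    """Mean IoU over flat sequences of predicted and ground-truth class labels.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    for p, t in zip(pred, target):
        if t == ignore_label:
            continue  # assumed convention: skip unlabeled ground-truth pixels
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # leave out classes absent from both prediction and target
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six "pixels".
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
```

In this toy case the per-class IoU values are 1/3, 2/3, and 1/2. In practice the label maps of all test images would be flattened and accumulated before the per-class ratios are taken.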

Key words: indoor scene semantic segmentation, deep learning, attention mechanism, feature fusion, multi-scale joint training