DeepLabv3+ Lightweight Image Segmentation Algorithm Based on Multilevel Feature Fusion

doi:10.3778/j.issn.1002-8331.2306-0384

Abstract: Image semantic segmentation models based on deep learning usually have a large number of parameters and high complexity, which makes them difficult to deploy on mobile platforms. To address these problems, this paper improves the DeepLabv3+ algorithm and proposes a lightweight image segmentation algorithm. The backbone of the model is the lightweight MobileNetv2 network, from which input features are taken at four different levels, yielding four kinds of semantic information. A CAFF (coordinate attention feature fusion) module is proposed to fuse the features of the two middle levels and add position information. The atrous spatial pyramid pooling (ASPP) module is improved into the proposed CS_ASPP (channel strip_atrous spatial pyramid pooling) module: a CAM (channel attention module) mechanism is introduced after the atrous convolutions with different dilation rates, strip pooling (SP) is added in parallel to capture contextual information, and a SAM (spatial attention module) mechanism is introduced after feature fusion to improve segmentation accuracy. In experiments on the PASCAL VOC 2012 dataset, the model reaches a mean intersection over union (mIoU) of 79.14%. Compared with common segmentation algorithms and their improved variants, the model is more accurate and achieves a good balance among the number of parameters, segmentation speed, and segmentation quality.
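The strip pooling (SP) branch that CS_ASPP runs in parallel with the atrous convolutions pools along entire rows and entire columns, so each output position receives context from its whole row and whole column rather than from a square window. The sketch below illustrates that idea for a single channel in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the learned convolutions of the original strip pooling module (Hou et al., CVPR 2020) are replaced by identity maps, and `strip_pool` is a hypothetical helper name.

```python
import math
from typing import List

Matrix = List[List[float]]


def strip_pool(x: Matrix) -> Matrix:
    """Simplified single-channel strip pooling (illustrative sketch).

    Assumption: the learned convolutions applied to the pooled strips and
    after fusion are replaced by identity maps, so only pooling, expansion,
    summation, and sigmoid gating remain.
    """
    h, w = len(x), len(x[0])
    # Horizontal strip: average each row over the width -> H values (an H x 1 map).
    row_means = [sum(row) / w for row in x]
    # Vertical strip: average each column over the height -> W values (a 1 x W map).
    col_means = [sum(x[i][j] for i in range(h)) / h for j in range(w)]
    out: Matrix = []
    for i in range(h):
        out_row = []
        for j in range(w):
            # Expand both strips back to H x W and add them position-wise.
            fused = row_means[i] + col_means[j]
            # Sigmoid gate, then reweight the original input value.
            gate = 1.0 / (1.0 + math.exp(-fused))
            out_row.append(x[i][j] * gate)
        out.append(out_row)
    return out
```

For the 2×2 input `[[1, 2], [3, 4]]`, the row means are 1.5 and 3.5 and the column means are 2 and 3, so the value at position (0, 0) is gated by sigmoid(1.5 + 2) = sigmoid(3.5). In a real network the same operation runs per channel on feature maps, with learned layers in place of the identity maps.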

Key words: image segmentation, DeepLabV3+, multilevel feature fusion, lightweight, attention mechanism
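The 79.14% result is a mean intersection over union (mIoU). The abstract does not spell out the evaluation protocol, so the following plain-Python sketch shows one common way to compute the metric, as an assumption: accumulate a confusion matrix over all labeled pixels, compute IoU = TP / (TP + FP + FN) for each class, and average over classes. Pixels labeled 255 are skipped, following the PASCAL VOC convention for void and boundary regions; the function names are hypothetical.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Rows index the ground-truth class, columns index the predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # skip void/unlabeled pixels
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average the per-class IoU over classes that appear in prediction or label."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                         # class c pixels predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp    # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:  # assumption: classes absent everywhere are left out of the mean
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0
```

With predictions `[0, 0, 1, 1]` against labels `[0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mIoU is 7/12 ≈ 0.583. For PASCAL VOC 2012, `num_classes` would be 21 (20 object classes plus background), and the confusion matrix would be accumulated over the whole validation set before computing the per-class IoUs.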

ZHOU Huaping, DENG Bin. DeepLabv3+ Lightweight Image Segmentation Algorithm Based on Multilevel Feature Fusion[J]. Computer Engineering and Applications, 2024, 60(16): 269-275.
