Improved ASPP and Multilevel Feature Semantic Fusion Segmentation Method

doi:10.3778/j.issn.1002-8331.2203-0618

Abstract

Abstract: To address the difficulty of segmenting multi-scale targets and the inaccurate prediction of category boundaries in image semantic segmentation, a multilevel feature semantic fusion segmentation method based on improved atrous spatial pyramid pooling (ASPP) is proposed. First, the deep-level network features are grouped by channel, and a split ASPP module captures the multi-scale feature context of each group. Second, a strip pooling module is introduced to supplement and refine this contextual information and to strengthen the representation of global semantic information. Finally, a semantic guidance fusion module establishes the correspondence between feature pixels at different levels, and the deep-level semantic information is gradually incorporated into the low-level, high-resolution image in a bottom-up manner. Experimental results show that the method achieves a mean intersection over union (mIoU) of 73.1% on PASCAL VOC 2012 and 71.8% on Cityscapes, and that, at the same accuracy, it reduces the number of parameters by 39%.

Keywords: semantic segmentation, atrous spatial pyramid pooling, feature fusion, contextual information
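
The abstract reports its results as mean intersection over union. As a reminder of how that metric is usually computed, the following is a minimal sketch, not the authors' evaluation code: per-class IoU is TP / (TP + FP + FN), averaged over the classes that appear in the prediction or the ground truth. The function names and the flat label lists are illustrative, and the `ignore_index=255` default is an assumption based on the common void-label convention in PASCAL VOC and Cityscapes.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Count pixels; rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:  # assumed void label, skipped in scoring
            continue
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average TP / (TP + FP + FN) over classes with at least one pixel."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                         # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp    # predicted c, true class differs
        denom = tp + fp + fn
        if denom == 0:  # class absent from both prediction and ground truth
            continue
        ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with flattened label maps (hypothetical data, 3 classes).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(confusion_matrix(pred, target, num_classes=3))
print(f"mIoU = {score:.4f}")  # per-class IoU: 1/2, 2/3, 1
```

In practice the confusion matrix is accumulated over every image in the validation set before the per-class IoU is taken, rather than averaging per-image scores.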

WANG Yinyu, MENG Fanyun, WANG Jinhe, LIU Zhihao. Improved ASPP and Multilevel Feature Semantic Fusion Segmentation Method[J]. Computer Engineering and Applications, 2023, 59(13): 220-228.
