Improved Semantic Segmentation Algorithm Based on Pyramid Scene Parsing Network

doi:10.3778/j.issn.1002-8331.2006-0366

Abstract:

Image semantic segmentation is a classic problem in image recognition and an active topic in machine vision research. In practical applications, however, semantic labels are often predicted inaccurately and edge information between the segmented object and the background is lost, which has gradually become a bottleneck for image understanding. To address this, this paper proposes an improved network structure based on the Pyramid Scene Parsing Network (PSPNet). In the feature learning module, convolution and pooling operations are added inside the original Residual Network (ResNet) so that features at each level are learned further from the input image, and the multiple low-level feature maps learned in this way are added to the high-level feature map to obtain a new feature map carrying more spatial location information. To obtain rich context information, the pyramid pooling structure of PSPNet combines the global context information in the feature map with local context information at different scales, and convolution and upsampling then produce the final prediction map. Simulation results show that the improved method achieves a Mean Intersection over Union (MIoU) of 78.5% on the PASCAL VOC 2012 test set, 1.7% higher than the baseline algorithm.
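
The two operations the abstract describes, adding low-level feature maps to a high-level one and pooling the result at several scales, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract gives no layer details, so the helper names (`fuse_add`, `adaptive_avg_pool`, `upsample_nearest`, `pyramid_pool`), the single-channel 2-D maps, and the nearest-neighbour upsampling are assumptions made for illustration. The bin sizes (1, 2, 3, 6) follow the original PSPNet paper, and all convolution layers are omitted.

```python
from math import ceil, floor


def fuse_add(low, high):
    """Element-wise sum of two feature maps with the same shape."""
    return [[a + b for a, b in zip(lr, hr)] for lr, hr in zip(low, high)]


def adaptive_avg_pool(fmap, bins):
    """Average-pool a 2-D map (list of rows) down to bins x bins cells."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(bins):
        r0, r1 = floor(i * h / bins), ceil((i + 1) * h / bins)
        row = []
        for j in range(bins):
            c0, c1 = floor(j * w / bins), ceil((j + 1) * w / bins)
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def upsample_nearest(fmap, h, w):
    """Resize a 2-D map to h x w by nearest-neighbour lookup (an assumption;
    bilinear interpolation is the more usual choice in practice)."""
    sh, sw = len(fmap), len(fmap[0])
    return [[fmap[r * sh // h][c * sw // w] for c in range(w)] for r in range(h)]


def pyramid_pool(fmap, bin_sizes=(1, 2, 3, 6)):
    """Return the input map plus one pooled-and-upsampled map per bin size.
    The returned list stands in for channel-wise concatenation."""
    h, w = len(fmap), len(fmap[0])
    branches = [upsample_nearest(adaptive_avg_pool(fmap, b), h, w)
                for b in bin_sizes]
    return [fmap] + branches


# Hypothetical 6 x 6 maps standing in for high- and low-level features.
high_level = [[r * 6 + c for c in range(6)] for r in range(6)]
low_level = [[1] * 6 for _ in range(6)]
fused = fuse_add(low_level, high_level)
stacked = pyramid_pool(fused)
print(len(stacked), "maps of size", len(stacked[0]), "x", len(stacked[0][0]))
```

The 1 x 1 branch carries the global average of the fused map, while the finer bins keep progressively more local context; a real network would apply this per channel and pass the stacked maps through further convolutions before upsampling to the prediction map.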

Key words: semantic segmentation, deep learning, Pyramid Scene Parsing Network (PSPNet), Residual Network (ResNet), Mean Intersection over Union (MIoU)
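
MIoU, the last keyword and the metric reported in the abstract, averages the per-class intersection over union, TP / (TP + FP + FN). The sketch below is one common way to compute it from a confusion matrix, not the paper's evaluation code. The function name `mean_iou`, the flat label lists, and the `ignore_index` parameter are assumptions for illustration (255 is the void label in PASCAL VOC annotations).

```python
def mean_iou(pred, target, num_classes, ignore_index=None):
    """Mean IoU over the classes that occur in pred or target.

    pred and target are flat sequences of integer class labels of equal
    length; pixels whose target equals ignore_index are skipped.
    """
    # confusion[t][p] counts pixels of true class t predicted as class p
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        confusion[t][p] += 1

    ious = []
    for k in range(num_classes):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                    # true k, predicted other
        fp = sum(row[k] for row in confusion) - tp     # predicted k, true other
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both pred and target
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with two classes and one void pixel.
pred = [0, 0, 1, 1, 1]
target = [0, 1, 1, 1, 255]
print(mean_iou(pred, target, num_classes=2, ignore_index=255))
```

For a dataset-level score such as the one in the abstract, the confusion matrix is normally accumulated over all test images before the per-class ratios are taken, rather than averaging per-image values.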

WANG Jia, ZHANG Nan, MENG Fanyun, WANG Jinhe. Improved Semantic Segmentation Algorithm Based on Pyramid Scene Parsing Network[J]. Computer Engineering and Applications, 2021, 57(19): 220-227.
