Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (19): 220-227. DOI: 10.3778/j.issn.1002-8331.2006-0366


Improved Semantic Segmentation Algorithm Based on Pyramid Scene Parsing Network

WANG Jia, ZHANG Nan, MENG Fanyun, WANG Jinhe   

  1. School of Information and Control Engineering, Qingdao University of Technology, Qingdao, Shandong 266000, China
  • Online: 2021-10-01 Published: 2021-09-29





Image semantic segmentation is a classic problem in image recognition and an active topic in machine vision research. In practical applications, however, semantic label predictions can be inaccurate and edge information between the segmented object and the background can be lost, which has gradually become a bottleneck in image understanding. To address this, this paper proposes an improved network structure based on the Pyramid Scene Parsing Network (PSPNet). First, in the feature learning module, convolution and pooling operations are added inside the original Residual Network (ResNet) to further learn features at each level, and the multiple low-level feature maps learned in this way are added to the high-level feature map to obtain a new feature map with more spatial location information. Then, to obtain rich context information, PSPNet's pyramid pooling structure combines the global context information in the feature map with local context information at different scales, and convolution and upsampling produce the final prediction map. Simulation results show that the improved method achieves a Mean Intersection over Union (MIoU) of 78.5% on the PASCAL VOC 2012 test set, 1.7% higher than the baseline algorithm.
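The two structural ideas in the abstract, adding low-level feature maps to a high-level one and then pooling the result at several scales, can be illustrated with a minimal pure-Python sketch. It is not the paper's network: the function names, the single-channel 2D maps, the bin sizes (1, 2, 3, 6), and the nearest-neighbour upsampling are all assumptions made for illustration, and the convolution layers are omitted.

```python
def add_feature_maps(*maps):
    """Element-wise sum of equally sized 2D maps (hypothetical fusion step)."""
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) for c in range(w)] for r in range(h)]


def adaptive_avg_pool(fmap, bins):
    """Average-pool a 2D map (list of rows) down to a bins x bins grid."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(bins):
        # Rows covered by cell i: floor at the start, ceiling at the end.
        r0, r1 = (i * h) // bins, -((-(i + 1) * h) // bins)
        row = []
        for j in range(bins):
            c0, c1 = (j * w) // bins, -((-(j + 1) * w) // bins)
            cell = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cell) / len(cell))
        pooled.append(row)
    return pooled


def upsample_nearest(pooled, h, w):
    """Resize a bins x bins grid back to h x w by nearest-neighbour lookup."""
    bins = len(pooled)
    return [[pooled[(r * bins) // h][(c * bins) // w] for c in range(w)]
            for r in range(h)]


def pyramid_pool(fmap, bin_sizes=(1, 2, 3, 6)):
    """Return the input map plus one upsampled pooled map per bin size.

    The returned list plays the role of a channel-wise concatenation:
    channel 0 is the original map, the others carry context at coarser scales.
    """
    h, w = len(fmap), len(fmap[0])
    channels = [fmap]
    for bins in bin_sizes:
        channels.append(upsample_nearest(adaptive_avg_pool(fmap, bins), h, w))
    return channels


# Usage: fuse two toy 6x6 maps, then build the multi-scale context stack.
low = [[r * 6 + c for c in range(6)] for r in range(6)]
high = [[0 for _ in range(6)] for _ in range(6)]
fused = add_feature_maps(low, high)
context = pyramid_pool(fused)
```

With a bin size of 1 every position receives the global average of the map, while larger bin sizes keep progressively more local detail, which is how global and local context end up side by side in one stack.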

Key words: semantic segmentation, deep learning, Pyramid Scene Parsing Network (PSPNet), Residual Network (ResNet), Mean Intersection over Union (MIoU)
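For readers unfamiliar with the MIoU metric named above, the following is a minimal sketch of how it is commonly computed: for each class, the intersection of predicted and ground-truth pixels divided by their union, averaged over classes. The function name `mean_iou`, the flat label lists, and the choice to skip classes absent from both prediction and ground truth are assumptions for illustration. It does not reproduce the PASCAL VOC evaluation protocol (for example, it does not handle ignored "void" pixels).

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union for two flat lists of integer labels."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            # Pixel is in both the predicted and the true region of class t.
            inter[t] += 1
            union[t] += 1
        else:
            # Pixel is in the predicted region of p and the true region of t.
            union[p] += 1
            union[t] += 1
    # Assumption: classes that never appear are left out of the average.
    ious = [inter[k] / union[k] for k in range(num_classes) if union[k] > 0]
    return sum(ious) / len(ious)


# Usage: four pixels, two classes.
score = mean_iou(pred=[0, 0, 1, 1], target=[0, 1, 1, 1], num_classes=2)
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.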


