Adversarial Semi-Supervised Semantic Segmentation with Attention Mechanism

doi:10.3778/j.issn.1002-8331.2112-0484

Abstract

Image semantic segmentation is one of the important research topics in computer vision. Current semantic segmentation algorithms based on fully convolutional neural networks suffer from several problems: a lack of correlation between pixels, an effective receptive field of the convolution kernels that is smaller than its theoretical value, and the high cost of manually labeling datasets. To address these problems, an adversarial semi-supervised semantic segmentation model that integrates an attention mechanism is proposed. A generative adversarial network is applied to image semantic segmentation to strengthen the correlation between pixels. The proposed model adds a self-attention module and a multi-kernel pooling module to the generator network to fuse long-range semantic information and enlarge the receptive field of the convolution kernels. Extensive experiments on the PASCAL VOC2012 augmented dataset and the Cityscapes dataset demonstrate the effectiveness and reliability of the proposed method for image semantic segmentation.

Key words: semantic segmentation, generative adversarial network, attention mechanism, semi-supervised training
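
The abstract does not specify how the self-attention module is built, so the following is only a generic illustration of how dot-product self-attention lets every pixel draw on features from distant pixels. It is a minimal sketch, not the authors' module: the function names `softmax` and `self_attention` are hypothetical, and the queries, keys, and values are assumed to be the raw features (identity projections), whereas a trained module would learn these projections.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Dot-product self-attention over a list of per-pixel feature vectors.

    Assumption: queries, keys, and values are the features themselves
    (identity projections). Returns the attended features and the
    attention weights.
    """
    d = len(features[0])
    n = len(features)
    outputs, weights = [], []
    for q in features:
        # Scaled similarity between this pixel and every pixel in the map,
        # regardless of how far apart they are spatially.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in features]
        w = softmax(scores)
        weights.append(w)
        # Each output vector is a weighted average of all pixel features.
        outputs.append([sum(w[j] * features[j][c] for j in range(n)) for c in range(d)])
    return outputs, weights


# A 2x2 feature map with 2 channels, flattened to 4 pixels (illustrative values).
feature_map = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
attended, attn = self_attention(feature_map)
```

In this toy input, pixel 0 attends more strongly to pixel 2, which has the same features, than to its neighbor pixel 1. That is the sense in which attention can relate pixels that a small convolution kernel would not connect directly.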

YUN Fei, YIN Yanjun, ZHANG Wenxuan, ZHI Min. Adversarial Semi-Supervised Semantic Segmentation with Attention Mechanism[J]. Computer Engineering and Applications, 2023, 59(8): 254-262.

云飞, 殷雁君, 张文轩, 智敏. 融合注意力机制的对抗式半监督语义分割[J]. 计算机工程与应用, 2023, 59(8): 254-262.
