Multiscale Attention-Guided Panoptic Segmentation Network

doi:10.3778/j.issn.1002-8331.2206-0247

Abstract

Panoptic segmentation is an image segmentation task proposed in recent years. Most existing panoptic segmentation models represent foreground instance objects (things) and amorphous background regions (stuff) in different ways, so they need additional post-processing and fusion steps to resolve instance overlaps and semantic conflicts. The fully convolutional panoptic segmentation network (Panoptic FCN) achieves a unified feature representation and removes these complex operations, but its segmentation accuracy for foreground instance objects is limited, and it segments small, distant objects in images poorly. To address these problems, this paper improves and optimizes Panoptic FCN and proposes a multiscale attention-guided panoptic segmentation network. First, the feature extraction network is improved by adding a bottom-up auxiliary path to the backbone, which strengthens the model's ability to capture multiscale features. Second, an attention module is proposed that fuses atrous spatial pyramid pooling (ASPP) with channel attention to guide the update of the convolution kernels and generate better-matched weights. In comparison experiments with Panoptic FCN on the Cityscapes dataset, the panoptic quality (PQ) of instance-level foreground objects improves by 2.74 percentage points, and the PQ of amorphous background regions and the overall PQ improve by 1.36 and 1.94 percentage points, respectively. The class detection accuracy for small objects such as traffic lights and motorcycles improves by 4.4 and 8.3 percentage points, respectively. The proposed network combines the advantages of Panoptic FCN, multiscale features, and the attention mechanism, yielding higher instance-level panoptic segmentation accuracy.
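The gains reported in the abstract are differences in panoptic quality (PQ), the metric introduced together with the panoptic segmentation task. For one class, a predicted segment and a ground-truth segment match when their IoU exceeds 0.5, and PQ = (sum of matched IoUs) / (TP + 0.5 FP + 0.5 FN), which factors into segmentation quality (SQ, the mean IoU of matched pairs) times recognition quality (RQ). The instance-level, background, and overall figures are averages of per-class PQ over the thing classes, the stuff classes, and all classes, so a gain of 2.74 percentage points means that average rose by 0.0274 on a 0 to 1 scale. The sketch below is a simplified illustration, not the official evaluation: it assumes segments are given as sets of pixel indices, works on a single image and class (the official metric accumulates TP, FP, and FN over the whole dataset per class), and ignores void-label and crowd-region handling. The function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def iou(a: Set[int], b: Set[int]) -> float:
    """IoU of two segments represented as sets of pixel indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def panoptic_quality(preds: List[Set[int]], gts: List[Set[int]]) -> Tuple[float, float, float]:
    """Return (PQ, SQ, RQ) for a single class.

    Segments match when IoU > 0.5. Because panoptic segments do not overlap,
    each ground-truth segment has at most one such match, so greedy matching
    is sufficient.
    """
    matched_gt: Set[int] = set()
    iou_sum = 0.0
    tp = 0
    for p in preds:
        for j, g in enumerate(gts):
            if j in matched_gt:
                continue
            v = iou(p, g)
            if v > 0.5:
                matched_gt.add(j)
                iou_sum += v
                tp += 1
                break
    fp = len(preds) - tp  # unmatched predictions
    fn = len(gts) - tp    # unmatched ground-truth segments
    denom = tp + 0.5 * fp + 0.5 * fn
    if denom == 0:
        # Class absent from both prediction and ground truth; in practice
        # such a class is left out of the averages.
        return 0.0, 0.0, 0.0
    sq = iou_sum / tp if tp else 0.0  # mean IoU of matched pairs
    rq = tp / denom                   # F1-style recognition quality
    return sq * rq, sq, rq


def mean_pq(per_class_pq: Dict[str, float], classes: List[str]) -> float:
    """Average per-class PQ over a class subset (things or stuff)."""
    return sum(per_class_pq[c] for c in classes) / len(classes)
```

For example, one prediction covering pixels {0, 1, 2, 3} against a ground-truth segment {0, 1, 2} is a match with IoU 0.75; adding one unmatched prediction and one unmatched ground-truth segment gives SQ = 0.75, RQ = 0.5, and PQ = 0.375.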

Key words: image segmentation, panoptic segmentation, fully convolutional panoptic segmentation network, multiscale features, attention modules, atrous spatial pyramid pooling
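The abstract states only that the proposed attention module fuses atrous spatial pyramid pooling (ASPP) with channel attention to guide the convolution-kernel update; it does not specify how the branches are combined. The plain-Python sketch below shows one possible arrangement under stated assumptions: depthwise 3x3 dilated convolutions at rates 1, 2, and 4 plus an image-pooling branch, fused by averaging, followed by Squeeze-and-Excitation-style channel attention (global average pooling, a fully connected layer with ReLU, a fully connected layer with sigmoid, then per-channel rescaling). All function names, rates, and weights are hypothetical, and the step in which the reweighted features guide kernel generation is not modeled.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]  # H x W single-channel map
FeatureMap = List[Matrix]   # C x H x W


def dilated_conv3x3(x: Matrix, k: Matrix, rate: int) -> Matrix:
    """3x3 convolution (cross-correlation, as in CNN libraries) with dilation
    `rate` and zero padding `rate`, so the output keeps the input size."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, z = i + di * rate, j + dj * rate
                    if 0 <= y < h and 0 <= z < w:  # zero padding outside the map
                        acc += k[di + 1][dj + 1] * x[y][z]
            out[i][j] = acc
    return out


def global_avg(x: Matrix) -> float:
    """Global average pooling of one channel."""
    return sum(map(sum, x)) / (len(x) * len(x[0]))


def aspp(fmap: FeatureMap, kernel: Matrix, rates: Sequence[int] = (1, 2, 4)) -> FeatureMap:
    """Simplified depthwise ASPP (hypothetical): one dilated branch per rate
    plus an image-pooling branch, fused by averaging."""
    out: FeatureMap = []
    for x in fmap:
        h, w = len(x), len(x[0])
        branches = [dilated_conv3x3(x, kernel, r) for r in rates]
        pooled = global_avg(x)
        branches.append([[pooled] * w for _ in range(h)])  # image-level context
        n = len(branches)
        out.append([[sum(b[i][j] for b in branches) / n for j in range(w)]
                    for i in range(h)])
    return out


def channel_attention(fmap: FeatureMap, w1: Matrix, w2: Matrix) -> FeatureMap:
    """SE-style attention: squeeze (GAP) -> FC + ReLU -> FC + sigmoid -> rescale.
    w1 has shape (C/r) x C and w2 has shape C x (C/r)."""
    z = [global_avg(x) for x in fmap]
    hidden = [max(0.0, sum(a * b for a, b in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(row, hidden)))) for row in w2]
    return [[[v * s[c] for v in row] for row in x] for c, x in enumerate(fmap)]


def attention_module(fmap: FeatureMap, kernel: Matrix, w1: Matrix, w2: Matrix) -> FeatureMap:
    """Multiscale context from ASPP, then channel reweighting."""
    return channel_attention(aspp(fmap, kernel), w1, w2)
```

Averaging keeps the sketch free of learned fusion weights; a trainable version would more likely concatenate the branches and apply a 1x1 convolution, as the ASPP in DeepLab does. The larger dilation rates widen the receptive field without extra parameters, which is the property that helps with objects at different scales.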

FU Du, QU Shaojun, FU Ya. Multiscale Attention-Guided Panoptic Segmentation Network[J]. Computer Engineering and Applications, 2023, 59(22): 223-232.
