Application of Parallel Attention Mechanism in Image Semantic Segmentation

doi:10.3778/j.issn.1002-8331.2011-0476

Abstract

Integrating attention mechanisms into convolutional neural networks has become an increasingly important way to strengthen feature learning in semantic segmentation. This paper proposes a convolutional neural network that fuses local attention and global attention. Features extracted from the input image by the backbone network are fed in parallel to a local attention module and a global attention module. The local attention module uses an encoder-decoder structure to fuse local features at multiple scales, while the global attention module captures global information from the correlation between each pixel and all pixels on the same feature map. Fusing the two attention modules both reduces the loss of local information and captures global information with long-range dependencies, which improves feature extraction. A data-dependent upsampling method replaces bilinear interpolation for restoring the feature map to the input size, which also improves the segmentation results. The Dice loss is adopted as the loss function, and a weight coefficient is added before each class loss to address sample imbalance, which further improves the segmentation results. The method achieves mean intersection over union (mIoU) scores of 96.39%, 93.44%, and 96.28% on the pill contamination dataset, the pill crack dataset, and the corridor dataset, respectively.

Key words: local attention, global attention, data-dependent upsampling, sample imbalance
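
The abstract describes a Dice loss in which each class loss is multiplied by a weight coefficient to counter sample imbalance. The abstract does not give the exact formula, so the following is only a minimal NumPy sketch of one common way to write such a loss. The function name `weighted_dice_loss`, the smoothing term `eps`, the normalization by the sum of the weights, and the example weight values are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def weighted_dice_loss(probs, target, class_weights, eps=1e-6):
    """Class-weighted Dice loss (illustrative sketch, not the paper's code).

    probs:         (C, H, W) array of per-class probabilities (e.g. softmax output).
    target:        (H, W) array of integer class labels in [0, C).
    class_weights: length-C sequence of hypothetical per-class coefficients.
    """
    num_classes = probs.shape[0]
    # One-hot encode the label map to shape (C, H, W) via broadcasting.
    one_hot = (np.arange(num_classes)[:, None, None] == target[None]).astype(probs.dtype)

    # Per-class overlap and per-class total mass of prediction plus ground truth.
    intersection = (probs * one_hot).sum(axis=(1, 2))
    denominator = probs.sum(axis=(1, 2)) + one_hot.sum(axis=(1, 2))

    # Dice coefficient per class; eps avoids division by zero for absent classes.
    dice = (2.0 * intersection + eps) / (denominator + eps)

    # Weighted average of the per-class losses (normalization is an assumption).
    weights = np.asarray(class_weights, dtype=probs.dtype)
    return float((weights * (1.0 - dice)).sum() / weights.sum())


# Usage: a 2x2 label map with one rare-class pixel, predicted as all background.
target = np.array([[0, 0], [0, 1]])
probs = np.stack([np.ones((2, 2)), np.zeros((2, 2))])
uniform = weighted_dice_loss(probs, target, [1.0, 1.0])
rare_upweighted = weighted_dice_loss(probs, target, [1.0, 4.0])
```

In this toy case the missed rare-class pixel contributes more to the loss when its class weight is raised, which is the effect a weight coefficient is meant to have on an imbalanced dataset.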

ZHANG Han, ZHANG Dexiang, CHEN Peng, ZHANG Jun, WANG Bing. Application of Parallel Attention Mechanism in Image Semantic Segmentation[J]. Computer Engineering and Applications, 2022, 58(9): 151-160.

