[1] 梁新宇,罗晨,权冀川,等.基于深度学习的图像语义分割技术研究进展[J].计算机工程与应用,2020,56(2):18-28.
LIANG X Y,LUO C,QUAN J C,et al.Research on progress of image semantic segmentation based on deep learning[J].Computer Engineering and Applications,2020,56(2):18-28.
[2] 徐辉,祝玉华,甄彤,等.深度神经网络图像语义分割方法综述[J].计算机科学与探索,2021,15(1):47-59.
XU H,ZHU Y H,ZHEN T,et al.Survey of image semantic segmentation methods based on deep neural network[J].Journal of Frontiers of Computer Science and Technology,2021,15(1):47-59.
[3] LONG J,SHELHAMER E,DARRELL T.Fully convolutional networks for semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,2015:3431-3440.
[4] ZHAO H,SHI J,QI X,et al.Pyramid scene parsing network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,2017:2881-2890.
[5] CHEN L C,PAPANDREOU G,KOKKINOS I,et al.Semantic image segmentation with deep convolutional nets and fully connected CRFs[J].arXiv:1412.7062,2014.
[6] CHEN L C,PAPANDREOU G,KOKKINOS I,et al.Deeplab:semantic image segmentation with deep convolutional nets,atrous convolution,and fully connected CRFs[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2017,40(4):834-848.
[7] CHEN L C,PAPANDREOU G,SCHROFF F,et al.Rethinking atrous convolution for semantic image segmentation[J].arXiv:1706.05587,2017.
[8] CHEN L C,ZHU Y,PAPANDREOU G,et al.Encoder-decoder with atrous separable convolution for semantic image segmentation[C]//Proceedings of the European Conference on Computer Vision,2018:801-818.
[9] ZHAO H,QI X,SHEN X,et al.Icnet for real-time semantic segmentation on high-resolution images[C]//Proceedings of the European Conference on Computer Vision,2018:405-420.
[10] SANDLER M,HOWARD A,ZHU M,et al.Mobilenetv2:inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,2018:4510-4520.
[11] WU T,TANG S,ZHANG R,et al.Cgnet:a light-weight context guided network for semantic segmentation[J].IEEE Transactions on Image Processing,2020,30:1169-1179.
[12] YU C,GAO C,WANG J,et al.Bisenet v2:bilateral network with guided aggregation for real-time semantic segmentation[J].International Journal of Computer Vision,2021,129(11):3051-3068.
[13] 李翔,张涛,张哲,等.Transformer在计算机视觉领域的研究综述[J].计算机工程与应用,2023,59(1):1-14.
LI X,ZHANG T,ZHANG Z,et al.Survey of Transformer research in computer vision[J].Computer Engineering and Applications,2023,59(1):1-14.
[14] DOSOVITSKIY A,BEYER L,KOLESNIKOV A,et al.An image is worth 16x16 words:transformers for image recognition at scale[C]//International Conference on Learning Representations,2021.
[15] LI Y,YUAN G,WEN Y,et al.EfficientFormer:vision transformers at mobilenet speed[J].arXiv:2206.01191,2022.
[16] CHEN Y,DAI X,CHEN D,et al.Mobile-former:bridging mobilenet and transformer[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,2022:5270-5279.
[17] LIU Z,LIN Y,CAO Y,et al.Swin transformer:hierarchical vision transformer using shifted windows[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision,2021:10012-10022.
[18] WANG Q,DONG X,WANG R,et al.Swin transformer based pyramid pooling network for food segmentation[C]//2022 IEEE 2nd International Conference on Software Engineering and Artificial Intelligence,2022:64-68.
[19] SHI W,XU J,GAO P.SSformer:a lightweight transformer for semantic segmentation[J].arXiv:2208.02034,2022.
[20] JIANG X,LI Y,JIANG T,et al.RoadFormer:pyramidal deformable vision transformers for road network extraction with remote sensing images[J].International Journal of Applied Earth Observation and Geoinformation,2022,113:102987.
[21] LU L,XIAO Y,CHANG X,et al.Deformable attention-oriented feature pyramid network for semantic segmentation[J].Knowledge-Based Systems,2022,254:109623.
[22] YU L,LI Z,ZHANG J,et al.Self-attention on multi-shifted windows for scene segmentation[J].arXiv:2207.04403,2022.
[23] XIAO T,LIU Y,ZHOU B,et al.Unified perceptual parsing for scene understanding[C]//Proceedings of the European Conference on Computer Vision,2018:418-434.
[24] 刘腊梅,王晓娜,刘万军,等.融合转置卷积与深度残差图像语义分割方法[J].计算机科学与探索,2022,16(9):2132-2142.
LIU L M,WANG X N,LIU W J,et al.Image semantic segmentation method with fusion of transposed convolution and deep residual[J].Journal of Frontiers of Computer Science and Technology,2022,16(9):2132-2142.
[25] DONG B,WANG W,FAN D P,et al.Polyp-pvt:polyp segmentation with pyramid vision transformers[J].arXiv:2108.06932,2021.
[26] XIE E,WANG W,YU Z,et al.SegFormer:simple and efficient design for semantic segmentation with transformers[J].Advances in Neural Information Processing Systems,2021,34:12077-12090.
[27] LI X,SUN X,MENG Y,et al.Dice loss for data-imbalanced NLP tasks[J].arXiv:1911.02855,2019.
[28] DE BOER P T,KROESE D P,MANNOR S,et al.A tutorial on the cross-entropy method[J].Annals of Operations Research,2005,134(1):19-67.
[29] ZHOU B,ZHAO H,PUIG X,et al.Semantic understanding of scenes through the ade20k dataset[J].International Journal of Computer Vision,2019,127(3):302-321.
[30] CORDTS M,OMRAN M,RAMOS S,et al.The cityscapes dataset for semantic urban scene understanding[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,2016:3213-3223.
[31] HE K M,ZHANG X Y,REN S Q,et al.Deep residual learning for image recognition[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition,Las Vegas,June 26-July 1,2016.New York:IEEE Press,2016:770-778.
[32] CAO Y,XU J,LIN S,et al.Gcnet:non-local networks meet squeeze-excitation networks and beyond[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops,2019.
[33] TOUVRON H,CORD M,DOUZE M,et al.Training data-efficient image transformers & distillation through attention[C]//International Conference on Machine Learning,2021:10347-10357.
[34] ZHENG S,LU J,ZHAO H,et al.Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,2021:6881-6890.
[35] CHU X,TIAN Z,WANG Y,et al.Twins:revisiting the design of spatial attention in vision transformers[C]//Advances in Neural Information Processing Systems,2021:9355-9366.
[36] HUANG L,YUAN Y,GUO J,et al.Interlaced sparse self-attention for semantic segmentation[J].arXiv:1907.12273,2019.
[37] YUAN Y H,CHEN X K,CHEN X L,et al.Segmentation transformer:object-contextual representations for semantic segmentation[J].arXiv:1909.11065,2019.
[38] SUN K,ZHAO Y,JIANG B,et al.High-resolution representations for labeling pixels and regions[J].arXiv:1904.04514,2019.