Review of Real-Time Semantic Segmentation Algorithms for Deep Learning

doi:10.3778/j.issn.1002-8331.2210-0144

Abstract

Abstract: Semantic segmentation partitions an image into distinct objects at the pixel level and assigns a label to every pixel of the original image. Applications such as UAV navigation, remote sensing image analysis, and medical diagnosis require segmentation to run in real time, so real-time semantic segmentation based on deep learning has developed rapidly and now comprises many techniques and models. Drawing on a study of the related literature, this paper introduces real-time semantic segmentation by way of semantic segmentation and briefly describes its advantages. It then identifies the main difficulties that real-time semantic segmentation currently faces, reviews the existing techniques and models in light of those difficulties, and summarizes their advantages and disadvantages. Finally, it discusses the challenges that real-time semantic segmentation still faces and summarizes the field, providing some theoretical reference for follow-up research.

Key words: real-time semantic segmentation, deep learning, computer vision, real-time prediction

HE Jiafeng, CHEN Hongwei, LUO Dehan. Review of Real-Time Semantic Segmentation Algorithms for Deep Learning[J]. Computer Engineering and Applications, 2023, 59(8): 13-27.
