Image Semantic Segmentation Based on Fully Convolutional Neural Network

doi:10.3778/j.issn.1002-8331.2109-0091

Abstract

Image semantic segmentation is an active research topic in computer vision. With the rapid rise of fully convolutional neural networks, combining them with image semantic segmentation has produced strong results. Drawing on high-quality literature from recent years, this survey focuses on image semantic segmentation methods based on fully convolutional neural networks. The collected literature is divided by application scenario into classical semantic segmentation, real-time semantic segmentation, and RGBD semantic segmentation, and representative methods in each category are described. Commonly used public datasets and performance evaluation metrics are also summarized, and experimental results on these datasets are analyzed. Finally, possible future research directions for fully convolutional neural networks are discussed.

Keywords: image semantic segmentation, computer vision, fully convolutional neural network

ZHANG Xin, YAO Qing’an, ZHAO Jian, JIN Zhenjun, FENG Yuncong. Image Semantic Segmentation Based on Fully Convolutional Neural Network[J]. Computer Engineering and Applications, 2022, 58(8): 45-57.
