Survey of Evaluation Metrics and Methods for Semantic Segmentation

doi:10.3778/j.issn.1002-8331.2207-0139

Abstract

Deep learning has produced major advances in semantic segmentation. To keep evaluation objective and meaningful, the performance of these algorithms should be measured with standard, widely adopted, and comprehensive metrics. This paper summarizes and analyzes existing evaluation metrics and methods for semantic segmentation from several perspectives: pixel labeling accuracy, depth estimation error, execution efficiency, memory footprint, and robustness. The widely used accuracy metrics, namely the F1 score, mIoU, mPA, the Dice coefficient, and the Hausdorff distance, are introduced in detail. The paper also reviews related research on robustness and generalization, including methods for improving the robustness of segmentation networks. Finally, it points out the requirements for semantic segmentation experiments and the limitations of current segmentation quality evaluation.

Key words: semantic segmentation, evaluation metric, mean intersection over union (mIoU), mean pixel accuracy (mPA), robustness
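
As an illustration of the accuracy metrics named in the abstract, the following Python sketch computes pixel accuracy (PA), mPA, mIoU, and the mean Dice coefficient (equivalent to the per-class F1 score) from a confusion matrix, together with a brute-force symmetric Hausdorff distance between two point sets. It follows the common confusion-matrix formulation of these metrics. The function names, the toy labels, and the choice to skip classes with an empty denominator are assumptions made for illustration and are not taken from the paper.

```python
from math import dist


def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def _mean(values):
    """Average a list, returning 0.0 when it is empty."""
    return sum(values) / len(values) if values else 0.0


def segmentation_metrics(cm):
    """Return PA, mPA, mIoU and mean Dice (per-class F1) from a confusion matrix."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    tp = [cm[i][i] for i in range(k)]                            # correctly labeled pixels
    gt = [sum(cm[i]) for i in range(k)]                          # ground-truth pixels per class
    pred = [sum(cm[r][i] for r in range(k)) for i in range(k)]   # predicted pixels per class

    # Assumption of this sketch: classes with an empty denominator are skipped.
    acc = [tp[i] / gt[i] for i in range(k) if gt[i] > 0]
    iou = [tp[i] / (gt[i] + pred[i] - tp[i])
           for i in range(k) if gt[i] + pred[i] - tp[i] > 0]
    dice = [2 * tp[i] / (gt[i] + pred[i])
            for i in range(k) if gt[i] + pred[i] > 0]

    return {
        "PA": sum(tp) / total if total else 0.0,
        "mPA": _mean(acc),
        "mIoU": _mean(iou),
        "mDice": _mean(dice),
    }


def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets (brute force)."""
    def directed(u, v):
        # Largest distance from a point in u to its nearest neighbor in v.
        return max(min(dist(p, q) for q in v) for p in u)
    return max(directed(a, b), directed(b, a))


if __name__ == "__main__":
    # Hypothetical flattened label maps with three classes.
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 1, 1, 1, 2, 0]
    print(segmentation_metrics(confusion_matrix(y_true, y_pred, 3)))
    # Hypothetical boundary points of two segmentation contours.
    print(hausdorff_distance([(0, 0), (1, 0)], [(0, 0), (4, 0)]))
```

On the toy labels, PA and mPA coincide because every class has the same number of ground-truth pixels, while mIoU is lower because it also penalizes false positives. The brute-force Hausdorff computation is quadratic in the number of points and is meant only to make the definition concrete.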

YU Ying, WANG Chunping, FU Qiang, KOU Renke, WU Weiyi, LIU Tianyong. Survey of Evaluation Metrics and Methods for Semantic Segmentation[J]. Computer Engineering and Applications, 2023, 59(6): 57-69.

于营, 王春平, 付强, 寇人可, 吴巍屹, 刘天勇. 语义分割评价指标和评价方法综述[J]. 计算机工程与应用, 2023, 59(6): 57-69.
