Lightweight Semantic Segmentation Neural Network for Autonomous Driving

doi:10.3778/j.issn.1002-8331.2202-0092

Abstract

Image semantic segmentation has important applications in autonomous driving. It lets a robot segment semantic information from its environment and use it to decide downstream control actions. However, most deep learning models for this task are large, require substantial computing resources, and are difficult to deploy on mobile devices. To address this problem, a lightweight neural network for semantic segmentation is proposed. It combines an encoder-decoder architecture with a two-branch architecture, and uses group convolution, depthwise separable convolution, a multi-scale feature fusion module, and channel shuffle to reduce the number of network parameters and improve prediction accuracy. Training combines the Adam optimizer with stochastic gradient descent (SGD) and runs for 1,000 epochs on the Cityscapes dataset. In testing, the model has 3.5×10⁶ parameters and runs at 103 frames per second on a single NVIDIA GTX 1070 Ti GPU, which meets the real-time standard. On the evaluation metrics, it reaches a mean intersection over union (mIoU) of 61.3% and a pixel accuracy of 93.4%, both better than the SegNet and ENet models.
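The two evaluation metrics quoted above are conventionally computed from a confusion matrix accumulated over all labeled pixels. The Python sketch below assumes the standard definitions (pixel accuracy = correctly labeled pixels / all labeled pixels; per-class IoU = TP / (TP + FP + FN), averaged over classes) and the common Cityscapes convention of marking unlabeled pixels with 255. The paper's own evaluation code is not described here; the function names and the toy label maps are hypothetical.

```python
from typing import List, Sequence

IGNORE_INDEX = 255  # assumed: Cityscapes trainId maps commonly mark unlabeled pixels as 255


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int, ignore_index: int = IGNORE_INDEX) -> List[List[int]]:
    """Count pixels per (ground-truth class, predicted class) pair; rows are ground truth."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixels do not count toward either metric
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm: List[List[int]]) -> float:
    """Correctly classified pixels divided by all labeled pixels."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total if total else 0.0


def mean_iou(cm: List[List[int]]) -> float:
    """Average over classes of TP / (TP + FP + FN)."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # class-c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp   # other pixels predicted as class c
        denom = tp + fp + fn
        if denom:  # skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy flattened label maps with 3 classes; the last pixel is unlabeled.
    target = [0, 0, 1, 1, 2, 2, 255]
    pred = [0, 1, 1, 1, 2, 0, 0]
    cm = confusion_matrix(pred, target, num_classes=3)
    print(round(pixel_accuracy(cm), 3), round(mean_iou(cm), 3))  # prints: 0.667 0.5
```

The gap between the two numbers in the toy run mirrors the gap in the abstract (93.4% vs 61.3%): pixel accuracy is dominated by large, frequent classes such as road, while mIoU weights every class equally, so small or rare classes pull it down. Whether absent classes are skipped or counted as zero is a convention choice and is assumed here.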

Keywords: autonomous driving, deep learning, semantic segmentation, lightweight neural network, depthwise separable convolution
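Group convolution and depthwise separable convolution reduce parameters by splitting a standard k×k convolution, and channel shuffle lets information flow between groups that would otherwise stay isolated. The Python sketch below counts weights for bias-free convolutions and applies the reshape, transpose, flatten permutation to a list of channel indices standing in for a feature tensor. The 128-channel, 3×3 layer is a hypothetical example, not a layer from the paper. For a depthwise separable convolution the weight count relative to a standard convolution is 1/C_out + 1/k², which the example reproduces.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def conv_params(c_in: int, c_out: int, k: int, groups: int = 1) -> int:
    """Weight count of a bias-free k x k convolution split into `groups` groups."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # each output channel sees only c_in / groups input channels
    return (c_in // groups) * k * k * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Depthwise k x k conv (groups = c_in) followed by a pointwise 1 x 1 conv."""
    depthwise = conv_params(c_in, c_in, k, groups=c_in)  # one spatial filter per channel
    pointwise = conv_params(c_in, c_out, 1)              # 1 x 1 conv mixes channels
    return depthwise + pointwise


def channel_shuffle(channels: Sequence[T], groups: int) -> List[T]:
    """Reshape channels to (groups, n), transpose to (n, groups), flatten."""
    n, rem = divmod(len(channels), groups)
    if rem:
        raise ValueError("number of channels must be divisible by groups")
    return [channels[g * n + i] for i in range(n) for g in range(groups)]


if __name__ == "__main__":
    # Hypothetical layer: 128 -> 128 channels with a 3 x 3 kernel.
    print(conv_params(128, 128, 3))                 # 147456 (standard)
    print(conv_params(128, 128, 3, groups=4))       # 36864 (1/4 of standard)
    print(depthwise_separable_params(128, 128, 3))  # 17536 (1/128 + 1/9 of standard)
    # Channels 0-2 belong to group 0, channels 3-5 to group 1.
    print(channel_shuffle(list(range(6)), groups=2))  # [0, 3, 1, 4, 2, 5]
```

After the shuffle, each group of a following group convolution receives channels from every group of the previous one, which is why shuffle is usually paired with group convolution. In a framework such as PyTorch, group and depthwise convolutions are expressed through the `groups` argument of a 2D convolution; the paper's actual layer widths and block ordering are not given in the abstract.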

XU Guobao, MAI Ruitao, YE Changxin, YAO Xu, LIU Mingxin. Lightweight Semantic Segmentation Neural Network for Autonomous Driving[J]. Computer Engineering and Applications, 2023, 59(10): 328-334.
