Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (10): 328-334. DOI: 10.3778/j.issn.1002-8331.2202-0092

• Engineering and Applications •

用于自动驾驶的轻量级语义分割神经网络

徐国保,麦锐滔,叶昌鑫,姚旭,刘洺辛   

  1. 广东海洋大学 数学与计算机学院,广东 湛江 524088
  2. 广东海洋大学 电子与信息工程学院,广东 湛江 524088
  • 出版日期:2023-05-15 发布日期:2023-05-15

Lightweight Semantic Segmentation Neural Network for Autonomous Driving

XU Guobao, MAI Ruitao, YE Changxin, YAO Xu, LIU Mingxin   

  1. School of Mathematics and Computer, Guangdong Ocean University, Zhanjiang, Guangdong 524088, China
  2. School of Electronics and Information Engineering, Guangdong Ocean University, Zhanjiang, Guangdong 524088, China
  • Online: 2023-05-15 Published: 2023-05-15

摘要: 图像语义分割在自动驾驶领域有十分重要的应用,可以让机器人在环境中分割出语义信息,从而对下游的控制动作做出决策。但大部分的深度学习模型都比较大,需庞大的计算资源,很难在移动设备中使用。为了解决这个问题,提出了一种用于语义分割的轻量级神经网络模型,采用编码-解码型与二分支型相结合的网络架构,利用分组卷积、深度可分离卷积、多尺度特征融合模块与通道混洗技术减少网络参数量,提升模型预测精度。该模型训练结合Adam训练法与随机梯度下降法,使用Cityscapes数据集,设置1000个训练周期。经测试,该模型参数量为3.5×10^6,在单张显卡Nvidia GTX 1070Ti上的运算速度为每秒103帧图片,达到实时计算标准。在模型评估指标中,平均交并比为61.3%,像素准确率为93.4%,性能均优于SegNet和ENet两种模型。

关键词: 自动驾驶, 深度学习, 语义分割, 轻量级神经网络, 深度可分离卷积

Abstract: Image semantic segmentation is important in autonomous driving: it lets a robot extract semantic information from its environment and use it to decide on downstream control actions. However, most deep learning models for this task are large, demand substantial computing resources, and are difficult to deploy on mobile devices. To address this problem, a lightweight neural network model for semantic segmentation is proposed. Its architecture combines the encoder-decoder and two-branch designs, and it uses grouped convolution, depthwise separable convolution, a multi-scale feature fusion module, and channel shuffle to reduce the number of network parameters and improve prediction accuracy. Training combines the Adam optimizer with stochastic gradient descent on the Cityscapes dataset for 1000 epochs. In testing, the model has 3.5×10^6 parameters and runs at 103 frames per second on a single Nvidia GTX 1070Ti graphics card, which meets the real-time standard. On the evaluation metrics, the mean intersection over union (mIoU) is 61.3% and the pixel accuracy is 93.4%, both better than those of the SegNet and ENet models.
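
For readers unfamiliar with the two evaluation figures quoted in the abstract (the intersection-over-union average and the pixel accuracy), the following is a minimal sketch of how such metrics are commonly computed from a confusion matrix. It is not the authors' evaluation code; the function names and the six-pixel, three-class toy labels are assumptions made purely for illustration.

```python
# Illustrative sketch only: function names and toy data are hypothetical,
# not taken from the paper.

def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels; rows are ground-truth classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm):
    """Fraction of all pixels whose predicted class equals the ground truth."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def mean_iou(cm):
    """Average over classes of TP / (TP + FP + FN)."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, truth differs
        fn = sum(cm[c]) - tp                       # truth c, predicted differs
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both truth and prediction
            ious.append(tp / union)
    return sum(ious) / len(ious)


# Toy example: six flattened pixels, three classes (assumed values).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(cm)
print(pixel_accuracy(cm), mean_iou(cm))
```

Pixel accuracy is dominated by large classes such as road and sky, while the per-class averaging in mean IoU gives small classes equal weight, which is one reason the two numbers in the abstract can differ so much.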

Key words: autonomous driving, deep learning, semantic segmentation, lightweight neural network, depthwise separable convolution
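
The abstract credits grouped convolution, separable convolution, and channel shuffling with reducing the parameter count. The sketch below shows, under stated assumptions, why these techniques save weights. The layer sizes (64 input channels, 128 output channels, 3×3 kernel, 4 groups) are hypothetical and do not describe the paper's actual layers; biases are ignored.

```python
# Illustrative sketch only: layer sizes and helper names are hypothetical,
# not taken from the paper's architecture.

def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution with the given number of groups (no bias)."""
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k convolution (one filter per channel) plus 1 x 1 pointwise convolution."""
    depthwise = conv_params(c_in, c_in, k, groups=c_in)
    pointwise = conv_params(c_in, c_out, 1)
    return depthwise + pointwise


def channel_shuffle(channels, groups):
    """Interleave channels across groups: view as (groups, n), transpose, flatten."""
    n = len(channels) // groups
    return [channels[g * n + i] for i in range(n) for g in range(groups)]


standard = conv_params(64, 128, 3)                  # 73728 weights
grouped = conv_params(64, 128, 3, groups=4)         # 18432 weights
separable = depthwise_separable_params(64, 128, 3)  # 576 + 8192 = 8768 weights
print(standard, grouped, separable)

# After a grouped convolution, shuffling lets the next layer see every group.
print(channel_shuffle([0, 1, 2, 3, 4, 5], groups=2))  # [0, 3, 1, 4, 2, 5]
```

Grouped convolution isolates each group's channels from the others, so a shuffle step of this kind is typically inserted between grouped layers to let information mix across groups at no parameter cost.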