Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (19): 110-119. DOI: 10.3778/j.issn.1002-8331.2308-0446

• Pattern Recognition and Artificial Intelligence •


Lightweight Road Traffic Sign Identification Neural Network Based on Knowledge Distillation

GE Yiyuan, YU Mingxin   

  1. Department of Instrument Science and Technology, Beijing Information Science & Technology University, Beijing 100192, China
  • Online:2024-10-01 Published:2024-09-30


Abstract: Traffic sign recognition in natural scenes is susceptible to interference from lighting, occlusion, and blur, which degrades detection accuracy. In addition, existing deep learning models have large parameter counts and high computational complexity, leading to long inference times. This paper proposes a knowledge-distillation-based neural network architecture, adaptive feature extraction-vision Transformer (AFE-ViT), for road traffic sign recognition. The architecture consists of an adaptive feature extraction module and a lightweight vision Transformer (ViT) classifier; it fuses local and global feature information in the image and therefore adapts better to road traffic sign recognition in natural scenes. The adaptive feature extraction module combines the ideas of InceptionNetV1 and SKNet with a residual structure to realize adaptive selection among multiple receptive fields, and, serving as a front-end module of the ViT, it effectively improves feature extraction efficiency. ResNet18 is chosen as the teacher network and AFE-ViT as the student network, and feature-level and output-level knowledge distillation methods are applied to distill AFE-ViT and compress its parameters. Experimental results show that the method reaches a recognition accuracy of 98.98% with only 9.9×10^5 model parameters, outperforming comparable deep learning models.
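The abstract names feature-level and output-level knowledge distillation but does not give the exact loss formulation. As a minimal, framework-free sketch, output-level distillation is commonly implemented as a temperature-scaled KL divergence between teacher and student logits (Hinton-style), and feature-level distillation as a mean-squared error between intermediate features; the temperature `T=4.0` and the epsilon terms below are illustrative assumptions, not values from the paper.

```python
import math

def softened_probs(logits, T=4.0):
    # Temperature-scaled softmax; larger T yields a softer distribution.
    z = [x / T for x in logits]
    m = max(z)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

def output_level_kd_loss(student_logits, teacher_logits, T=4.0):
    # KL(teacher || student) on temperature-softened class probabilities,
    # scaled by T^2 as in Hinton-style logit distillation.
    p_t = softened_probs(teacher_logits, T)
    p_s = softened_probs(student_logits, T)
    kl = sum(pt * (math.log(pt + 1e-12) - math.log(ps + 1e-12))
             for pt, ps in zip(p_t, p_s))
    return T * T * kl

def feature_level_kd_loss(student_feat, teacher_feat):
    # Mean-squared error between flattened intermediate features
    # (assumes the teacher and student feature vectors have equal length).
    n = len(student_feat)
    return sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / n
```

In practice the total training loss would be a weighted sum of the ordinary cross-entropy on ground-truth labels plus these two terms; the weighting coefficients are not specified in the abstract.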

Key words: traffic sign identification, knowledge distillation, adaptive feature extraction
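The adaptive selection among multiple receptive fields described in the abstract follows the SKNet idea: parallel branches with different kernel sizes are fused, a channel descriptor is pooled from the fusion, and a softmax over branches weights each branch's response per channel. The toy sketch below works on already-pooled per-channel branch responses with a hypothetical learned score vector per branch; the real module's convolutions, FC layers, and residual connection are abstracted away.

```python
import math

def adaptive_branch_selection(branch_feats, branch_weights):
    # branch_feats: one per-channel feature vector per receptive-field
    # branch (e.g. outputs of 1x1 / 3x3 / 5x5 convolutions after global
    # average pooling). branch_weights: a hypothetical learned score
    # vector per branch, standing in for the FC layers of an SK block.
    n_branches = len(branch_feats)
    n_channels = len(branch_feats[0])
    # element-wise fusion of all branches, as in SKNet's "fuse" step
    fused = [sum(b[c] for b in branch_feats) for c in range(n_channels)]

    out, attn = [], []
    for c in range(n_channels):
        # score each branch for this channel, then softmax over branches
        scores = [branch_weights[b][c] * fused[c] for b in range(n_branches)]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        a = [e / total for e in exps]
        attn.append(a)
        # weighted sum of branch responses = adaptively selected output
        out.append(sum(a[b] * branch_feats[b][c] for b in range(n_branches)))
    return out, attn
```

Because the attention weights sum to one across branches for every channel, the block can smoothly emphasize the receptive field best suited to the input, which is the behavior the adaptive feature extraction module relies on.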