Enhancing YOLOv8 for Improved Instance Segmentation of Automotive Surface Damage

doi:10.3778/j.issn.1002-8331.2403-0087

Abstract

Manual damage assessment cannot keep pace with the era of intelligent vehicles, and conventional vehicle damage detection models have notable shortcomings. To address these problems, this paper proposes EIS-YOLO, an enhanced instance segmentation model based on YOLOv8. It introduces CRDB, a novel module for multi-scale feature fusion and channel reduction that replaces the C2f module. CRDB reduces parameters by 20.15% while improving fusion efficiency. In addition, the HRFPN structure retains a high-resolution branch, which preserves finer detail and strengthens the exchange between detail and semantic information. HRFPN also incorporates the AFF and BiAM attention modules for deeper feature integration. An efficient E-FPN and an extra output head are used to better identify small damage and damage edges. On the CarDD dataset, CRDB improves multi-task accuracy by 2 percentage points. The complete EIS-YOLO model with HRFPN improves the accuracy metrics P_B and P_M over the baseline by 4.4 and 6.6 percentage points, respectively, while being lighter and computationally cheaper.
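This sketch assumes that the AFF module named above is the attentional feature fusion of Dai et al. (2021). In that method, two feature maps X and Y are summed and passed through a channel attention module M that outputs weights in (0, 1). The two inputs are then blended as Z = M(X + Y) ⊗ X + (1 − M(X + Y)) ⊗ Y. The code is a simplified, dependency-free illustration of that blending rule only, with three limitations:

- It keeps just a global (average-pooled) channel context and drops the local branch of the original multi-scale attention.
- It replaces the learned point-wise convolutions with hypothetical per-channel `scale` and `bias` values.
- It does not show how EIS-YOLO wires AFF into HRFPN, because the abstract does not specify this.

```python
import math
from typing import List

# Feature map laid out as [channel][pixel], with spatial dimensions flattened.
FeatureMap = List[List[float]]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def channel_attention(z: FeatureMap, scale: List[float], bias: List[float]) -> List[float]:
    """Global-context-only stand-in for the attention module M.

    Per channel: global average pooling, then an affine map, then a sigmoid.
    `scale` and `bias` are hypothetical placeholders for learned conv weights.
    """
    weights = []
    for c, channel in enumerate(z):
        pooled = sum(channel) / len(channel)  # global average pooling
        weights.append(sigmoid(scale[c] * pooled + bias[c]))
    return weights


def attentional_fusion(x: FeatureMap, y: FeatureMap,
                       scale: List[float], bias: List[float]) -> FeatureMap:
    """Blend two same-shaped maps: Z = M(X + Y) * X + (1 - M(X + Y)) * Y."""
    if len(x) != len(y) or any(len(a) != len(b) for a, b in zip(x, y)):
        raise ValueError("x and y must have identical shapes")
    # Initial integration of the two inputs by element-wise sum.
    summed = [[a + b for a, b in zip(cx, cy)] for cx, cy in zip(x, y)]
    m = channel_attention(summed, scale, bias)
    # Each channel gets weight m[c] on X and the complement 1 - m[c] on Y.
    return [
        [m[c] * a + (1.0 - m[c]) * b for a, b in zip(x[c], y[c])]
        for c in range(len(x))
    ]


if __name__ == "__main__":
    x = [[1.0, 1.0], [0.0, 0.0]]
    y = [[0.0, 0.0], [1.0, 1.0]]
    print(attentional_fusion(x, y, [0.0, 0.0], [0.0, 0.0]))
```

With `scale` and `bias` at zero, every weight is 0.5 and the fusion reduces to a plain average of X and Y. A strongly positive bias drives the weight toward 1, so the output follows X. Because each weight and its complement sum to one, the module learns a soft selection between the two inputs rather than simply adding them.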

Key words: vehicle damage detection, YOLO-Seg, attention mechanism, multi-scale feature fusion, CarDD vehicle damage data

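Chinese abstract (English translation): Manual damage assessment cannot meet the requirements of the intelligent-vehicle era, and traditional vehicle damage detection models suffer from low accuracy, limited information and difficult deployment. To address these problems, an improved YOLOv8 instance segmentation model for vehicle damage, EIS-YOLO, is proposed. In the backbone, a CRDB module that combines multi-scale feature fusion with channel-count reduction replaces the conventional C2f module. CRDB significantly reduces the number of parameters while improving feature fusion capability. An HRFPN structure that retains a high-resolution branch is proposed to better preserve detail and to strengthen the exchange between detail and semantic information. This structure reinforces deep propagation through the AFF and BiAM attention fusion modules. It completes feature fusion through an E-FPN whose redundant connections have been removed. An additional output head captures fine damage, improving the recognition of small-target damage and damage edges. On the CarDD dataset, the CRDB backbone module reduces computation by 20.15% compared with the C2f module under the same architecture and raises average multi-task accuracy by 2 percentage points. Building on this, the full model combining the HRFPN structure and the extra output head improves the accuracy metrics P_B and P_M by 4.4 and 6.6 percentage points, respectively, over the baseline. The full model is also lighter and has lower computational complexity.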
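The cost of a standard convolution shows why shrinking channel counts, as the CRDB module is described as doing, lowers both parameters and computation. A k × k convolution from C_in to C_out channels has k²·C_in·C_out weights and performs that many multiply-accumulates at every output position. The helpers below are a generic illustration of this textbook cost model under that assumption. They are not the CRDB module, whose internal structure is not given here. The 256- and 192-channel figures and the 40 × 40 map size are hypothetical.

```python
def conv_params(c_in: int, c_out: int, k: int, bias: bool = False) -> int:
    """Weights (plus optional biases) of a standard, ungrouped k x k 2-D convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def conv_macs(c_in: int, c_out: int, k: int, h_out: int, w_out: int) -> int:
    """Multiply-accumulates: one k*k*c_in dot product per output channel and position."""
    return k * k * c_in * c_out * h_out * w_out


def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    # Hypothetical layer: 3x3 conv on a 40x40 map, output channels cut from 256 to 192.
    full = conv_params(256, 256, 3)
    slim = conv_params(256, 192, 3)
    print(full, slim, percent_reduction(full, slim))
    print(percent_reduction(conv_macs(256, 256, 3, 40, 40), conv_macs(256, 192, 3, 40, 40)))
```

Both costs scale linearly with C_out, so cutting output channels by a quarter removes a quarter of the weights and a quarter of the multiply-accumulates for that layer. The figures reported in the abstract differ in kind. The 20.15% figure is a relative reduction of this sort, whereas the accuracy gains are stated in percentage points, that is, absolute differences between metric values.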


TAN Xu, ZHAO Ji. Enhancing YOLOv8 for Improved Instance Segmentation of Automotive Surface Damage[J]. Computer Engineering and Applications, 2024, 60(14): 197-208.

