Algorithm for Detecting Prohibited Items in X-Ray Images Based on Improved YOLOv5

doi:10.3778/j.issn.1002-8331.2210-0151

Abstract: To address complex backgrounds, missed detections, and the imbalance between positive and negative categories in contraband detection on X-ray images, an X-ray contraband detection algorithm based on YOLOv5 is proposed. First, a Swin Transformer module is introduced into the YOLOv5s backbone network. Its local self-attention and shifted-window mechanism improve the model's ability to extract global features from X-ray images. In addition, a spatial attention mechanism and a channel attention mechanism are added after the backbone to improve the extraction of key contraband features. Second, an adaptive spatial feature fusion structure is introduced to reduce the interference that conflicts between feature maps at different levels of the feature pyramid cause in the model's gradients. Finally, Focal Loss is introduced to improve the background (objectness) prediction loss and the classification loss of YOLOv5s, strengthening detection when positive and negative samples, as well as easy and hard samples, are imbalanced. On the public SIXray100 dataset, the algorithm achieves an average detection precision of 57.4%, 4.5 percentage points higher than YOLOv5s; on the SIXray positive-sample dataset, it achieves 90.4%, 2.4 percentage points higher than YOLOv5s. The experimental results show that the improved algorithm achieves considerably higher detection accuracy than the original YOLOv5s, demonstrating its effectiveness.
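The shifted-window mechanism of the Swin Transformer can be illustrated without a deep-learning framework. The sketch below is a simplified illustration, not the paper's implementation. It only tracks which token positions end up in the same attention window. It partitions an H×W grid into non-overlapping M×M windows, and for the shifted configuration it first cyclically rolls the grid by M//2, which Swin does with a tensor roll. It omits the attention computation itself and the attention mask Swin applies to windows that contain wrapped-around tokens. The names `roll2d`, `window_partition`, and `window_layout` are hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Coord = Tuple[int, int]


def roll2d(grid: Sequence[Sequence[T]], shift: int) -> List[List[T]]:
    """Cyclically shift a 2D grid up and left by `shift` positions:
    out[i][j] = grid[(i + shift) % H][(j + shift) % W]."""
    h, w = len(grid), len(grid[0])
    return [[grid[(i + shift) % h][(j + shift) % w] for j in range(w)] for i in range(h)]


def window_partition(grid: Sequence[Sequence[T]], m: int) -> List[List[T]]:
    """Split an H x W grid into non-overlapping m x m windows, in row-major order."""
    h, w = len(grid), len(grid[0])
    if h % m or w % m:
        raise ValueError("grid size must be divisible by the window size")
    windows: List[List[T]] = []
    for top in range(0, h, m):
        for left in range(0, w, m):
            windows.append([grid[i][j] for i in range(top, top + m) for j in range(left, left + m)])
    return windows


def window_layout(h: int, w: int, m: int, shifted: bool) -> List[List[Coord]]:
    """For each window, list the original (row, col) positions it contains.
    Self-attention would be computed only among tokens in the same window."""
    coords = [[(i, j) for j in range(w)] for i in range(h)]
    if shifted:
        coords = roll2d(coords, m // 2)  # shift by half a window before partitioning
    return window_partition(coords, m)


if __name__ == "__main__":
    regular = window_layout(4, 4, 2, shifted=False)
    shifted = window_layout(4, 4, 2, shifted=True)
    print(regular[0])  # [(0, 0), (0, 1), (1, 0), (1, 1)]
    print(shifted[0])  # [(1, 1), (1, 2), (2, 1), (2, 2)]
```

In the shifted layout, the first window gathers positions (1, 1), (1, 2), (2, 1), and (2, 2), which belonged to four different windows in the regular layout. Alternating the two layouts in successive blocks lets information cross window boundaries while each attention computation stays local and cheap.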
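The abstract adds spatial and channel attention after the backbone but does not specify their exact design. The sketch below assumes a CBAM-style arrangement, with channel attention first and spatial attention second, written in plain Python on a nested list indexed as [C][H][W]. Channel attention pools each channel by average and by max, passes both vectors through a shared two-layer MLP, and rescales each channel by the sigmoid of the sum. For brevity, the spatial branch is a simplification: it mixes the per-pixel channel mean and max with two scalars instead of CBAM's 7×7 convolution. All weights (`w1`, `w2`, `k_avg`, `k_max`) are hypothetical inputs, not trained values.

```python
import math
from typing import List, Sequence

Tensor3 = List[List[List[float]]]  # feature map indexed as [channel][row][col]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def shared_mlp(v: Sequence[float], w1: Sequence[Sequence[float]],
               w2: Sequence[Sequence[float]]) -> List[float]:
    """Two-layer MLP C -> C/r -> C with ReLU, shared by the avg and max branches."""
    hidden = [max(0.0, sum(a * b for a, b in zip(row, v))) for row in w1]
    return [sum(a * b for a, b in zip(row, hidden)) for row in w2]


def channel_attention(x: Tensor3, w1, w2) -> Tensor3:
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]  # global average pool
    mx = [max(map(max, ch)) for ch in x]                              # global max pool
    scale = [sigmoid(a + m) for a, m in zip(shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2))]
    return [[[v * s for v in row] for row in ch] for ch, s in zip(x, scale)]


def spatial_attention(x: Tensor3, k_avg: float, k_max: float, bias: float = 0.0) -> Tensor3:
    """Simplified spatial attention: a 1x1 mix of the channel-wise mean and max
    (CBAM itself applies a 7x7 convolution to these two maps)."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * w for _ in range(h)] for _ in range(c)]
    for i in range(h):
        for j in range(w):
            vals = [x[k][i][j] for k in range(c)]
            gate = sigmoid(k_avg * sum(vals) / c + k_max * max(vals) + bias)
            for k in range(c):
                out[k][i][j] = x[k][i][j] * gate
    return out


def cbam_like(x: Tensor3, w1, w2, k_avg: float, k_max: float) -> Tensor3:
    """Channel attention followed by (simplified) spatial attention."""
    return spatial_attention(channel_attention(x, w1, w2), k_avg, k_max)
```

With all weights set to zero, both sigmoid gates equal 0.5, so `cbam_like` simply scales the input by 0.25; trained weights would instead emphasise informative channels and locations.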
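Adaptive spatial feature fusion (ASFF, Liu et al.) combines pyramid levels with per-pixel weights. After the levels are resized to a common resolution, a learned layer predicts one logit per level at every position; a softmax across levels turns these into weights that sum to 1, and the output is the weighted sum. Because each pixel can favour the level that suits it, conflicting levels can be suppressed locally instead of being added with fixed weights. The pure-Python sketch below covers only this fusion step for single-channel maps. It assumes the resizing and the logit-producing layer have already run, and `asff_fuse` is a hypothetical name.

```python
import math
from typing import List, Sequence

Grid = List[List[float]]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def asff_fuse(features: Sequence[Grid], logits: Sequence[Grid]) -> Grid:
    """features[l][i][j]: level-l feature, already resized to a common H x W.
    logits[l][i][j]: unnormalized weight of level l at position (i, j)."""
    h, w = len(features[0]), len(features[0][0])
    fused = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            weights = softmax([lg[i][j] for lg in logits])  # per-pixel weights summing to 1
            fused[i][j] = sum(wt * f[i][j] for wt, f in zip(weights, features))
    return fused


if __name__ == "__main__":
    feats = [[[1.0, 2.0]], [[5.0, 6.0]], [[9.0, 10.0]]]
    # Equal logits reduce to a plain average of the three levels (about [[5.0, 6.0]]).
    print(asff_fuse(feats, [[[0.0, 0.0]]] * 3))
    # Pixel 0 strongly prefers level 0 and pixel 1 level 2 (values close to 1.0 and 10.0).
    print(asff_fuse(feats, [[[10.0, 0.0]], [[0.0, 0.0]], [[0.0, 10.0]]]))
```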
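Focal Loss (Lin et al.) rescales binary cross-entropy by a modulating factor (1 − p_t)^γ and a class weight α_t, so well-classified (easy) examples, which in detection are mostly background, contribute little: FL(p_t) = −α_t (1 − p_t)^γ log(p_t). The sketch below computes it for a single sigmoid output, the form in which it would apply to per-anchor objectness and per-class scores in YOLOv5s. The values γ = 2 and α = 0.25 are the defaults recommended in the Focal Loss paper and are used here as an assumption; the abstract does not state the values chosen by the authors.

```python
import math

EPS = 1e-12


def _p_true(p: float, y: int) -> float:
    """Probability assigned to the true label y in {0, 1}, clamped so log() never sees 0."""
    p = min(max(p, EPS), 1.0 - EPS)
    return p if y == 1 else 1.0 - p


def bce(p: float, y: int) -> float:
    """Binary cross-entropy for one sigmoid probability p."""
    return -math.log(_p_true(p, y))


def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p_t = _p_true(p, y)
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # An easy background cell (predicted objectness 0.05) vs. a hard one (0.9).
    for p in (0.05, 0.9):
        print(f"p={p}: BCE={bce(p, 0):.4f}  focal={focal_loss(p, 0):.6f}")
```

For the easy background cell, BCE is about 0.051 while the focal loss is about 0.0001, roughly 530 times smaller; for the hard cell, BCE is about 2.30 and the focal loss about 1.40. The many easy negatives therefore no longer dominate the total loss. Setting γ = 0 and α = 0.5 recovers half of the ordinary BCE.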
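The gains above are stated in percentage points. Subtracting them from the reported scores gives the implied YOLOv5s baselines, and dividing each gain by its baseline gives the relative improvement. These relative figures are derived here from the abstract's numbers and are not reported by the authors:

```latex
\begin{aligned}
\text{SIXray100:}\quad & 57.4\% - 4.5 = 52.9\%, && \frac{4.5}{52.9} \approx 8.5\%\ \text{relative},\\
\text{SIXray positive samples:}\quad & 90.4\% - 2.4 = 88.0\%, && \frac{2.4}{88.0} \approx 2.7\%\ \text{relative}.
\end{aligned}
```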

Key words: deep learning, object detection, prohibited item detection, YOLOv5, attention mechanism

LI Wenqiang, CHEN Li, XIE Xu, HAO Xingxing, LI Haobin. Algorithm for Detecting Prohibited Items in X-Ray Images Based on Improved YOLOv5[J]. Computer Engineering and Applications, 2023, 59(16): 170-176.