Improved YOLOv7 for Lead-Sealed Small Target Detection in Complex Environments

doi:10.3778/j.issn.1002-8331.2304-0385

Abstract

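Abstract: Seal detection on containers in seaports is difficult because the scenes are complex, illumination varies, viewing distances differ, and the seal is often close in color to its background, all of which make the seal a hard small target. To address this, an improved YOLOv7 method for detecting seals on containers is proposed. Contextual information is integrated directly into the detection task and combined with a path aggregation feature pyramid network (PAFPN) to fuse features across scales, improving discrimination accuracy. Because the features of small seals tend to vanish during training, a deformable convolution module (deformable convolution v3) is embedded in the last MPConv and E-ELAN modules of the backbone so that the network adapts to seal feature maps of varying shape and size; during feature fusion, this ensures that more feature maps carrying shallow semantic information reach the classification network, strengthening learning in complex scenes. The SimAM attention mechanism is added to the Neck to adaptively select the important information in the input, further improving performance against complex, changing backgrounds. Because seals in the dataset appear at different distances, Focal Loss replaces the cross-entropy classification loss to balance the contributions of high-quality and low-quality samples, and EIoU and CIoU localization losses with an introduced hyperparameter replace the CIoU loss, so that the model focuses more on the overlap between predicted and ground-truth boxes; this makes the loss more accurate and more robust to variation in target shape and size. Experiments show that the improved YOLOv7 reaches a mean average precision (mAP) of 81.6%, outperforming both the original network and other classical object detectors. The average recognition time is 0.058 s per image, which meets the real-time requirement for seal detection at container ports.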

Keywords: lead seal, small object detection, YOLOv7, contextual information, deformable convolution, attention mechanism, loss function
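
The abstract states that Focal Loss replaces the cross-entropy classification loss. As a rough illustration of what that substitution does, the following is a minimal sketch of the binary focal loss for a single prediction, written in plain Python. The function names and the values `alpha=0.25` and `gamma=2.0` are assumptions chosen for illustration (they are commonly used defaults); the abstract does not report the settings used in the paper.

```python
import math


def cross_entropy(p, y, eps=1e-12):
    """Binary cross-entropy for one prediction.

    p: predicted probability of the positive class, y: label (1 or 0).
    """
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(max(p_t, eps))


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss for one prediction.

    alpha and gamma are illustrative values, not taken from the paper.
    The factor (1 - p_t) ** gamma shrinks the loss of well-classified
    samples so that hard samples dominate the total.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    p_t = min(max(p_t, eps), 1.0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # Compare an easy positive (p = 0.9) with a hard positive (p = 0.1).
    for p in (0.9, 0.1):
        print(p, cross_entropy(p, 1), focal_loss(p, 1))
```

With `gamma=0` the modulating factor disappears and the expression reduces to a class-weighted cross-entropy, which makes the sketch easy to sanity-check against the loss it replaces.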

Abstract: An improved YOLOv7 method is proposed for detecting lead seals on containers. It addresses the difficulty of detecting small seal targets caused by complex harbor container transportation scenes, varying light intensity, differing viewing distances, and similar colors between the seal and the background. Firstly, contextual information is integrated directly into the object detection task and combined with the PAFPN structure to fuse features at different scales, improving discrimination accuracy. Secondly, to keep small seal features from disappearing during training, deformable convolution v3 is embedded in the last MPConv and E-ELAN modules of the backbone network so that it adapts to seal feature maps of different shapes and sizes. During feature fusion, more feature maps containing shallow semantic information are then sent to the classification network, which increases the model's learning ability in complex scenes. In addition, the SimAM attention mechanism is incorporated into the Neck to adaptively select important information from the input and further improve performance against complex and changing backgrounds. Finally, because lead seals in the dataset appear at different distances, the Focal Loss classification loss replaces the cross-entropy loss to balance the contributions of high-quality and low-quality samples, and EIoU and CIoU localization losses with an introduced hyperparameter replace the CIoU loss. The model thus pays more attention to the overlap between the predicted box and the ground-truth box, which improves the accuracy of the loss calculation and the robustness to variation in target shape and size. Experimental results show that the improved YOLOv7 algorithm achieves a mean average precision (mAP) of 81.6%, which is superior to the original network and other classical object detection networks. In terms of time performance, the average recognition time per image is 0.058 s, meeting the real-time requirements for seal detection at container ports.
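
To make the localization-loss change more concrete, the sketch below computes the standard CIoU and EIoU losses for one pair of axis-aligned boxes in plain Python. It is only an illustration under stated assumptions: boxes are given as `(x1, y1, x2, y2)` with positive width and height, the helper names are hypothetical, and the additional hyperparameter mentioned in the abstract is not specified there, so it is left out rather than guessed.

```python
import math


def _box_terms(box, gt):
    """Shared geometric terms for two boxes given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1
    inter_w = max(0.0, min(x2, gx2) - max(x1, gx1))
    inter_h = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = inter_w * inter_h
    iou = inter / (w * h + gw * gh - inter)
    # Width and height of the smallest box enclosing both boxes.
    cw = max(x2, gx2) - min(x1, gx1)
    ch = max(y2, gy2) - min(y1, gy1)
    # Squared distance between the two box centers.
    d2 = ((x1 + x2 - gx1 - gx2) ** 2 + (y1 + y2 - gy1 - gy2) ** 2) / 4.0
    return iou, w, h, gw, gh, cw, ch, d2


def ciou_loss(box, gt):
    """CIoU loss: IoU term + center distance + aspect-ratio consistency."""
    iou, w, h, gw, gh, cw, ch, d2 = _box_terms(box, gt)
    v = (4.0 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(w / h)) ** 2
    alpha = v / (1.0 - iou + v) if v > 0 else 0.0
    return 1.0 - iou + d2 / (cw ** 2 + ch ** 2) + alpha * v


def eiou_loss(box, gt):
    """EIoU loss: IoU term + center distance + direct width/height penalties."""
    iou, w, h, gw, gh, cw, ch, d2 = _box_terms(box, gt)
    return (1.0 - iou + d2 / (cw ** 2 + ch ** 2)
            + (w - gw) ** 2 / cw ** 2 + (h - gh) ** 2 / ch ** 2)


if __name__ == "__main__":
    gt = (0.0, 0.0, 10.0, 10.0)
    pred = (2.0, 2.0, 8.0, 8.0)  # concentric, same aspect ratio, smaller
    print(ciou_loss(pred, gt), eiou_loss(pred, gt))
```

For a prediction that is concentric with the ground truth and has the same aspect ratio but a different size, the CIoU aspect-ratio term vanishes while EIoU still penalizes the width and height error directly. This is one reason an EIoU-style term can be attractive when target size varies, as the abstract describes.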

Key words: lead sealing, small object detection, YOLOv7, context information, deformable convolution, attention mechanism, loss function
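
The attention mechanism named in the abstract, SimAM, weights each activation by how much it stands out from the rest of its channel, without adding learnable parameters. The following is a minimal plain-Python sketch of that weighting for a single channel. The function name, the nested-list input format, and the regularization constant `e_lambda=1e-4` are assumptions for illustration; how the module is wired into the YOLOv7 Neck is not described in the abstract and is not shown here.

```python
import math


def simam_channel(channel, e_lambda=1e-4):
    """Apply SimAM-style weighting to one channel (2D list, H x W).

    Each value is multiplied by sigmoid(d / (4 * (var + e_lambda)) + 0.5),
    where d is its squared deviation from the channel mean and var is the
    channel variance. e_lambda is an assumed small regularization constant.
    The channel must contain at least two values.
    """
    values = [v for row in channel for v in row]
    n = len(values) - 1
    mu = sum(values) / len(values)
    dev = [(v - mu) ** 2 for v in values]
    var = sum(dev) / n
    width = len(channel[0])
    out = []
    for i, row in enumerate(channel):
        out_row = []
        for j, v in enumerate(row):
            inv_energy = dev[i * width + j] / (4.0 * (var + e_lambda)) + 0.5
            weight = 1.0 / (1.0 + math.exp(-inv_energy))  # sigmoid
            out_row.append(v * weight)
        out.append(out_row)
    return out


if __name__ == "__main__":
    # One activation (5.0) stands out from a flat background of 1.0.
    fmap = [[1.0, 1.0, 1.0],
            [1.0, 5.0, 1.0],
            [1.0, 1.0, 1.0]]
    for row in simam_channel(fmap):
        print(row)
```

In this toy input the distinctive center value keeps a larger share of its magnitude than the uniform background values, which is the behavior the abstract relies on when it says the mechanism adaptively selects important information.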

ZHANG Haibin, PEI Fei, LEI Bangjun, XIA Ping. Improved YOLOv7 for Lead-Sealed Small Target Detection in Complex Environments[J]. Computer Engineering and Applications, 2023, 59(19): 130-139.
