Small Object Detection Algorithm Based on ATO-YOLO

doi:10.3778/j.issn.1002-8331.2308-0385

Abstract

Small object detection is of great significance in computer vision. However, existing methods often produce missed detections and false alarms when faced with scale variation, densely packed objects, and irregular layouts. To address these problems, this paper proposes ATO-YOLO, an improved version of the YOLOv5 algorithm. First, an adaptive feature extraction (AFE) module that incorporates an attention mechanism is introduced to enhance the feature representation capability of the detection model. By dynamically adjusting weight allocation to highlight key object features, AFE improves the accuracy and robustness of object detection across diverse scenarios. Second, a triple feature fusion (TFF) mechanism is designed to exploit multi-scale information by fusing feature maps from different scales, yielding more comprehensive object features and better detection of small objects. Finally, an output reconstruction (ORS) module removes the large-object detection layer and adds a small-object detection layer, enabling precise localization and recognition of small objects while reducing model complexity and increasing detection speed relative to the original model. Experimental results show that ATO-YOLO achieves an mAP@0.5 of 38.2% on the VisDrone dataset, an improvement of 6.1 percentage points over YOLOv5, with a relative FPS increase of 4.4%. The algorithm therefore detects small objects both quickly and accurately.

Key words: YOLOv5, multiscale feature fusion, adaptive feature extraction, small object detection
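The ORS change and the headline numbers can be made concrete with a little arithmetic. The Python sketch below rests on assumptions that the abstract does not state: a 640x640 input, YOLOv5's default detection strides of 8, 16 and 32, and an ORS head set at strides 4, 8 and 16 (that is, the stride-32 head dropped and a stride-4 head added). It counts grid cells per head and recovers the YOLOv5 baseline mAP@0.5 implied by the reported figures (38.2 - 6.1 = 32.1%), so the 6.1-point absolute gain can be compared with the corresponding relative gain. The helper names `grid_cells`, `cells_per_object` and `split_gain` are hypothetical.

```python
# Sketch only. Assumptions not stated in the abstract: 640x640 input,
# YOLOv5 default strides 8/16/32, and ORS heads at strides 4/8/16.
# Helper names are hypothetical; anchors per grid cell are ignored.

INPUT_SIZE = 640  # assumed input resolution


def grid_cells(strides, input_size=INPUT_SIZE):
    """Map each stride to (grid side length, number of grid cells)."""
    return {s: (input_size // s, (input_size // s) ** 2) for s in strides}


def cells_per_object(object_px, stride):
    """Number of grid cells a square object of object_px pixels spans per side."""
    return object_px / stride


def split_gain(new_map, gain_points):
    """Return (implied baseline mAP, gain expressed as a relative percentage)."""
    baseline = new_map - gain_points
    return baseline, gain_points / baseline * 100.0


original = grid_cells([8, 16, 32])  # YOLOv5 P3/P4/P5 heads
ors = grid_cells([4, 8, 16])        # assumed ORS heads: P2 added, P5 removed

for name, heads in (("YOLOv5", original), ("ORS (assumed)", ors)):
    for stride, (side, cells) in heads.items():
        print(f"{name:14s} stride {stride:2d}: {side}x{side} grid, {cells} cells")

# A 16-pixel object spans 4 cells per side at stride 4 but half a cell at stride 32.
print(cells_per_object(16, 4), cells_per_object(16, 32))

baseline, relative = split_gain(38.2, 6.1)
print(f"implied baseline mAP@0.5: {baseline:.1f}%, relative gain: {relative:.1f}%")
```

Under these assumptions the added stride-4 head has 160x160 = 25,600 cells, four times the 6,400 of the finest original head, which is what lets a 16-pixel object occupy several cells instead of a fraction of one. The 6.1-point absolute gain corresponds to roughly a 19.0% relative gain over the implied 32.1% baseline. The abstract does not say where the reported complexity reduction comes from, so the sketch makes no claim about parameters or FLOPs; likewise, the 4.4% FPS figure is already relative and cannot be turned into absolute frame rates from the abstract alone.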


SU Jia, QIN Yichang, JIA Ze, WANG Jing. Small Object Detection Algorithm Based on ATO-YOLO[J]. Computer Engineering and Applications, 2024, 60(6): 68-77.

