Lightweight Object Detection Method for Constrained Environments

doi:10.3778/j.issn.1002-8331.2211-0283

Abstract

Lightweight design of object detection models is important in environments with limited computing resources and storage space. To further compress the object detection model and improve its detection accuracy, this paper proposes Lite-YOLOX, a higher-performance lightweight object detection model. It builds on YOLOX-Tiny and improves three parts: the feature pyramid structure, the decoupled head structure, and the loss function. First, to compress the original model further, the feature pyramid and the decoupled head are redesigned so that the neck and head of the model are lighter. Second, to improve detection accuracy, the model is optimized with an EIoU loss function that is more sensitive to the positions of the ground-truth and predicted boxes. Finally, validation experiments are performed on the PASCAL VOC dataset and a safety helmet wearing dataset. Compared with YOLOX-Tiny, Lite-YOLOX reduces the number of parameters by 40% and the floating-point operations (FLOPs) by 37.5%, and increases mAP50 by 3.2 and 3.1 percentage points on the two datasets, respectively. On the NVIDIA Jetson Xavier NX, the frame rate increases from 51 to 59 frames per second (FPS), a clear improvement in real-time performance.
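The abstract does not give the formula for Lite-YOLOX's EIoU loss. To illustrate what "more sensitive to box position" can mean, the sketch below implements the published EIoU loss of Zhang et al. (2021, "Focal and Efficient IOU Loss"), L = 1 - IoU + rho^2(b, b_gt)/c^2 + rho^2(w, w_gt)/Cw^2 + rho^2(h, h_gt)/Ch^2, where c, Cw and Ch are the diagonal, width and height of the smallest box enclosing both boxes. This is an assumed stand-in: the authors' EIoU may be defined differently. Boxes are assumed to be in (x1, y1, x2, y2) corner format, and the helper names are hypothetical.

```python
from typing import Tuple

# Assumed box format: (x1, y1, x2, y2), top-left and bottom-right corners.
Box = Tuple[float, float, float, float]


def iou(pred: Box, gt: Box) -> float:
    """Plain intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    union = area_p + area_g - inter
    return inter / union if union > 0 else 0.0


def eiou_loss(pred: Box, gt: Box, eps: float = 1e-9) -> float:
    """EIoU loss as published by Zhang et al. (2021); an assumed stand-in,
    not necessarily the EIoU used in Lite-YOLOX."""
    # Width and height of the smallest box enclosing both boxes.
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    # Squared centre distance, normalised by the enclosing box's squared diagonal.
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    gcx, gcy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    center = ((pcx - gcx) ** 2 + (pcy - gcy) ** 2) / (cw ** 2 + ch ** 2 + eps)
    # Squared width and height differences, normalised by the enclosing box's sides.
    dw = ((pred[2] - pred[0]) - (gt[2] - gt[0])) ** 2 / (cw ** 2 + eps)
    dh = ((pred[3] - pred[1]) - (gt[3] - gt[1])) ** 2 / (ch ** 2 + eps)
    return 1.0 - iou(pred, gt) + center + dw + dh


# Two predictions that do not overlap the ground truth, one near and one far.
gt = (0.0, 0.0, 2.0, 2.0)
near = (3.0, 0.0, 5.0, 2.0)
far = (8.0, 0.0, 10.0, 2.0)
print(1 - iou(near, gt), 1 - iou(far, gt))  # 1.0 1.0
print(eiou_loss(near, gt) < eiou_loss(far, gt))  # True
```

With a plain 1 - IoU loss, both non-overlapping predictions score 1.0, so the loss carries no information about how far off they are. The centre-distance term in EIoU assigns the farther box a larger loss, which is one concrete sense in which such a loss is more position-sensitive.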

Key words: object detection, lightweight, feature fusion, loss function

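Abstract: To further reduce the size of the YOLOX-Tiny object detection model and improve its detection accuracy, so that it better suits environments with limited computing resources and storage space, this paper improves its feature pyramid structure, decoupled head structure, and loss function, yielding Lite-YOLOX, a higher-performance lightweight object detection model. To compress the original model further, the feature pyramid and decoupled head are redesigned so that the Neck and Head parts of the model are lighter. To improve detection accuracy, the original IoU loss function is optimized into the proposed EIoU loss function, which is more sensitive to the positions of the ground-truth and predicted boxes. The improved model is validated on the PASCAL VOC dataset and a safety helmet detection dataset. Experimental results show that, compared with YOLOX-Tiny, Lite-YOLOX reduces the number of parameters by 40% and the computational cost by 37.5%, and improves mAP50 by 3.2 and 3.1 percentage points. On the NVIDIA Jetson Xavier NX, the frame rate increases from 51 to 59 frames per second (FPS), a clear improvement in real-time performance.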
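To make the reported speed figures concrete, the sketch below converts the Jetson Xavier NX frame rates into per-frame latency and a relative gain. It also computes a naive what-if under an explicit assumption that the authors do not make: that latency scales linearly with FLOPs. The function and constant names are illustrative only.

```python
def relative_change_pct(before: float, after: float) -> float:
    """Signed relative change in percent (positive = increase)."""
    return (after - before) / before * 100.0


def frame_time_ms(fps: float) -> float:
    """Mean wall-clock time per frame, in milliseconds, at a given FPS."""
    return 1000.0 / fps


# Figures reported in the abstract (YOLOX-Tiny -> Lite-YOLOX on Jetson Xavier NX).
FPS_BEFORE, FPS_AFTER = 51.0, 59.0
FLOPS_REDUCTION = 0.375  # 37.5% fewer floating-point operations

gain = relative_change_pct(FPS_BEFORE, FPS_AFTER)
print(f"Throughput gain: {gain:.1f}%")  # Throughput gain: 15.7%
print(f"Time per frame: {frame_time_ms(FPS_BEFORE):.1f} ms -> "
      f"{frame_time_ms(FPS_AFTER):.1f} ms")  # Time per frame: 19.6 ms -> 16.9 ms

# Naive what-if (assumption, not a measurement): if latency were proportional
# to FLOPs, removing 37.5% of the FLOPs would scale FPS by 1 / (1 - 0.375).
naive_fps = FPS_BEFORE / (1.0 - FLOPS_REDUCTION)
print(f"FLOP-proportional estimate: {naive_fps:.1f} FPS "
      f"vs. measured {FPS_AFTER:.0f} FPS")  # 81.6 FPS vs. measured 59 FPS
```

The gap between the FLOP-proportional estimate and the measured rate is not surprising: on embedded GPUs, latency can also depend on memory traffic, kernel launch overhead and pre/post-processing, none of which a FLOP count captures.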


QU Haicheng, YUAN Xudong, LI Jiaqi. Lightweight Object Detection Method for Constrained Environments[J]. Computer Engineering and Applications, 2024, 60(6): 274-281.

