Review on Research and Application of Deep Learning-Based Target Detection Algorithms

doi:10.3778/j.issn.1002-8331.2305-0310

Abstract

With the continuous development of deep learning, deep convolutional neural networks are increasingly used for target detection and are now applied in many fields, including agriculture, transportation, and medicine. Compared with traditional methods based on hand-crafted features, deep learning-based target detection methods can learn both low-level and high-level image features and achieve better detection accuracy and generalization ability. To outline and summarize the latest advances and technologies in target detection, this paper reviews the current status of deep learning-based target detection algorithms and their applications by analyzing deep learning-based target detection techniques from recent years. First, the development, advantages, and disadvantages of the two types of target detection network architecture, two-stage and single-stage, are summarized. Second, backbone networks, datasets, and evaluation metrics are described, the detection accuracy of classical algorithms is compared, and improvement strategies for classical target detection algorithms are summarized. Finally, current applications of target detection are discussed, and future research priorities in the field are proposed.

Keywords: target detection, deep learning, computer vision, deep convolutional neural network


ZHANG Yangting, HUANG Deqi, WANG Dongwei, HE Jiajia. Review on Research and Application of Deep Learning-Based Target Detection Algorithms[J]. Computer Engineering and Applications, 2023, 59(18): 1-13.

