Review of Research Progress in Object Detection Driven by Deep Learning

doi:10.3778/j.issn.1002-8331.2407-0038

Abstract

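Abstract (translated from the Chinese): In recent years, deep learning, backed by the high-performance computing power of GPUs, has spread rapidly and found wide application in security, healthcare, industry, and other fields. The performance of object detection models has improved steadily as the field has moved from traditional detection methods to deep learning approaches based on convolutional neural networks (CNNs), greatly reducing the manpower and material resources required. Drawing on a large body of literature, this review traces the development of object detection along the two-stage line and surveys recent progress in deep learning for object detection; it compares the performance of model networks on different datasets, summarizes the strengths and weaknesses of the various methods, and catalogs the field's important datasets. It also summarizes how object detection algorithms perform in deployment, particularly in practical applications in daily life and technology (autonomous driving, medical imaging, remote sensing, and others). Finally, it looks ahead to the opportunities and challenges for future research on deep learning-driven object detection.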

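Keywords (translated from the Chinese): object detection, convolutional neural network, single-stage, two-stage, object detection applications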

Abstract: In recent years, deep learning, driven by high-performance GPU computing, has expanded rapidly into security, healthcare, and industry. Object detection has moved from traditional methods to approaches based on convolutional neural networks (CNNs), with steadily improving model performance and substantial savings in manpower and material resources. Drawing on extensive literature and following the two-stage line of development, this review outlines the history of object detection and recent advances in deep learning for the task. It compares model performance across different datasets, summarizes the strengths and weaknesses of various methods, and catalogs key datasets. The review also discusses the practical applications of object detection algorithms, particularly in autonomous driving, medical imaging, and remote sensing. Finally, it explores the opportunities and challenges for future research in deep learning-driven object detection.

Keywords: object detection, convolutional neural networks, single-stage, two-stage, object detection applications

山显英, 张琳, 李泽慧. 深度学习驱动下的目标检测研究进展综述[J]. 计算机工程与应用, 2025, 61(1): 24-41.

SHAN Xianying, ZHANG Lin, LI Zehui. Review of Research Progress in Object Detection Driven by Deep Learning[J]. Computer Engineering and Applications, 2025, 61(1): 24-41.
