Computer Engineering and Applications, 2023, Vol. 59, Issue (1): 37-48. DOI: 10.3778/j.issn.1002-8331.2205-0354
FU Miaomiao (付苗苗), DENG Miaolei (邓淼磊), ZHANG Dexian (张德贤)
Online: 2023-01-01
Published: 2023-01-01
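Abstract: Object detection underpins higher-level vision tasks such as object tracking and instance segmentation, and has important applications in real-world scenarios such as intelligent transportation, defect detection, and intelligent security. Existing high-accuracy detection algorithms are all built on deep learning and typically rely on anchor boxes, but the inherent shortcomings of anchor boxes considerably affect detector performance, so anchor-free detection has become a new research direction in object detection in recent years. Meanwhile, the great potential shown by Transformer has opened up a new direction in the vision field, that of combining images with Transformer, and Transformer-based object detection has likewise become a research hotspot. This paper systematically summarizes object detection algorithms of the deep learning era, surveys object detection papers from the past five years, and analyzes these algorithms in depth from two perspectives, Anchor-free and Transformer. It also introduces how these algorithms are applied in real-world scenarios and the datasets commonly used in object detection, and, based on the current state of research, discusses directions for future work in object detection.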
FU Miaomiao, DENG Miaolei, ZHANG Dexian. Object Detection Algorithms Based on Deep Learning and Transformer (基于深度学习和Transformer的目标检测算法)[J]. Computer Engineering and Applications, 2023, 59(1): 37-48.