Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (1): 37-48.DOI: 10.3778/j.issn.1002-8331.2205-0354
• Research Hotspots and Reviews •
FU Miaomiao, DENG Miaolei, ZHANG Dexian
Online: 2023-01-01
Published: 2023-01-01
FU Miaomiao, DENG Miaolei, ZHANG Dexian. Object Detection Algorithms Based on Deep Learning and Transformer[J]. Computer Engineering and Applications, 2023, 59(1): 37-48.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2205-0354