[1] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014: 580-587.
[2] GIRSHICK R. Fast R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision, 2015: 1440-1448.
[3] REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[C]//Advances in Neural Information Processing Systems, 2015: 91-99.
[4] HE K, GKIOXARI G, DOLLAR P, et al. Mask R-CNN[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(2): 386-397.
[5] HE K M, ZHANG X, REN S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916.
[6] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 779-788.
[7] REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 7263-7271.
[8] REDMON J, FARHADI A. YOLOv3: an incremental improvement[J]. arXiv:1804.02767, 2018.
[9] WANG C, BOCHKOVSKIY A, LIAO H. Scaled-YOLOv4: scaling cross stage partial network[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 13024-13033.
[10] LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]//European Conference on Computer Vision. Cham: Springer, 2016: 21-37.
[11] XU S K, WANG Y R, GU Y W, et al. Safety helmet wearing detection study based on improved Faster RCNN[J]. Application Research of Computers, 2020, 37(3): 901-905. (in Chinese)
[12] WANG L M, DUAN J, XIN L W. YOLOv5 helmet wear detection method with introduction of attention mechanism[J]. Computer Engineering and Applications, 2022, 58(9): 303-312. (in Chinese)
[13] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st Conference on Neural Information Processing Systems. Washington DC, USA: IEEE Press, 2017: 5998-6010.
[14] LIU W T, LU X M. Research progress of Transformer based on computer vision[J]. Computer Engineering and Applications, 2022, 58(6): 1-16. (in Chinese)
[15] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: Transformers for image recognition at scale[C]//Proceedings of International Conference on Learning Representations. Washington DC, USA:[s.n.], 2020: 1-9.
[16] CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with Transformers[C]//Proceedings of European Conference on Computer Vision. Berlin, Germany: Springer, 2020: 213-229.
[17] JIANG Y J, SONG X N. Dual-stream object tracking algorithm based on visual Transformer[J]. Computer Engineering and Applications, 2022, 58(12): 183-190. (in Chinese)
[18] SRINIVAS A, LIN T Y, PARMAR N, et al. Bottleneck transformers for visual recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 16519-16529.
[19] LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
[20] LECUN Y, BOSER B, DENKER J S, et al. Backpropagation applied to handwritten zip code recognition[J]. Neural Computation, 1989, 1(4): 541-551.
[21] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of Conference on Computer Vision and Pattern Recognition. Washington DC, USA: IEEE Press, 2016: 770-778.
[22] DAI Z, LIU H, LE Q V, et al. CoAtNet: marrying convolution and attention for all data sizes[C]//Advances in Neural Information Processing Systems, 2021: 3965-3977.
[23] HOU Q, ZHOU D, FENG J. Coordinate attention for efficient mobile network design[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 13713-13722.
[24] DENG J, DONG W, SOCHER R, et al. ImageNet: a large-scale hierarchical image database[C]//Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009: 248-255.