[1] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 779-788.
[2] LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]//Proceedings of 14th European Conference on Computer Vision, Amsterdam, The Netherlands, October 11-14, 2016. [S.l.]: Springer International Publishing, 2016: 21-37.
[3] JIANG H, LEARNED-MILLER E. Face detection with the faster R-CNN[C]//2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), 2017: 650-657.
[4] CAI Z, VASCONCELOS N. Cascade R-CNN: high quality object detection and instance segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 43(5): 1483-1498.
[5] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]//Proceedings of the IEEE International Conference on Computer Vision, 2017: 2980-2988.
[6] CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers[C]//European Conference on Computer Vision. Cham: Springer International Publishing, 2020: 213-229.
[7] ZHU X, SU W, LU L, et al. Deformable DETR: deformable transformers for end-to-end object detection[J]. arXiv:2010.04159, 2020.
[8] CHEN S, SUN P, SONG Y, et al. DiffusionDet: diffusion model for object detection[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023: 19830-19843.
[9] WANG W, DAI J, CHEN Z, et al. InternImage: exploring large-scale vision foundation models with deformable convolutions[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 14408-14419.
[10] WANG J, CHEN K, XU R, et al. CARAFE: content-aware reassembly of features[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019: 3007-3016.
[11] LI X, HU X, YANG J. Spatial group-wise enhance: improving semantic feature learning in convolutional networks[J]. arXiv:1905.09646, 2019.
[12] SONG J, MENG C, ERMON S. Denoising diffusion implicit models[J]. arXiv:2010.02502, 2020.
[13] ZHENG Z, WANG P, LIU W, et al. Distance-IoU loss: faster and better learning for bounding box regression[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2020: 12993-13000.
[14] ZHANG Y F, REN W, ZHANG Z, et al. Focal and efficient IOU loss for accurate bounding box regression[J]. Neurocomputing, 2022, 506: 146-157.
[15] 赵珊, 郑爱玲, 刘子路, 等. 通道分离双注意力机制的目标检测算法[J]. 计算机科学与探索, 2023, 17(5): 1112-1125.
ZHAO S, ZHENG A L, LIU Z L, et al. Object detection algorithm based on channel separation dual attention mechanism[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(5): 1112-1125.
[16] 贾天豪, 彭力, 戴菲菲. 引入残差学习与多尺度特征增强的目标检测器[J]. 计算机科学与探索, 2023, 17(5): 1102-1111.
JIA T H, PENG L, DAI F F. Object detector with residual learning and multi-scale feature enhancement[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(5): 1102-1111.
[17] 崔振东, 李宗民, 杨树林, 等. 基于语义分割引导的三维目标检测[J]. 图学学报, 2022, 43(6): 1134-1142.
CUI Z D, LI Z M, YANG S L, et al. 3D object detection based on semantic segmentation guidance[J]. Journal of Graphics, 2022, 43(6): 1134-1142.
[18] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]//Proceedings of the 13th European Conference on Computer Vision, Zurich, Switzerland, September 6-12, 2014. [S.l.]: Springer International Publishing, 2014: 740-755.
[19] NIENABER S, KROON R S, BOOYSEN M J. A comparison of low-cost monocular vision techniques for pothole distance estimation[C]//IEEE Symposium on Computational Intelligence, 2016.
[20] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.
[21] LIU Z, LIN Y, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021: 10012-10022.
[22] LIU S, QI L, QIN H, et al. Path aggregation network for instance segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 8759-8768.
[23] LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 2117-2125.
[24] SUN K, ZHAO Y, JIANG B, et al. High-resolution representations for labeling pixels and regions[J]. arXiv:1904.04514, 2019.
[25] 马赛, 葛海波, 何文昊, 等. 轻量高效的自底向上人体姿态估计算法研究[J/OL]. 计算机工程与应用: 1-22[2023-09-20]. http://kns.cnki.net/kcms/detail/11.2127.TP.20230814.1802.022.html.
MA S, GE H B, HE W H, et al. Research on lightweight and efficient bottom-up human pose estimation algorithm[J/OL]. Computer Engineering and Applications: 1-22[2023-09-20]. http://kns.cnki.net/kcms/detail/11.2127.TP.20230814.1802.022.html.
[26] MISRA D, NALAMADA T, ARASANIPALAI A U, et al. Rotate to attend: convolutional triplet attention module[C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2021: 3139-3148.
[27] YANG L, ZHANG R Y, LI L, et al. SimAM: a simple, parameter-free attention module for convolutional neural networks[C]//International Conference on Machine Learning, 2021: 11863-11874.
[28] WANG Q, WU B, ZHU P, et al. ECA-Net: efficient channel attention for deep convolutional neural networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 11534-11542.
[29] GEVORGYAN Z. SIoU loss: more powerful learning for bounding box regression[J]. arXiv:2205.12740, 2022.
[30] TAN M, PANG R, LE Q V. EfficientDet: scalable and efficient object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 10781-10790.
[31] ZHOU X, WANG D, KRÄHENBÜHL P. Objects as points[J]. arXiv:1904.07850, 2019.
[32] SUN P, ZHANG R, JIANG Y, et al. Sparse R-CNN: end-to-end object detection with learnable proposals[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 14454-14463.
[33] ZHANG H, LI F, LIU S, et al. DINO: DETR with improved denoising anchor boxes for end-to-end object detection[J]. arXiv:2203.03605, 2022.