[1] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014: 580-587.
[2] GIRSHICK R. Fast R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision, 2015: 1440-1448.
[3] REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[4] HE K, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision, 2017: 2961-2969.
[5] LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector[C]//Proceedings of the European Conference on Computer Vision, 2016: 21-37.
[6] BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection[J]. arXiv:2004.10934, 2020.
[7] JOCHER G. YOLOv5 by Ultralytics (v6.1)[EB/OL]. (2022-02-22)[2023-01-02]. https://github.com/ultralytics/yolov5.
[8] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 779-788.
[9] REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 7263-7271.
[10] REDMON J, FARHADI A. YOLOv3: an incremental improvement[J]. arXiv:1804.02767, 2018.
[11] DING X, ZHANG X, ZHOU Y, et al. Scaling up your kernels to 31×31: revisiting large kernel design in CNNs[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 11963-11975.
[12] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: transformers for image recognition at scale[J]. arXiv:2010.11929, 2020.
[13] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems, 2017: 5998-6008.
[14] BA J L, KIROS J R, HINTON G E. Layer normalization[J]. arXiv:1607.06450, 2016.
[15] HENDRYCKS D, GIMPEL K. Gaussian error linear units (GELUs)[J]. arXiv:1606.08415, 2016.
[16] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]//Proceedings of the European Conference on Computer Vision, 2014: 740-755.
[17] 徐光达, 毛国君. 多层级特征融合的无人机航拍图像目标检测[J]. 计算机科学与探索, 2023, 17(3): 635-645.
XU G D, MAO G J. Aerial image object detection of UAV based on multi-level feature fusion[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(3): 635-645.
[18] 田卓钰, 马苗, 杨楷芳. 基于级联注意力与点监督机制的考场目标检测模型[J]. 软件学报, 2022, 33(7): 2633-2645.
TIAN Z Y, MA M, YANG K F. Object detection model for examination classroom based on cascade attention and point supervision mechanism[J]. Journal of Software, 2022, 33(7): 2633-2645.
[19] 王剑哲, 吴秦. 坐标注意力特征金字塔的显著性目标检测算法[J]. 计算机科学与探索, 2023, 17(1): 154-165.
WANG J Z, WU Q. Salient object detection based on coordinate attention feature pyramid[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(1): 154-165.
[20] Papers with Code. COCO test-dev Benchmark[EB/OL]. (2022-06-22)[2023-01-02]. https://paperswithcode.com/sota/object-detection-on-coco.
[21] CUI Z, LI K, GU L, et al. You only need 90K parameters to adapt light: a light weight transformer for image enhancement and exposure correction[C]//Proceedings of the British Machine Vision Conference, 2022.
[22] CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers[C]//Proceedings of the European Conference on Computer Vision, 2020: 213-229.
[23] JIANG Q, MAO Y, CONG R, et al. Unsupervised decomposition and correction network for low-light image enhancement[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(10): 19440-19455.
[24] LIU R, MA L, MA T, et al. Learning with nested scene modeling and cooperative architecture search for low-light vision[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 45(5): 1-17.
[25] LIU W, REN G, YU R, et al. Image-adaptive YOLO for object detection in adverse weather conditions[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2022: 1792-1800.
[26] HONG Y, WEI K, CHEN L, et al. Crafting object detection in very low light[C]//Proceedings of the British Machine Vision Conference, 2021.
[27] CUI Z, QI G J, GU L, et al. Multitask AET with orthogonal tangent regularity for dark object detection[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021: 2553-2562.
[28] ZHANG H, HAO K, PEDRYCZ W, et al. Vision transformer with convolutions architecture search[J]. arXiv:2203.10435, 2022.
[29] LIU Z, LIN Y, CAO Y, et al. Swin Transformer: hierarchical vision transformer using shifted windows[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021: 10012-10022.
[30] WANG C Y, LIAO H Y M, WU Y H, et al. CSPNet: a new backbone that can enhance learning capability of CNN[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020: 390-391.
[31] PENG Z, HUANG W, GU S, et al. Conformer: local features coupling global representations for visual recognition[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021: 367-376.
[32] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.
[33] CHEN Q, WU Q, WANG J, et al. MixFormer: mixing features across windows and dimensions[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 5249-5259.
[34] IOFFE S, SZEGEDY C. Batch normalization: accelerating deep network training by reducing internal covariate shift[C]//Proceedings of the International Conference on Machine Learning, 2015: 448-456.
[35] ELFWING S, UCHIBE E, DOYA K. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning[J]. Neural Networks, 2018, 107: 3-11.
[36] LOH Y P, CHAN C S. Getting to know low-light images with the exclusively dark dataset[J]. Computer Vision and Image Understanding, 2019, 178: 30-42.
[37] ZHANG H, CHANG H, MA B, et al. Dynamic R-CNN: towards high quality object detection via dynamic training[C]//Proceedings of the European Conference on Computer Vision, 2020: 260-275.
[38] CAI Z, VASCONCELOS N. Cascade R-CNN: delving into high quality object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 6154-6162.
[39] RADOSAVOVIC I, KOSARAJU R P, GIRSHICK R, et al. Designing network design spaces[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 10428-10436.
[40] SUN K, XIAO B, LIU D, et al. Deep high-resolution representation learning for human pose estimation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 5693-5703.
[41] TAN M, LE Q. EfficientNet: rethinking model scaling for convolutional neural networks[C]//Proceedings of the International Conference on Machine Learning, 2019: 6105-6114.
[42] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]//Proceedings of the IEEE International Conference on Computer Vision, 2017: 2980-2988.
[43] CHEN K, WANG J, PANG J, et al. MMDetection: open MMLab detection toolbox and benchmark[J]. arXiv:1906.07155, 2019.
[44] LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 2117-2125.
[45] LIU S, QI L, QIN H, et al. Path aggregation network for instance segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 8759-8768.