
Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (6): 1-21. DOI: 10.3778/j.issn.1002-8331.2407-0501
Review of Application of BEV Perceptual Learning in Autonomous Driving
HUANG Deqi, HUANG Haifeng, HUANG Deyi, LIU Zhenhang
Online: 2025-03-15
Published: 2025-03-14
Abstract: The variety of sensors that feed the perception module of an autonomous vehicle keeps growing, which makes representing multimodal data in a unified way increasingly difficult. BEV (bird's-eye-view) perception learning can fuse multimodal data into a single feature space within the autonomous driving perception stack, giving it greater development potential than other perception learning models. This review summarizes the reasons for that potential from five aspects: research significance, spatial deployment, preparatory work, algorithm development, and evaluation metrics. From the framework perspective, BEV perception models are grouped into four series: the Lift-Splat-Shoot (LSS) series, IPM (inverse perspective mapping), MLP-based view transformation, and Transformer-based view transformation. From the input-data perspective they fall into two classes: the first is pure image-feature input, covering both monocular and multi-camera setups; the second is fused input, which includes not only the straightforward fusion of point-cloud data with image features but also knowledge-distillation fusion guided or supervised by point clouds, and fusion that partitions the height range via guided slicing. The review then surveys how four autonomous driving tasks (multi-object tracking, map segmentation, lane detection, and 3D object detection) are handled within BEV perception models, and summarizes the current shortcomings of the four BEV framework series.
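For the IPM series named above, the view transformation is classically a planar homography. The following is a minimal formulation under the standard flat-ground assumption; the symbols (intrinsics K, rotation columns r1, r2, translation t) are the usual textbook notation, not notation taken from the surveyed paper.

```latex
% Flat-ground IPM: a road-plane point (X, Y, 0) projects to pixel (u, v)
% through the homography H = K [r1  r2  t]; inverting H maps pixels to BEV.
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
  = K \begin{bmatrix} r_1 & r_2 & t \end{bmatrix}
    \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}
  = H \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix},
\qquad
\begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}
  \sim H^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
```

Because every pixel is back-projected onto the plane Z = 0, anything with height (vehicles, pedestrians) gets stretched in the resulting BEV image, which is the main weakness of the IPM series.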
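For the Lift-Splat-Shoot series, each image feature is lifted into a camera frustum by a predicted categorical depth distribution and then splatted onto the BEV grid. The NumPy sketch below illustrates only these two steps: all shapes are placeholders and the BEV cell indices are randomized stand-ins for the real camera geometry, so this is an illustration of the mechanics, not a reference implementation.

```python
import numpy as np

# Minimal sketch of the LSS "lift-splat" mechanics, assuming a CNN has
# already predicted per-pixel depth logits and context features.
D, C, H, W = 4, 8, 16, 16                       # depth bins, channels, image grid (placeholders)
depth_logits = np.random.randn(D, H, W)         # per-pixel depth scores
context = np.random.randn(C, H, W)              # per-pixel context features

# Softmax over depth bins: a categorical depth distribution per pixel.
depth_prob = np.exp(depth_logits - depth_logits.max(axis=0, keepdims=True))
depth_prob /= depth_prob.sum(axis=0, keepdims=True)

# "Lift": the outer product places a scaled copy of each pixel's feature
# vector at every candidate depth, forming a (D, C, H, W) frustum.
frustum = depth_prob[:, None] * context[None]

# "Splat": each frustum cell has 3D coordinates fixed by the camera
# intrinsics/extrinsics; features landing in the same BEV cell are
# sum-pooled. Real geometry is replaced here by random cell indices.
bev = np.zeros((32, 32, C))                     # BEV grid, channels last
xs = np.random.randint(0, 32, size=(D, H, W))   # placeholder BEV x indices
ys = np.random.randint(0, 32, size=(D, H, W))   # placeholder BEV y indices
np.add.at(bev, (xs, ys), frustum.transpose(0, 2, 3, 1))

print(bev.shape)  # (32, 32, C): one fused feature map in bird's-eye view
```

Sum-pooling everything that lands in the same cell is what lets features from multiple cameras fuse into a single BEV feature map, which is the unification property the abstract highlights.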
HUANG Deqi, HUANG Haifeng, HUANG Deyi, LIU Zhenhang. Review of Application of BEV Perceptual Learning in Autonomous Driving[J]. Computer Engineering and Applications, 2025, 61(6): 1-21.