Computer Engineering and Applications, 2024, Vol. 60, Issue 11: 182-193. DOI: 10.3778/j.issn.1002-8331.2309-0041

• Graphics and Image Processing •

Research on Target Detection Algorithm for Fusion of Lidar and Monocular Camera Under BEV Features

LI Wenli, YU Fei, SHI Xiaohui, TANG Yuanhang, YANG Guo   

  1. Key Laboratory of Advanced Manufacturing Technology for Automobile Parts, Ministry of Education, Chongqing University of Technology, Chongqing 400054, China
  2. Chongqing Chang’an Automobile Co., Ltd., Chongqing 400020, China
  • Online: 2024-06-01 Published: 2024-05-31

BEV特征下激光雷达和单目相机融合的目标检测算法研究

李文礼,喻飞,石晓辉,唐远航,杨果   

  1. 重庆理工大学 汽车零部件先进制造技术教育部重点实验室,重庆 400054
  2. 重庆长安汽车股份有限公司,重庆 400020

Abstract: To improve the accuracy with which self-driving vehicles detect surrounding targets, this paper proposes Mono-BEVFusion (monocular-bird's eye view fusion), a target detection algorithm that fuses lidar and monocular image data in bird's eye view (BEV) feature space. To construct the camera BEV features, the algorithm builds a simple and efficient depth prediction network that predicts the depth of the camera features and is explicitly supervised with ground-truth depth. To construct the lidar BEV features, the laser point cloud is voxelized into a pillar grid and transformed into BEV features. A BEV feature fusion network then fuses the lidar BEV features with the camera BEV features, and the fused features are fed into the target detection framework to obtain detection results for the target objects (cars, pedestrians, and cyclists). Mono-BEVFusion is evaluated on the KITTI dataset and on road data collected with a real vehicle. The experimental results show that it improves mean average precision by 2.90 percentage points over existing fusion algorithms, with the detection precision of the car and pedestrian classes improving by 3.38 and 4.13 percentage points, respectively. Mono-BEVFusion detects occluded and distant targets more stably and effectively avoids the missed detections of a single sensor, which gives it good practical application value.

Key words: self-driving vehicles, target object detection algorithm, depth prediction, BEV feature fusion, KITTI dataset
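
The lidar branch described in the abstract (binning the point cloud into pillars to form a BEV grid, then fusing it with camera BEV features) can be illustrated with a minimal sketch. This is not the authors' implementation: the grid ranges, the two hand-picked channels (point count and maximum height), and the function names `pillarize` and `fuse_bev` are assumptions made for illustration, and plain channel concatenation stands in for the learned BEV feature fusion network.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]   # (x, y, z) in metres, lidar frame
Grid = List[List[List[float]]]       # indexed as [channel][x cell][y cell]


def pillarize(points: Sequence[Point],
              x_range: Tuple[float, float] = (0.0, 4.0),
              y_range: Tuple[float, float] = (-2.0, 2.0),
              cell: float = 1.0) -> Grid:
    """Bin points into vertical pillars and return a 2-channel BEV grid.

    Channel 0 holds the number of points per pillar, channel 1 the
    maximum height (z) seen in that pillar. Both are hypothetical
    hand-crafted features, not learned pillar features.
    """
    nx = int((x_range[1] - x_range[0]) / cell)
    ny = int((y_range[1] - y_range[0]) / cell)
    count = [[0.0] * ny for _ in range(nx)]
    max_z = [[0.0] * ny for _ in range(nx)]
    for x, y, z in points:
        # Drop points that fall outside the BEV region of interest.
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        i = int((x - x_range[0]) / cell)
        j = int((y - y_range[0]) / cell)
        max_z[i][j] = z if count[i][j] == 0 else max(max_z[i][j], z)
        count[i][j] += 1.0
    return [count, max_z]


def fuse_bev(lidar_bev: Grid, camera_bev: Grid) -> Grid:
    """Fuse two BEV grids by stacking their channels (assumed stand-in)."""
    def spatial(grid: Grid) -> Tuple[int, int]:
        return len(grid[0]), len(grid[0][0])

    if spatial(lidar_bev) != spatial(camera_bev):
        raise ValueError("BEV grids must share the same spatial size")
    return lidar_bev + camera_bev


points = [(0.5, -1.5, 0.2), (0.7, -1.2, 1.0), (3.9, 1.9, -0.5), (10.0, 0.0, 0.0)]
lidar_bev = pillarize(points)
camera_bev = [[[0.0] * 4 for _ in range(4)]]  # placeholder 1-channel camera BEV
fused = fuse_bev(lidar_bev, camera_bev)
```

Both branches have to be expressed on the same BEV grid before they can be combined, which is why the sketch checks the spatial size first. In a real detector the per-pillar statistics would be replaced by learned features and the concatenation would be followed by convolutional layers.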
