Computer Engineering and Applications, 2023, Vol. 59, Issue (7): 1-14. DOI: 10.3778/j.issn.1002-8331.2209-0280

• Research Hotspots and Reviews •

Survey of Camera Pose Estimation Methods Based on Deep Learning

WANG Jing, JIN Yuchu, GUO Ping, HU Shaoyi   

  1. School of Communication and Information Engineering, Xi’an University of Science and Technology, Xi’an 710054, China
  • Online: 2023-04-01 Published: 2023-04-01




Abstract: Camera pose estimation is the task of accurately estimating the 6-DOF pose (position and orientation) of a camera in the world coordinate system within a known environment. It is a key technology in robotics and autonomous driving. With the rapid development of deep learning, using deep learning to improve camera pose estimation algorithms has become an active research topic. To capture the current state and trends of this research, this paper surveys the mainstream deep-learning-based camera pose estimation algorithms. First, traditional feature-point-based camera pose estimation methods are briefly introduced. Then, deep-learning-based methods are covered in depth. Grouped by core algorithm, six categories are described and analyzed in detail: end-to-end camera pose estimation, scene coordinate regression, retrieval-based camera pose estimation, hierarchical structures, multi-information fusion, and cross-scene camera pose estimation. Finally, the paper summarizes the current state of research, identifies the challenges facing the field on the basis of an in-depth performance analysis, and discusses future development trends.

Key words: deep learning, camera pose estimation, scene coordinate regression, multi-information fusion
