Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (15): 160-168. DOI: 10.3778/j.issn.1002-8331.2204-0386

• Pattern Recognition and Artificial Intelligence •

Indoor Camera Relocation Method Based on Improved Scene Coordinate Regression Network

WANG Jing, HU Shaoyi, GUO Ping, JIN Yuchu

  1. College of Communication and Information Technology, Xi’an University of Science and Technology, Xi’an 710054, China
  • Online: 2023-08-01 Published: 2023-08-01

Abstract: Traditional camera relocation relies on handcrafted features, so scene changes disrupt the subsequent feature matching and degrade overall performance. Camera relocation methods based on deep-learning scene coordinate regression, by contrast, perform well in indoor scenes. To address low localization accuracy in complex scenes and the loss of spatial information during feature extraction, a camera relocation method based on depthwise over-parameterized convolution and fine-grained information is proposed, building on scene coordinate regression. First, depthwise over-parameterized convolutional layers replace the conventional convolutional layers in the feature extraction network, making the extracted features more robust. Second, fine-grained information is added after the feature extraction network to strengthen the features and compensate for the spatial information lost during extraction. Finally, a fully connected layer outputs scene coordinates, establishing the correspondence between 2D image pixels and 3D scene coordinates, and the camera pose is then obtained with the perspective-n-point (PnP) random sample consensus (RANSAC) algorithm. Experimental results show a clear improvement over comparable algorithms: the method improves average angular accuracy by 20.00%, which verifies that it can, to a certain extent, overcome the influence of visual features on camera relocation.
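
The 2D-3D correspondence step above can be illustrated with a small sketch. It is not the authors' implementation. It shows only the hypothesis-scoring part of a PnP-RANSAC loop: given a candidate pose, each predicted scene coordinate is reprojected into the image and counted as an inlier if it lands close to the pixel it was predicted for. The pinhole model without distortion, the function names `project` and `count_inliers`, the 10-pixel threshold, and all numeric values are assumptions made for illustration.

```python
import math


def project(point_3d, rotation, translation, focal, cx, cy):
    """Project a 3D scene coordinate into the image with a pinhole camera.

    `rotation` (3x3 nested list) and `translation` (length-3 list) map scene
    coordinates into the camera frame. Returns None for points that fall
    behind the camera.
    """
    cam = [
        sum(rotation[i][j] * point_3d[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    if cam[2] <= 0:
        return None
    return (focal * cam[0] / cam[2] + cx, focal * cam[1] / cam[2] + cy)


def count_inliers(pixels, scene_coords, rotation, translation,
                  focal, cx, cy, threshold_px=10.0):
    """Count correspondences whose reprojection error is below the threshold."""
    inliers = 0
    for (u, v), xyz in zip(pixels, scene_coords):
        projected = project(xyz, rotation, translation, focal, cx, cy)
        if projected is None:
            continue
        if math.hypot(projected[0] - u, projected[1] - v) < threshold_px:
            inliers += 1
    return inliers


# Hypothetical data: three pixels and the scene coordinates a network might
# predict for them. The third prediction is deliberately poor.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pixels = [(320.0, 240.0), (421.0, 240.0), (320.0, 200.0)]
scene_coords = [(0.0, 0.0, 2.0), (0.4, 0.0, 2.0), (0.0, -0.2, 1.0)]

good_pose_score = count_inliers(pixels, scene_coords, IDENTITY,
                                [0.0, 0.0, 0.0], 500.0, 320.0, 240.0)
bad_pose_score = count_inliers(pixels, scene_coords, IDENTITY,
                               [0.5, 0.0, 0.0], 500.0, 320.0, 240.0)
print(good_pose_score, bad_pose_score)
```

In a full RANSAC loop, many pose hypotheses would be generated from small random subsets of correspondences, scored in this way, and the best-scoring hypothesis refined. The sketch omits those steps.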

Key words: camera relocation, camera pose, scene coordinate regression, fine-grained information, feature extraction
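
For readers unfamiliar with how angular accuracy is usually measured in camera relocation, the following sketch computes the rotation error between an estimated and a ground-truth rotation matrix, and a relative improvement between two error values. It assumes the reported 20.00% refers to a relative reduction in mean angular error, which the abstract does not state explicitly. The function names and all numbers are hypothetical and are not results from the paper.

```python
import math


def rot_z(angle_deg):
    """Rotation matrix for a rotation of `angle_deg` degrees about the z axis."""
    a = math.radians(angle_deg)
    return [
        [math.cos(a), -math.sin(a), 0.0],
        [math.sin(a), math.cos(a), 0.0],
        [0.0, 0.0, 1.0],
    ]


def rotation_error_deg(r_est, r_gt):
    """Angle in degrees of the relative rotation between two rotation matrices.

    Uses trace(r_est^T r_gt) = 1 + 2 cos(theta); the trace equals the
    element-wise product sum of the two matrices.
    """
    trace = sum(r_est[i][j] * r_gt[i][j] for i in range(3) for j in range(3))
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # guard against rounding
    return math.degrees(math.acos(cos_theta))


def relative_improvement_percent(baseline_error, improved_error):
    """Relative reduction of an error value, in percent."""
    return (baseline_error - improved_error) / baseline_error * 100.0


# Hypothetical example: the estimate is off by 5 degrees about the z axis.
error = rotation_error_deg(rot_z(30.0), rot_z(25.0))

# Hypothetical mean angular errors (degrees) for a baseline and an improved model.
gain = relative_improvement_percent(2.5, 2.0)
print(round(error, 3), round(gain, 2))
```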