Indoor Camera Relocation Method Based on Improved Scene Coordinate Regression Network

doi:10.3778/j.issn.1002-8331.2204-0386

Abstract

Traditional camera relocation relies on hand-crafted features, so changes in the scene disrupt the subsequent feature matching and degrade the overall performance of the algorithm. By contrast, camera relocation methods based on deep-learning scene coordinate regression perform well in indoor scenes. To address low localization accuracy in complex scenes and the loss of spatial information during feature extraction, a camera localization method based on depthwise over-parameterized convolution and fine-grained information is proposed, building on the scene coordinate regression method. First, the method replaces the traditional convolutional layers in the feature extraction network with depthwise over-parameterized convolutional layers, making the extracted features more robust. Second, fine-grained information is added after the feature extraction network to strengthen feature extraction and compensate for the spatial information lost during it. Finally, a fully connected layer outputs scene coordinates, establishing the correspondence between 2D image pixels and 3D scene coordinates, and the camera pose is then obtained with the Perspective-n-Point (PnP) algorithm combined with random sample consensus (RANSAC). Experimental results show that the improved method clearly outperforms algorithms of the same type, improving average angular accuracy by 20.00%. This indicates that the method is effective for camera relocation and can, to a certain extent, overcome the influence of visual features on camera relocation.
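
As a rough illustration of the depthwise over-parameterized convolution named in the abstract, the sketch below follows the published DO-Conv formulation (Cao et al.), not this paper's own network, which is not specified here. On a single toy patch it checks that applying a depthwise kernel D followed by a conventional kernel W gives the same output as one folded kernel W', so the extra parameters can be merged away after training. All sizes, names, and the random data are assumptions made for illustration.

```python
import random

def depthwise_then_conv(P, D, W):
    """Two-step form: depthwise kernel D on patch P, then conventional kernel W.

    P[s][c]: patch, s over the M*N window positions, c over input channels.
    D[s][d][c]: depthwise kernel, d over the depth multiplier D_mul.
    W[o][d][c]: conventional kernel, o over output channels.
    """
    S, C = len(P), len(P[0])
    D_mul, C_out = len(W[0]), len(W)
    # Depthwise step: every input channel is transformed independently.
    Pd = [[sum(D[s][d][c] * P[s][c] for s in range(S)) for c in range(C)]
          for d in range(D_mul)]
    # Conventional step: dot product over (d, c) for each output channel.
    return [sum(W[o][d][c] * Pd[d][c] for d in range(D_mul) for c in range(C))
            for o in range(C_out)]

def fold_kernel(D, W):
    """Kernel composition: W'[o][s][c] = sum over d of D[s][d][c] * W[o][d][c]."""
    S, D_mul, C = len(D), len(D[0]), len(D[0][0])
    return [[[sum(D[s][d][c] * W[o][d][c] for d in range(D_mul))
              for c in range(C)] for s in range(S)] for o in range(len(W))]

def conv(P, Wf):
    """Ordinary convolution of one patch with a kernel Wf[o][s][c]."""
    S, C = len(P), len(P[0])
    return [sum(Wf[o][s][c] * P[s][c] for s in range(S) for c in range(C))
            for o in range(len(Wf))]

def rand_tensor(*shape, rng):
    """Nested lists of uniform random values with the given shape."""
    if len(shape) == 1:
        return [rng.uniform(-1, 1) for _ in range(shape[0])]
    return [rand_tensor(*shape[1:], rng=rng) for _ in range(shape[0])]

rng = random.Random(0)
S, D_MUL, C_IN, C_OUT = 9, 9, 2, 3  # assumed toy sizes: 3x3 window, D_mul = M*N
P = rand_tensor(S, C_IN, rng=rng)
D = rand_tensor(S, D_MUL, C_IN, rng=rng)
W = rand_tensor(C_OUT, D_MUL, C_IN, rng=rng)

two_step = depthwise_then_conv(P, D, W)
folded = conv(P, fold_kernel(D, W))
max_diff = max(abs(a - b) for a, b in zip(two_step, folded))
print("largest difference between the two forms:", max_diff)
```

The two forms agree up to floating-point rounding, which is the property that lets an over-parameterized layer be deployed as an ordinary convolution.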

Key words: camera relocation, camera pose, scene coordinate regression, fine-grained information, feature extraction
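
The last stage described in the abstract pairs each 2D pixel with a predicted 3D scene coordinate and recovers the pose with PnP inside RANSAC. The sketch below illustrates only the RANSAC scoring step, under stated assumptions: candidate poses are supplied by hand (a real pipeline would generate them with a PnP solver on random minimal subsets), and each is scored by how many scene coordinates reproject close to their pixels. The intrinsics, the 10-pixel threshold, the toy data, and all function names are hypothetical and not taken from the paper.

```python
import math

def project(point, R, t, f, cx, cy):
    """Project a 3D scene coordinate into the image using pose (R, t)."""
    x, y, z = (sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3))
    if z <= 1e-9:  # point lies behind the camera
        return None
    return (f * x / z + cx, f * y / z + cy)

def count_inliers(pixels, scene_coords, R, t, f, cx, cy, thresh_px=10.0):
    """Count correspondences whose reprojection error is below the threshold."""
    n = 0
    for (u, v), X in zip(pixels, scene_coords):
        uv = project(X, R, t, f, cx, cy)
        if uv is not None and math.hypot(uv[0] - u, uv[1] - v) < thresh_px:
            n += 1
    return n

def rot_y(angle):
    """Rotation matrix about the y axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

# Assumed pinhole intrinsics and a ground-truth pose for the toy scene.
F, CX, CY = 500.0, 320.0, 240.0
true_R, true_t = rot_y(0.1), [0.2, 0.0, 0.5]

# A 5x5 grid of 3D points and the pixels where they appear under the true pose.
scene_coords = [[0.5 * x, 0.5 * y, 3.0 + 0.2 * x]
                for x in range(-2, 3) for y in range(-2, 3)]
pixels = [project(X, true_R, true_t, F, CX, CY) for X in scene_coords]

# Simulate wrong network predictions: every fifth scene coordinate is an outlier.
for i in range(0, len(scene_coords), 5):
    scene_coords[i] = [9.0, 9.0, 1.0]

hypotheses = {
    "identity": (rot_y(0.0), [0.0, 0.0, 0.0]),
    "wrong translation": (rot_y(0.1), [0.0, 0.0, 0.5]),
    "true pose": (true_R, true_t),
}
scores = {name: count_inliers(pixels, scene_coords, R, t, F, CX, CY)
          for name, (R, t) in hypotheses.items()}
best_name = max(scores, key=scores.get)
print(best_name, scores)
```

Scoring by inlier count is what makes the pose estimate tolerant of a fraction of wrong scene coordinate predictions: the outliers lower every hypothesis's score equally and do not pull the selected pose toward them.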


WANG Jing, HU Shaoyi, GUO Ping, JIN Yuchu. Indoor Camera Relocation Method Based on Improved Scene Coordinate Regression Network[J]. Computer Engineering and Applications, 2023, 59(15): 160-168.

王静, 胡少毅, 郭苹, 金玉楚. 改进场景坐标回归网络的室内相机重定位方法[J]. 计算机工程与应用, 2023, 59(15): 160-168.
