Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (7): 1-14. DOI: 10.3778/j.issn.1002-8331.2209-0280
WANG Jing, JIN Yuchu, GUO Ping, HU Shaoyi
Online: 2023-04-01
Published: 2023-04-01
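Abstract: Camera pose estimation is the task of accurately estimating a camera's six-degree-of-freedom (6-DoF) pose in the world coordinate system within a known environment; it is a key technology in robotics and autonomous driving. With the rapid development of deep learning, using deep learning to improve camera pose estimation algorithms has become an active research topic. To capture the current state and trends of research on camera pose estimation, this paper surveys the mainstream deep-learning-based camera pose estimation algorithms. Traditional feature-point-based camera pose estimation methods are first briefly introduced. The deep-learning-based methods are then covered in detail: according to their core algorithms, they are described and analyzed under six categories, namely end-to-end camera pose estimation, scene coordinate regression, retrieval-based camera pose estimation, hierarchical structures, multi-information fusion, and cross-scene camera pose estimation. Finally, the state of research is summarized, the challenges facing the field are identified on the basis of an in-depth performance analysis, and future development directions are discussed.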
WANG Jing, JIN Yuchu, GUO Ping, HU Shaoyi. Survey of Camera Pose Estimation Methods Based on Deep Learning[J]. Computer Engineering and Applications, 2023, 59(7): 1-14.