Review of Visual Odometry Methods Based on Deep Learning

doi:10.3778/j.issn.1002-8331.2203-0480

Abstract: Visual odometry (VO) is a common method for localizing mobile devices equipped with vision sensors, and it has been widely used in autonomous driving, mobile robots, AR/VR, and other fields. Compared with traditional model-based methods, deep learning-based methods can learn efficient and robust feature representations from data without explicit computation, which improves their ability to handle challenging scenes such as illumination changes and low texture. This paper first briefly reviews model-based visual odometry methods and then covers deep learning-based visual odometry from six aspects: supervised learning methods, unsupervised learning methods, methods that fuse models with learning, common datasets, evaluation metrics, and a comparison of model-based and deep learning methods. Finally, open problems and future development trends of deep learning-based visual odometry are discussed.

Key words: visual odometry, deep learning, pose estimation, V-SLAM

ZHI Henghui, YIN Chenyang, LI Huibin. Review of Visual Odometry Methods Based on Deep Learning[J]. Computer Engineering and Applications, 2022, 58(20): 1-15.

职恒辉, 尹晨阳, 李慧斌. 基于深度学习的视觉里程计方法综述[J]. 计算机工程与应用, 2022, 58(20): 1-15.
