Computer Engineering and Applications, 2022, Vol. 58, Issue (20): 16-27. DOI: 10.3778/j.issn.1002-8331.2204-0382

• Hot Topics and Reviews •

Review of Stereo Image Disparity Estimation Methods Based on Deep Learning

WANG Daolei (王道累), XIAO Jiawei (肖佳威), LI Jiankang (李建康), ZHU Rui (朱瑞)

  1. College of Energy and Mechanical Engineering, Shanghai University of Electric Power, Shanghai 200090, China
  • Online: 2022-10-15 Published: 2022-10-15

Abstract: 3D reconstruction is widely used in autonomous driving, robotics, unmanned aerial vehicles, and augmented reality. Disparity estimation is a key step in 3D reconstruction. With the growth of datasets and advances in hardware and network models, deep learning models for disparity estimation have been widely adopted and achieve good results. However, these methods are mostly applied to objects in outdoor scenes and are rarely evaluated on indoor-scene datasets. This paper reviews deep learning methods for binocular disparity estimation and selects five deep learning networks: PSMNet (pyramid stereo matching network), GA-Net (guided aggregation network), LEAStereo (hierarchical neural architecture search for deep stereo matching), DeepPruner (learning efficient stereo matching via differentiable patchmatch), and BGNet (bilateral grid learning for stereo matching networks). The networks are applied to one real-world street-scene dataset (KITTI2015) and two indoor-scene datasets (Middlebury2014 and Instereo2K). The paper analyzes how each model is constructed, evaluates the performance of deep learning for disparity estimation on indoor-scene images, and compares it with the traditional SGM method. Finally, based on the research on deep learning disparity estimation methods, it points out the problems and challenges these methods face.

Key words: disparity estimation, deep learning, indoor images, convolutional neural networks
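For readers unfamiliar with why disparity is a key step in 3D reconstruction, the following is a minimal sketch, not taken from the paper. It assumes a rectified stereo pair, for which depth follows Z = f·B/d (focal length f in pixels, baseline B in metres, disparity d in pixels). It also shows two error measures commonly used to compare disparity maps. The abstract does not say which metrics the authors use, so the function names, the 3-pixel threshold, and the sample values are all illustrative assumptions.

```python
from typing import List, Optional, Sequence


def disparity_to_depth(disparity_px: float, focal_px: float,
                       baseline_m: float) -> Optional[float]:
    """Depth Z = f * B / d for a rectified stereo pair.

    Returns None when the disparity is not positive (no valid match).
    """
    if disparity_px <= 0:
        return None
    return focal_px * baseline_m / disparity_px


def _abs_errors(pred: Sequence[float],
                truth: Sequence[Optional[float]]) -> List[float]:
    """Absolute disparity errors, skipping pixels without ground truth."""
    errors = [abs(p - t) for p, t in zip(pred, truth) if t is not None]
    if not errors:
        raise ValueError("no valid ground-truth pixels")
    return errors


def end_point_error(pred: Sequence[float],
                    truth: Sequence[Optional[float]]) -> float:
    """Mean absolute disparity error over valid pixels."""
    errors = _abs_errors(pred, truth)
    return sum(errors) / len(errors)


def bad_pixel_rate(pred: Sequence[float],
                   truth: Sequence[Optional[float]],
                   threshold_px: float = 3.0) -> float:
    """Fraction of valid pixels whose error exceeds threshold_px.

    The 3-pixel default is an assumption chosen for illustration.
    """
    errors = _abs_errors(pred, truth)
    return sum(e > threshold_px for e in errors) / len(errors)


# Hypothetical values: a flattened 4-pixel disparity map; None marks
# a pixel with no ground-truth measurement.
predicted = [10.0, 20.0, 35.0, 50.0]
ground_truth = [10.5, 20.0, None, 46.0]

depth_m = disparity_to_depth(36.0, focal_px=720.0, baseline_m=0.5)
epe = end_point_error(predicted, ground_truth)
bad3 = bad_pixel_rate(predicted, ground_truth)
```

Because depth is inversely proportional to disparity, a fixed disparity error costs more depth accuracy for distant surfaces than for near ones. This is one reason disparity error is usually reported per pixel rather than as a depth error.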