Computer Engineering and Applications, 2021, Vol. 57, Issue (6): 176-183. DOI: 10.3778/j.issn.1002-8331.2001-0019

Monocular Depth Estimation in Outdoor Scene with Generative Adversarial Network

ZOU Chengming, HU Youpu   

  1. School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430000, China
  2. Hubei Key Laboratory of Transportation Internet of Things, Wuhan 430000, China
  3. Peng Cheng Laboratory, Shenzhen, Guangdong 518055, China
  • Online: 2021-03-15 Published: 2021-03-12

Abstract:

Algorithms based on the Generative Adversarial Network (GAN) achieve low accuracy in depth estimation for outdoor scenes and judge object boundaries imprecisely. To address this problem, this paper proposes a monocular depth estimation algorithm based on the Cycle Generative Adversarial Network (CycleGAN). The algorithm splits the mapping from a single image to a depth image into two sub-stages. In the first stage, the network learns the basic spatial features of the image and produces a coarse-scale depth map. In the second stage, the network refines this coarse depth map by comparing differences in detail, producing a fine-scale depth map. To further improve the accuracy of depth estimation, the L1 distance is introduced into the loss function so that the network can learn the pixel-to-pixel mapping and avoid large deviations and distortions. Experimental results on the public outdoor scene dataset Make3D show that, compared with similar algorithms, the proposed algorithm achieves better average relative error and root mean square error.
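The abstract names three quantities without giving formulas: the L1 distance used in the loss, the average relative error, and the root mean square error. The sketch below shows how these are commonly computed for a predicted and a ground-truth depth map. It assumes the standard textbook definitions, which may differ from the paper's exact formulation (for example in masking or depth capping); the function names and the toy depth values are hypothetical and serve only as illustration.

```python
import numpy as np


def l1_distance(pred, target):
    """Mean absolute per-pixel difference between two depth maps."""
    return float(np.mean(np.abs(pred - target)))


def mean_relative_error(pred, target):
    """Mean of |pred - target| / target; assumes every target depth is > 0."""
    return float(np.mean(np.abs(pred - target) / target))


def root_mean_square_error(pred, target):
    """Square root of the mean squared per-pixel difference."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))


# Toy 2x2 depth maps in metres (made-up values, not from Make3D).
target = np.array([[2.0, 4.0], [5.0, 10.0]])
pred = np.array([[1.0, 4.0], [6.0, 10.0]])

l1 = l1_distance(pred, target)
rel = mean_relative_error(pred, target)
rmse = root_mean_square_error(pred, target)
print(f"L1={l1:.4f} rel={rel:.4f} rmse={rmse:.4f}")
```

Note that the relative error weights the one-metre mistake at a depth of 2 m more heavily than the same mistake at 5 m, whereas the L1 distance and the root mean square error treat both pixels identically.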

Key words: depth estimation, Generative Adversarial Network (GAN), image conversion, semi-supervised learning, deep learning
