Computer Engineering and Applications, 2021, Vol. 57, Issue (14): 231-236. DOI: 10.3778/j.issn.1002-8331.2004-0432


Monocular Image Depth Estimation Based on Fully Convolutional Encoder-Decoder Network

XIA Mengqi, HAO Kun, ZHAO Lu   

  1. School of Computer and Information Engineering, Tianjin Chengjian University, Tianjin 300384, China
  • Online: 2021-07-15 Published: 2021-07-14

基于全卷积编解码网络的单目图像深度估计

夏梦琪,郝琨,赵璐   

  1. 天津城建大学 计算机与信息工程学院,天津 300384

Abstract:

To address the low accuracy and slow speed of traditional methods for monocular image depth estimation, a fully convolutional encoder-decoder network is proposed that takes a sparse set of depth samples and an RGB image as input. The encoder consists of a ResNet followed by a convolutional layer. The decoder consists of four up-sampling layers followed by a bilinear up-sampling layer. The up-sampling layers alternate between up-convolution and up-projection modules, which effectively reduces checkerboard artifacts and preserves edge information in the predicted depth map. The model is fully convolutional, which reduces the number of parameters and improves prediction speed. The effectiveness and advantages of the network are verified on the NYU-Depth-v2 dataset. Experimental results show that, when only RGB images are used for depth prediction, the model improves accuracy at δ < 1.25 by about 4% and reduces the root mean squared error (RMSE) by about 11% compared with a multi-scale convolutional neural network; compared with using the RGB image alone, adding 100 spatially random depth samples reduces the RMSE by about 26%.

Key words: monocular image depth estimation, convolutional neural network, deep residual network, sparse depth measurement
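
The abstract reports results in terms of accuracy at δ < 1.25 and RMSE, and describes adding 100 spatially random depth samples to the RGB input. The minimal Python sketch below illustrates how such quantities are commonly computed; it is not the authors' code. The function names are hypothetical, the use of 0.0 to mark pixels without a depth measurement is an assumption, and δ is taken to be the usual threshold accuracy, max(pred/target, target/pred) < 1.25, which the abstract does not spell out.

```python
import math
import random


def sample_sparse_depth(depth, num_samples, seed=0):
    """Keep `num_samples` randomly chosen valid pixels of a dense depth map.

    `depth` is a list of rows (e.g. in metres). Pixels that are not sampled
    are set to 0.0, used here as "no measurement" (an assumption).
    """
    rng = random.Random(seed)
    valid = [(r, c) for r, row in enumerate(depth)
             for c, d in enumerate(row) if d > 0]
    chosen = rng.sample(valid, min(num_samples, len(valid)))
    sparse = [[0.0] * len(row) for row in depth]
    for r, c in chosen:
        sparse[r][c] = depth[r][c]
    return sparse


def _valid_pairs(pred, target):
    """Pair up predictions and ground truth where ground truth is valid."""
    return [(p, t) for prow, trow in zip(pred, target)
            for p, t in zip(prow, trow) if t > 0]


def rmse(pred, target):
    """Root mean squared error over pixels with valid ground truth."""
    pairs = _valid_pairs(pred, target)
    return math.sqrt(sum((p - t) ** 2 for p, t in pairs) / len(pairs))


def delta_accuracy(pred, target, threshold=1.25):
    """Fraction of valid pixels with max(pred/target, target/pred) < threshold.

    Non-positive predictions are counted as misses.
    """
    pairs = _valid_pairs(pred, target)
    hits = sum(1 for p, t in pairs
               if p > 0 and max(p / t, t / p) < threshold)
    return hits / len(pairs)


# Toy 2x2 example; the bottom-right ground-truth pixel is missing (0.0).
target = [[1.0, 2.0], [4.0, 0.0]]
pred = [[1.1, 2.0], [2.0, 3.0]]
print(rmse(pred, target), delta_accuracy(pred, target))
print(sample_sparse_depth(target, num_samples=2))
```

In a setup like the one described, the sparse map returned by `sample_sparse_depth` (with `num_samples=100`) would be stacked with the RGB channels to form the network input, while `rmse` and `delta_accuracy` would be evaluated on the predicted dense depth map.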

摘要:

针对传统方法在单目图像深度估计时精度低、速度慢等问题,提出一种全卷积编码-解码网络模型,该模型将稀疏的深度样本集和RGB图像作为输入,编码层由Resnet和一个卷积层组成,解码层由两个上采样层和一个双线性上采样层组成,上采样层采用上卷积模块和上投影模块交叉使用,有效降低了棋盘效应并保留了预测深度图像的边缘信息。同时,模型中使用了全卷积,使得参数减少,提升了预测速度。在NYU-Depth-v2数据集上验证了网络模型的有效性与优越性。实验结果表明,在仅使用RGB图像进行深度预测的情况下,与多尺度卷积神经网络相比,该模型在精度[δ<1.25]上提高约4%,均方根误差指标降低约11%;与仅使用RGB图像相比,添加100个空间随机深度样本,均方根误差降低约26%。

关键词: 单目图像深度估计, 卷积神经网络, 深度残差网络, 稀疏深度测量