Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (5): 261-268. DOI: 10.3778/j.issn.1002-8331.2310-0301

• Graphics and Image Processing •

融合自适应采样与全局感知的图像深度估计算法

王国相,李昌隆,宋俊锋,叶振,金恒   

  1. 丽水学院,浙江 丽水 323000
    2. 浙江省特色文创产品数字化设计与智能制造重点实验室,浙江 丽水 323000
    3. 北京邮电大学 信息与通信工程学院,北京 100876
    4. 浙江大华技术股份有限公司,杭州 310051
  • Publication date: 2025-03-01  Release date: 2025-03-01

Image Depth Estimation Algorithm Incorporating Adaptive Sampling and Context-Aware Module

WANG Guoxiang, LI Changlong, SONG Junfeng, YE Zhen, JIN Heng   

  1. Lishui University, Lishui, Zhejiang 323000, China
    2. Key Laboratory of Digital Design and Intelligent Manufacture in Culture and Creativity Product of Zhejiang Province, Lishui, Zhejiang 323000, China
    3. School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
    4. Zhejiang Dahua Technology Company Limited, Hangzhou 310051, China
  • Online: 2025-03-01  Published: 2025-03-01

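Abstract: Depth estimation aims to predict a dense depth map of a scene from a small number of sparse depth samples. Existing methods usually generate the final depth prediction directly from the sparse depth samples and do not fully exploit the geometric information contained in the sparse depth map, which limits prediction accuracy. To address this problem, an image depth estimation algorithm fusing adaptive sampling and global perception is proposed, which predicts the depth map progressively from coarse to fine granularity. A pre-trained depth completion network is introduced to predict a coarse-grained dense depth map, providing rich scene structure and semantic information. An adaptive depth sampling method is designed to guide the model to pay more attention to distant regions, alleviating the long-tail distribution problem of depth data. In addition, a newly designed global perception module captures and fuses multi-scale features to obtain more scene context information. Experimental results on the NYU-Depth-v2 dataset show that the algorithm surpasses other methods in overall performance; ablation results verify the effectiveness of each proposed module; and zero-shot experiments show that the algorithm generalizes well: on the ScanNet dataset, the threshold accuracy δ<1.25 improves by 42 percentage points over the P3D method and by 3.8 percentage points over the S2D method.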

Keywords: depth estimation, depth completion, dense depth map, multi-scale feature fusion, adaptive sampling

Abstract: Depth estimation aims to predict dense depth maps of a scene from a few sparse depth samples. Existing methods generate the final depth prediction directly and do not sufficiently exploit the geometric information in sparse depth maps, which limits the prediction accuracy of depth estimation algorithms. To solve this problem, an image depth estimation algorithm incorporating adaptive sampling and a context-aware module is proposed to progressively predict depth maps from coarse level to fine level. First, a pre-trained depth completion network is introduced to predict coarse-level dense depth maps and obtain rich scene structure and semantic information. Then, an adaptive sampling method is designed to guide the model to pay more attention to distant regions, which alleviates the long-tail problem of depth data. Meanwhile, the newly designed context-aware module captures and fuses multi-scale features to obtain more context information about the scene. Experimental results on the NYU-Depth-v2 dataset show that the proposed algorithm outperforms the compared methods on several metrics. Ablation results demonstrate the effectiveness of the proposed modules. Zero-shot experiments verify the generalization ability of the proposed algorithm: on the ScanNet dataset, the threshold accuracy δ<1.25 improves by 42 percentage points over P3D and by 3.8 percentage points over S2D.

Key words: depth estimation, depth completion, dense depth map, multi-scale feature fusion, adaptive sampling
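
The threshold accuracy δ<1.25 cited in the abstract is conventionally the fraction of valid pixels for which max(predicted/true, true/predicted) is below 1.25. The following is a minimal sketch of that conventional definition, not the authors' evaluation code. The function name `threshold_accuracy`, the choice to skip pixels with no ground truth, and the toy depth values are assumptions made for illustration.

```python
from typing import Sequence


def threshold_accuracy(
    pred: Sequence[float], gt: Sequence[float], threshold: float = 1.25
) -> float:
    """Fraction of valid pixels with max(pred/gt, gt/pred) < threshold."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    valid = 0
    hits = 0
    for p, g in zip(pred, gt):
        # Assumption: a ground-truth depth <= 0 marks a missing measurement
        # and is excluded from the metric.
        if g <= 0:
            continue
        valid += 1
        # A non-positive prediction can never satisfy the ratio test.
        if p > 0 and max(p / g, g / p) < threshold:
            hits += 1
    if valid == 0:
        raise ValueError("no valid ground-truth pixels")
    return hits / valid


# Flattened toy depth values in metres (made up for illustration).
predicted = [1.0, 2.0, 3.9, 8.0]
ground_truth = [1.1, 2.0, 3.0, 0.0]
accuracy = threshold_accuracy(predicted, ground_truth)
```

In this toy input the last pixel has no ground truth and is skipped, and the third pixel has a ratio of 1.3 and fails the test, so two of the three valid pixels count as accurate. An improvement of 42 percentage points therefore means this fraction rose by 0.42 in absolute terms.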