Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (16): 139-146. DOI: 10.3778/j.issn.1002-8331.2101-0134

• Pattern Recognition and Artificial Intelligence •

Lightweight Adaptive Upsampling for Stereo Matching

Song Jiafei (宋嘉菲), Zhang Haodong (张浩东)

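  1. Bionic Vision System Laboratory, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China
    2. School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
    3. University of Chinese Academy of Sciences, Beijing 100049, China
  • Publication date: 2022-08-15  Release date: 2022-08-15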

Lightweight Adaptive Upsampling Module for Stereo Matching

SONG Jiafei, ZHANG Haodong   

  1. Bionic Vision System Laboratory, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China
    2. School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
    3. University of Chinese Academy of Sciences, Beijing 100049, China
  • Online: 2022-08-15  Published: 2022-08-15

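Abstract: Existing deep learning models for stereo matching commonly upsample the cost volume by linear interpolation, which cannot make full use of neighborhood texture information. To address this problem, an adaptive upsampling module is proposed. The module first adaptively learns a sampling weight window for every pixel position in the high-resolution output. It then upsamples the low-resolution input with the nearest-neighbor method and convolves the result with the learned weights at the corresponding positions to obtain the final high-resolution output values. The module has three characteristics: (1) a large receptive field, since stacked dilated convolutions and multi-scale windows improve each pixel's perception of neighborhood texture; (2) it is lightweight, requiring little additional computation compared with linear interpolation; (3) it is general, in that it can be ported to existing networks to replace their interpolation methods. Experiments on the SceneFlow and KITTI2015 datasets show that replacing the trilinear interpolation in PSMNet and AANet with the proposed module reduces their respective errors by 26.4% and 10.3% (SceneFlow) and by 15.4% and 18.9% (KITTI2015).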

Key words: deep learning, stereo matching, cost volume, upsampling, lightweight

Abstract: Most deep learning based stereo matching networks upsample the cost volume by interpolation. To address the drawback that such methods cannot fully aggregate context information, a lightweight adaptive upsampling module (LAUM) is proposed. LAUM first learns an adaptive weight window for each pixel in the high-resolution feature map, and then convolves these weights with the feature map upsampled from low resolution by nearest-neighbor interpolation. LAUM has several appealing properties: (1) it applies stacked dilated convolution modules and multi-scale windows to enlarge the receptive field; (2) it is lightweight, improving accuracy without much additional computation compared with linear interpolation; (3) it can be easily integrated into existing networks. After being integrated into PSMNet and AANet, LAUM reduces their errors by 26.4% and 10.3% (SceneFlow) and by 15.4% and 18.9% (KITTI2015), respectively.

Key words: deep learning, stereo matching, cost volume, upsampling, lightweight
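
The two-step procedure described in the abstract (nearest-neighbor upsampling followed by a per-pixel weighted window) can be illustrated with a minimal, dependency-free sketch on a single 2D slice of a cost volume. This is not the authors' implementation: the function names, the 3x3 window, the softmax normalization of the weights, and the border clamping are all assumptions made for illustration, and the per-pixel scores that LAUM would learn with stacked dilated convolutions are simply passed in as an argument here.

```python
import math


def nearest_upsample(low, scale):
    """Nearest-neighbor upsampling of a 2D grid (list of rows) by an integer factor."""
    height, width = len(low) * scale, len(low[0]) * scale
    return [[low[y // scale][x // scale] for x in range(width)] for y in range(height)]


def softmax(scores):
    """Numerically stable softmax, so each weight window sums to 1 (an assumption)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_upsample(low, window_scores, scale, k=3):
    """Upsample `low` with a separate k x k weight window for every output pixel.

    low:           H x W grid of values (one slice of a cost volume).
    window_scores: (H*scale) x (W*scale) x (k*k) raw scores, one window per
                   high-resolution pixel. In the paper these are learned;
                   here they are hypothetical inputs supplied by the caller.
    """
    up = nearest_upsample(low, scale)
    height, width = len(up), len(up[0])
    radius = k // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            weights = softmax(window_scores[y][x])
            acc = 0.0
            for i, w in enumerate(weights):
                dy, dx = divmod(i, k)
                # Clamp neighbors to the border (replicate padding, an assumption).
                yy = min(max(y + dy - radius, 0), height - 1)
                xx = min(max(x + dx - radius, 0), width - 1)
                acc += w * up[yy][xx]
            out[y][x] = acc
    return out
```

With all scores equal, every window reduces to a box filter over the nearest-neighbor result; with one dominant score, the output pixel copies a single neighbor. A learned score map can move between these extremes per pixel, which is the kind of content-dependent behavior that fixed linear interpolation cannot provide.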