Lightweight Adaptive Upsampling Module for Stereo Matching

doi:10.3778/j.issn.1002-8331.2101-0134

Abstract

Most deep-learning stereo matching networks upsample the cost volume by linear interpolation, which cannot fully exploit the texture information in each pixel's neighborhood. To address this, a lightweight adaptive upsampling module (LAUM) is proposed. LAUM first learns an adaptive sampling weight window for every pixel position in the high-resolution output. It then upsamples the low-resolution input by nearest-neighbor interpolation and, at each position, convolves the upsampled map with the learned weights to obtain the corresponding high-resolution output value. The module has three properties: (1) a large receptive field, since stacked dilated convolutions and multi-scale windows improve its awareness of neighborhood texture; (2) it is lightweight, adding little computation compared with linear interpolation; (3) it is general, and can be ported to existing networks as a replacement for their interpolation step. Experiments on the SceneFlow and KITTI2015 datasets show that replacing the trilinear interpolation in PSMNet and AANet with the proposed module reduces their errors by 26.4% and 10.3% respectively on SceneFlow, and by 15.4% and 18.9% respectively on KITTI2015.

Key words: deep learning, stereo matching, cost volume, upsampling, lightweight
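
The two-step procedure stated in the abstract (nearest-neighbor upsampling, then a per-pixel weighted sum over a local window) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the single 2D map, the square window of odd size k, and the edge padding are all assumptions made for illustration, and the weights are passed in by the caller rather than learned by stacked dilated convolutions over multi-scale windows.

```python
import numpy as np


def nearest_upsample(x, scale):
    """Nearest-neighbor upsampling of an (H, W) map by an integer factor."""
    return np.repeat(np.repeat(x, scale, axis=0), scale, axis=1)


def adaptive_upsample(low_res, weights, scale):
    """Weighted sum over a k x k window around every high-resolution pixel.

    low_res: (H, W) low-resolution map, e.g. one slice of a cost volume.
    weights: (H*scale, W*scale, k, k), one window per output pixel.
             Hypothetical input: in the paper these are learned.
    """
    up = nearest_upsample(low_res, scale)
    k = weights.shape[-1]          # assumed odd
    pad = k // 2
    # Edge padding is an assumption; the abstract does not specify borders.
    padded = np.pad(up, pad, mode="edge")
    out = np.empty_like(up, dtype=float)
    for i in range(up.shape[0]):
        for j in range(up.shape[1]):
            window = padded[i:i + k, j:j + k]
            out[i, j] = np.sum(window * weights[i, j])
    return out
```

With a window that is 1 at its center and 0 elsewhere, the sketch reduces to plain nearest-neighbor upsampling; any other window lets each output pixel blend values from its neighborhood, which is the degree of freedom that fixed interpolation kernels lack.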

SONG Jiafei, ZHANG Haodong. Lightweight Adaptive Upsampling Module for Stereo Matching[J]. Computer Engineering and Applications, 2022, 58(16): 139-146.
