Multi-View 3D Model Reconstruction Based on Multi-Level Perception

doi:10.3778/j.issn.1002-8331.2108-0333

Abstract

Voxel-based multi-view 3D model reconstruction suffers from two problems: the spatial information contained in the 2D views is discrete, and voxels are sparsely distributed in the spatial grid. To address these problems, a multi-view 3D model reconstruction method based on multi-level perception is proposed. By perceiving information at the view level, the voxel level, and the object level, the method reconstructs 3D models with complete structure and local details. In the view feature extraction stage, a context-aware channel attention module is designed to capture as much of the latent spatial information in the 2D views as possible. In the 3D model generation stage, a voxel-aware VoxFocal Loss promotes voxel generation in the spatial grid. In the 3D model refinement stage, an object-aware 3D discriminator adaptively removes redundant voxels from the 3D model, making it more realistic. The effectiveness and advancement of the method are verified on the large-scale synthetic dataset ShapeNet and the real-world dataset Pix3D.
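The abstract does not define VoxFocal Loss, so the following is only a minimal sketch of the general idea it builds on: the standard binary focal loss of Lin et al., applied per voxel to a flattened occupancy grid. The function name `binary_focal_loss` and the default values of `alpha` and `gamma` are assumptions for illustration, not the authors' formulation.

```python
import math


def binary_focal_loss(probs, targets, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss over a flattened voxel grid.

    probs   -- predicted occupancy probabilities, each in [0, 1]
    targets -- ground-truth occupancy labels, each 0 (empty) or 1 (occupied)
    alpha   -- weight for occupied voxels (assumed default, not from the paper)
    gamma   -- focusing exponent (assumed default, not from the paper)
    """
    if len(probs) != len(targets):
        raise ValueError("probs and targets must have the same length")
    if not probs:
        raise ValueError("the voxel grid must not be empty")

    total = 0.0
    for p, t in zip(probs, targets):
        # Clamp to avoid log(0).
        p = min(max(p, eps), 1.0 - eps)
        # Probability assigned to the true class of this voxel.
        p_t = p if t == 1 else 1.0 - p
        # Class weight: alpha for occupied voxels, 1 - alpha for empty ones.
        alpha_t = alpha if t == 1 else 1.0 - alpha
        # The (1 - p_t)^gamma factor shrinks the loss of well-classified voxels.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


# A tiny flattened grid: mostly empty, one occupied voxel.
predicted = [0.1, 0.2, 0.05, 0.7]
occupied = [0, 0, 0, 1]
loss = binary_focal_loss(predicted, occupied)
```

Because the modulating factor down-weights voxels that are already classified confidently, the numerous easy empty voxels contribute little, and the remaining loss is concentrated on the hard, sparsely distributed occupied voxels.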

Key words: 3D reconstruction, multi-level perception, voxel representation, attention module

BAI Jing, XU Hao. Multi-View 3D Model Reconstruction Based on Multi-Level Perception[J]. Computer Engineering and Applications, 2023, 59(2): 232-239.

