Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (4): 21-38. DOI: 10.3778/j.issn.1002-8331.2303-0218
• Research Hotspots and Reviews •
MA Hansheng, ZHU Yuhua, LI Zhihui, YAN Lei, SI Yiyi, LIAN Yimeng, ZHANG Yuhan
Online: 2024-02-15
Published: 2024-02-15
MA Hansheng, ZHU Yuhua, LI Zhihui, YAN Lei, SI Yiyi, LIAN Yimeng, ZHANG Yuhan. Survey of Neural Radiance Fields for Multi-View Synthesis Technologies[J]. Computer Engineering and Applications, 2024, 60(4): 21-38.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2303-0218
Related Articles
[1] SU Chenyang, WU Wenhong, NIU Hengmao, SHI Bao, HAO Xu, WANG Jiamin, GAO Le, WANG Weitai. Review of Deep Learning Approaches for Recognizing Multiple Unsafe Behaviors in Workers[J]. Computer Engineering and Applications, 2024, 60(5): 30-46.
[2] Abudukelimu Halidanmu, FENG Ke, SHI Yaqing, Abudukelimu Nihemaiti, Abulizi Abudukelimu. Review of Applications of Deep Learning in Fracture Diagnosis[J]. Computer Engineering and Applications, 2024, 60(5): 47-61.
[3] JIN Tao, JIN Ran, HOU Tengda, YUAN Jie, GU Xiaozhe. Review of Research on Multimodal Retrieval[J]. Computer Engineering and Applications, 2024, 60(5): 62-75.
[4] WANG Rong, DUANMU Chunjiang. Multi-Coupled Feedback Networks for Image Fusion and Super-Resolution Methods[J]. Computer Engineering and Applications, 2024, 60(5): 210-220.
[5] XIE Ruobing, LI Maojun, LI Yiwei, HU Jianwen. Improving YOLOX-s Dense Garbage Detection Method[J]. Computer Engineering and Applications, 2024, 60(5): 250-258.
[6] CHEN Lei, XI Yimeng, LIU Libo. Survey on Video-Text Cross-Modal Retrieval[J]. Computer Engineering and Applications, 2024, 60(4): 1-20.
[7] ZHU Kai, LI Li, ZHANG Tong, JIANG Sheng, BIE Yiming. Survey of Vision Transformer in Low-Level Computer Vision[J]. Computer Engineering and Applications, 2024, 60(4): 39-56.
[8] CHEN Zhaohong, HONG Zhiyong, YU Wenhua, ZHANG Xin. Extreme Multi-Label Text Classification Based on Balance Function[J]. Computer Engineering and Applications, 2024, 60(4): 163-172.
[9] ZHANG Jianrui, WEI Xia, ZHANG Linxuan, CHEN Yannan, LU Jie. Improving Detection and Positioning of Insulators in YOLO v7[J]. Computer Engineering and Applications, 2024, 60(4): 183-191.
[10] CAO Ce, CHEN Yan, ZHOU Lanjiang. Financial Fraud Recognition Method for Listed Companies Based on Deep Learning and Textual Emotion[J]. Computer Engineering and Applications, 2024, 60(4): 338-346.
[11] LIU Bingkun, PI Jiatian, XU Jin. End-to-End Robotic Arm Vision Servo Research Combined with Bottleneck Attention Mechanism[J]. Computer Engineering and Applications, 2024, 60(4): 347-354.
[12] WU Zeju, SONG Lijun, JI Yang. Tire X-Ray Image Defect Detection Based on Improved Feature Pyramid Network[J]. Computer Engineering and Applications, 2024, 60(3): 270-279.
[13] QIAN Liping, JI Xiaomei. Research on Malware Classification Method Based on Heterogeneous Instruction Graph[J]. Computer Engineering and Applications, 2024, 60(3): 299-308.
[14] SONG Cheng, XIE Zhenping. Dataset Enhancement Quality Evaluation Method for Chinese Error Correction Task as Example[J]. Computer Engineering and Applications, 2024, 60(3): 331-339.
[15] TIAN Miaomiao, ZHI Lijia, ZHANG Shaomin, CHAO Daifu. Review of Deep Learning Methods Applied to Medical CT Super-Resolution[J]. Computer Engineering and Applications, 2024, 60(3): 44-60.