Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (4): 21-38. DOI: 10.3778/j.issn.1002-8331.2303-0218

Survey of Neural Radiance Fields for Multi-View Synthesis Technologies

MA Hansheng, ZHU Yuhua, LI Zhihui, YAN Lei, SI Yiyi, LIAN Yimeng, ZHANG Yuhan

Online: 2024-02-15
Published: 2024-02-15
Abstract: Rendering realistic virtual scenes from images has long been one of the research goals of computer graphics and computer vision. The neural radiance field (NeRF) is an emerging approach based on deep neural networks: it achieves realistic rendering by learning the radiance at every point in a scene. NeRF can generate not only photorealistic images but also convincing three-dimensional scenes, giving it broad application prospects in virtual reality, augmented reality, and computer games. However, the basic model suffers from low training efficiency, poor generalization, limited interpretability, sensitivity to changes in illumination and materials, and an inability to handle dynamic scenes, so in some situations it cannot produce the best rendering results. A large body of follow-up work has addressed these issues and achieved excellent results in both efficiency and accuracy. To track the latest research in this area, this paper reviews the key NeRF algorithms of recent years. It first introduces the background and principles of neural radiance fields, and then discusses the key improved models by category, covering: optimization of the parameters of the basic NeRF model; gains in rendering speed and inference capability; improvements in spatial representation and illumination handling; better camera pose estimation and sparse-view synthesis for static scenes; and advances in dynamic scene modeling. The speed and performance of the various models are compared and analyzed by category, and the main evaluation metrics and public datasets of the field are briefly introduced. Finally, future trends for neural radiance fields are discussed.
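To make the principle summarized above concrete: in the original NeRF formulation, a multilayer perceptron maps a 3D position and viewing direction to an emitted color c and a volume density σ, and each pixel is rendered by compositing samples along its camera ray as C(r) = Σ_i T_i (1 − exp(−σ_i δ_i)) c_i, where δ_i is the spacing between adjacent samples and T_i = Π_{j<i} (1 − α_j) is the accumulated transmittance. The following is a minimal NumPy sketch of this discrete volume-rendering step only; the random inputs stand in for the outputs of a trained network, and it is an illustration of the general technique rather than a reproduction of any specific implementation covered by this survey.

    import numpy as np

    def volume_render(sigmas, colors, deltas):
        # Discrete volume rendering along one ray (samples ordered near to far).
        # sigmas: (N,) densities; colors: (N, 3) RGB; deltas: (N,) sample spacings.
        alphas = 1.0 - np.exp(-sigmas * deltas)                         # per-segment opacity
        trans = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))  # transmittance T_i
        weights = trans * alphas                                        # per-sample contribution
        return (weights[:, None] * colors).sum(axis=0)                  # composited pixel color

    # Toy usage: random values stand in for a trained network's predictions.
    rng = np.random.default_rng(0)
    n_samples = 64
    sigmas = rng.uniform(0.0, 2.0, n_samples)
    colors = rng.uniform(0.0, 1.0, (n_samples, 3))
    deltas = np.full(n_samples, 4.0 / n_samples)  # uniform spacing over a 4-unit ray
    print(volume_render(sigmas, colors, deltas))  # -> one RGB triple

In a full pipeline, the densities and colors would come from querying the MLP at stratified sample points along each ray, with positional encoding applied to the inputs and hierarchical coarse-to-fine sampling over batches of rays; many of the improved models surveyed here target exactly this sampling and querying loop to raise speed and quality.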
Citation: MA Hansheng, ZHU Yuhua, LI Zhihui, YAN Lei, SI Yiyi, LIAN Yimeng, ZHANG Yuhan. Survey of Neural Radiance Fields for Multi-View Synthesis Technologies[J]. Computer Engineering and Applications, 2024, 60(4): 21-38.