Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (24): 29-39.DOI: 10.3778/j.issn.1002-8331.2503-0101

• Research Hotspots and Reviews •

Progress and Challenges in 3D Large Language Model Research

GUO Ming1,2,3, ZHANG Yaru1, ZHU Li1, WANG Guoli1,2,3+, HUANG Ming1,2,3   

  1. School of Geomatics and Urban Spatial Informatics, Beijing University of Civil Engineering and Architecture, Beijing 102616, China
  2. Engineering Research Center of Representative Building and Architectural Heritage Database, Ministry of Education, Beijing 100044, China
  3. International Joint Laboratory of Safety and Energy Conservation for Ancient Buildings, Ministry of Education, Beijing 100044, China
  • Online:2025-12-15 Published:2025-12-15

Abstract: The three-dimensional large language model (3D LLM) is an important cross-modal learning approach that not only processes linguistic data but also integrates and interprets other modalities such as 3D point clouds, images, and videos, advancing scene understanding, reasoning, and generation tasks. As intelligent systems increasingly require spatial perception and multimodal data processing, demand for 3D LLM applications is growing. This review examines the characteristics, research directions, challenges, and research objectives of 3D LLMs. It discusses the 3D LLM network framework, including the construction of multi-source 3D datasets, data preprocessing and feature extraction, multimodal feature fusion, model pre-training and efficient optimization strategies, and applications to a variety of downstream tasks. It then analyzes evaluation methods for 3D LLMs, covering comprehensive performance comparison across models, zero-shot learning, and generalization analysis. Finally, the limitations of current 3D LLM research are briefly described, application prospects are outlined, and directions for future research are proposed.

Key words: 3D large language model (3D LLM), multimodal learning, point cloud, neural network, feature fusion
