Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (16): 370-382. DOI: 10.3778/j.issn.1002-8331.2501-0012

• Engineering and Applications •

Enterprise Carbon Emission Analysis and Knowledge Question-Answering System Based on Large Language Models

HAN Ming (韩明), CAO Zhixuan (曹智轩), WANG Jingtao (王敬涛), DUAN Liying (段丽英), WANG Jianhong (王剑宏)

  1. College of Future Information Technology, Shijiazhuang University, Shijiazhuang 050035, China
  2. Hebei Province Internet of Things Intelligent Sensing and Application Technology Innovation Center, Shijiazhuang 050035, China
  • Online: 2025-08-15 Published: 2025-08-15

Abstract: As global climate change intensifies, enterprise carbon emission analysis has become a focus of international attention. General-purpose large language models (LLMs) suffer from lagging knowledge updates, augmented generation architectures lack domain expertise and accuracy on complex questions, and large-model outputs show high hallucination rates. To address these problems, this paper builds a proprietary knowledge base and develops an enterprise carbon emission analysis and knowledge question-answering (Q&A) system based on large language models. A diversified indexing module construction method is proposed to build a high-quality retrieval dataset of domain knowledge and regulations. For knowledge Q&A over carbon emission reports (policies), the paper proposes a self-prompting retrieval-augmented generation architecture that integrates intent recognition, an improved structured chain of thought, hybrid retrieval, high-quality prompt engineering, and a Text2SQL system. The architecture supports multi-dimensional analysis of enterprise sustainability reports and provides an efficient, accurate knowledge Q&A solution for carbon emission reports (policies). A multi-layer chunking mechanism, document indexing, and hallucination detection keep the results accurate and verifiable and reduce the hallucination rate of the LLM within the system. In comparative experiments, the proposed algorithm, with all modules working together, performs strongly across the metrics of the retrieval-augmented generation experiments, and it shows clear advantages in key information extraction and report evaluation for enterprise carbon emission reports, particularly on long texts.

Key words: large language model (LLM), knowledge question-answering system, hallucination, information retrieval, prompt learning