Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (5): 113-121. DOI: 10.3778/j.issn.1002-8331.2406-0230

• Pattern Recognition and Artificial Intelligence •

基于大模型检索增强生成的气象数据库问答模型实现

江双五,张嘉玮,华连生,杨菁林   

  1. 安徽省气象信息中心,合肥 230031
  2. 北京航空航天大学,北京 100191
  3. 国家计算机网络应急技术处理协调中心,北京 100029
  • 出版日期:2025-03-01 发布日期:2025-03-01

Implementation of Meteorological Database Question-Answering Based on Large-Scale Model Retrieval-Augmentation Generation

JIANG Shuangwu, ZHANG Jiawei, HUA Liansheng, YANG Jinglin   

  1. Anhui Meteorological Information Center, Hefei 230031, China
  2. Beihang University, Beijing 100191, China
  3. National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China
  • Online:2025-03-01 Published:2025-03-01

摘要: 随着信息检索和知识获取需求的增加,智能问答系统在多个垂直领域得到广泛应用。然而,在气象领域仍缺乏专门的智能问答系统研究,严重限制了气象信息的高效利用和气象系统的服务效率。针对这一需求,提出了一种面向气象数据库的大模型检索智能问答技术实现方案。该方案设计了一种基于关系型数据库(SQL)与文档型数据(NoSQL)的多通道查询路由(multi-channel retrieval router,McRR)方法,为了适配数据库进行大模型查询以及增强大模型对查询表的理解,分别提出指令查询转换方法与数据库表摘要方法DNSUM,提升大模型对数据库的语义理解能力,通过结合问题理解、重排序器和响应生成等关键模块,构建了一个端到端的智能问答模型,可实现多数据源的相关知识检索及答案生成。实验结果显示,该模型可以有效理解用户问题并生成准确的答案,具有良好的检索和响应能力。不仅为气象领域提供了一种智能问答的解决方案,也为气象智能问答技术提供了新的应用实施参考。

关键词: 数据库查询, 数据库问答, 大语言模型, 检索增强生成, 气象问答

Abstract: With the increasing demand for information retrieval and knowledge acquisition, intelligent question-answering systems have been widely applied across many vertical domains. However, research on question-answering systems dedicated to the meteorological field is still lacking, which severely limits the efficient use of meteorological information and the service efficiency of meteorological systems. To address this gap, this paper proposes an implementation scheme for question answering over meteorological databases based on large-model retrieval-augmented generation. The scheme designs a multi-channel retrieval router (McRR) method that covers both relational databases (SQL) and document-oriented data (NoSQL). To adapt the databases to large-model queries and to strengthen the model’s understanding of the tables being queried, the paper further proposes an instruction query conversion method and a database table summarization method (termed DNSUM), both of which improve the model’s semantic understanding of databases. By integrating key modules such as question understanding, a re-ranker, and response generation, the paper constructs an end-to-end intelligent question-answering model that retrieves relevant knowledge from multiple data sources and generates answers. Experimental results on the constructed meteorological question-answering dataset show that the model effectively understands user questions and generates accurate answers, with good retrieval and response capabilities. This research not only provides an intelligent question-answering solution for the meteorological field but also offers a new implementation reference for applying question-answering technology in vertical domains.

Key words: structured query, database question-answering, large language models (LLM), retrieval-augmented generation (RAG), meteorological Q&A
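
The multi-channel routing idea summarized in the abstract can be illustrated with a minimal Python sketch. This is not the paper's McRR implementation: the keyword-based `route` function stands in for large-model question understanding, the template lookup in `sql_channel` stands in for instruction query conversion, and the token-overlap `rerank` stands in for a learned re-ranker. The table `obs`, its sample rows, the `DOCS` list, and all function names are hypothetical and exist only for illustration.

```python
import re
import sqlite3
from dataclasses import dataclass


@dataclass
class Hit:
    channel: str   # which channel produced this piece of evidence
    text: str      # evidence text that would be passed to the generator
    score: float = 0.0


def build_sql_channel() -> sqlite3.Connection:
    """Create a toy in-memory relational store (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE obs (station TEXT, day TEXT, temp_c REAL)")
    conn.executemany(
        "INSERT INTO obs VALUES (?, ?, ?)",
        [("Hefei", "2024-06-01", 29.5),
         ("Hefei", "2024-06-02", 31.0),
         ("Beijing", "2024-06-01", 27.0)],
    )
    return conn


# Toy document-oriented store (hypothetical content).
DOCS = [
    "Station metadata: Hefei station reports hourly surface observations.",
    "Data format note: temperature values are stored in degrees Celsius.",
]


def tokens(text: str) -> set:
    """Lowercase alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def route(question: str) -> list:
    """Pick channels by keyword; fall back to all channels if unsure."""
    q = tokens(question)
    channels = []
    if q & {"max", "maximum", "average", "temperature"}:
        channels.append("sql")
    if q & {"format", "metadata", "unit", "units", "stored"}:
        channels.append("doc")
    return channels or ["sql", "doc"]


def sql_channel(conn: sqlite3.Connection, question: str) -> list:
    """Template-based query: maximum temperature for any station named."""
    stations = [r[0] for r in conn.execute("SELECT DISTINCT station FROM obs")]
    hits = []
    for station in stations:
        if station.lower() in question.lower():
            row = conn.execute(
                "SELECT MAX(temp_c) FROM obs WHERE station = ?", (station,)
            ).fetchone()
            hits.append(Hit("sql", f"{station} max temp_c = {row[0]}"))
    return hits


def doc_channel(question: str) -> list:
    """Return documents sharing at least one token with the question."""
    q = tokens(question)
    return [Hit("doc", d) for d in DOCS if q & tokens(d)]


def rerank(question: str, hits: list) -> list:
    """Order hits by the fraction of question tokens they contain."""
    q = tokens(question)
    for h in hits:
        h.score = len(q & tokens(h.text)) / max(len(q), 1)
    return sorted(hits, key=lambda h: h.score, reverse=True)


def answer_context(conn: sqlite3.Connection, question: str) -> list:
    """Route the question, query each chosen channel, and rerank the results."""
    hits = []
    for channel in route(question):
        if channel == "sql":
            hits += sql_channel(conn, question)
        else:
            hits += doc_channel(question)
    return rerank(question, hits)
```

Calling `answer_context(build_sql_channel(), "What is the maximum temperature in Hefei?")` sends the question only to the relational channel, while a question about storage units goes to the document channel. In a full system, the ranked hits would be placed into the prompt of the response-generation model.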