
Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (13): 1-25. DOI: 10.3778/j.issn.1002-8331.2410-0088
Survey of Retrieval-Augmented Generation Based on Large Language Models
LIU Xueying, YUN Jing, LI Bo, SHI Xiaoguo, ZHANG Yuying
Online: 2025-07-01
Published: 2025-06-30
Abstract: Recently, intelligent agents have attracted considerable attention in industry for their ability to deliver efficient solutions to complex tasks. As one of the common paradigms for such agents, retrieval-augmented generation (RAG) combines information retrieval with content generation techniques to improve the quality of generated responses, and has gradually become a focus of research. Building on a review of RAG research in China and abroad, this survey explains the basic concepts and workflow of RAG, summarizes the current state of the technology, analyzes the strengths and weaknesses of existing RAG techniques, and organizes the available evaluation metrics, datasets, and benchmarks. Finally, it discusses the challenges RAG faces in future application scenarios and outlines directions for its future development.
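To make the retrieve-augment-generate workflow described in the abstract concrete, the following minimal Python sketch ranks a toy document corpus against a query and assembles an augmented prompt. The corpus, the bag-of-words embed() function, and the llm_generate() stub are illustrative assumptions, not any specific system covered by the survey.

# Minimal sketch of the RAG workflow (retrieve -> augment -> generate).
# The corpus, embed(), and the llm_generate() stub below are illustrative
# assumptions rather than any particular system from the survey.
import math
from collections import Counter

CORPUS = [
    "Retrieval-augmented generation combines retrieval with generation.",
    "Retrieved passages ground LLM answers in external knowledge.",
    "Vector similarity is a common way to rank candidate passages.",
]

def embed(text):
    # Toy bag-of-words "embedding"; real systems use dense neural encoders.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, k=2):
    # Retrieval step: rank the corpus by similarity to the query, keep top k.
    q = embed(query)
    ranked = sorted(CORPUS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query):
    # Augmentation step: prepend retrieved context to the user question.
    context = "\n".join(retrieve(query))
    return "Context:\n" + context + "\n\nQuestion: " + query + "\nAnswer:"

print(build_prompt("How does RAG ground LLM answers?"))
# Generation step: a full pipeline would pass this prompt to an LLM, e.g.
# answer = llm_generate(build_prompt(query))  # hypothetical call

A production pipeline keeps the same three stages but swaps in a dense encoder, a vector index over a large document store, and an LLM for the final generation step.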
LIU Xueying, YUN Jing, LI Bo, SHI Xiaoguo, ZHANG Yuying. Survey of Retrieval-Augmented Generation Based on Large Language Models[J]. Computer Engineering and Applications, 2025, 61(13): 1-25.
Related articles:
[1] 董磊, 吴福居, 史健勇, 潘龙飞. Construction and application of a multimodal knowledge graph for construction safety based on large language models[J]. Computer Engineering and Applications, 2025, 61(9): 325-333.
[2] 任海玉, 刘建平, 王健, 顾勋勋, 陈曦, 张越, 赵昌顼. Survey of intelligent question answering systems based on large language models[J]. Computer Engineering and Applications, 2025, 61(7): 1-24.
[3] 王敬凯, 秦董洪, 白凤波, 李路路, 孔令儒, 徐晨. Survey of techniques for integrating speech recognition with large language models[J]. Computer Engineering and Applications, 2025, 61(6): 53-63.
[4] 陶江垚, 奚雪峰, 盛胜利, 崔志明, 左严. Survey of structured-thought prompting for enhancing the reasoning capability of large language models[J]. Computer Engineering and Applications, 2025, 61(6): 64-83.
[5] 江双五, 张嘉玮, 华连生, 杨菁林. Implementation of a meteorological database question answering model based on large-model retrieval-augmented generation[J]. Computer Engineering and Applications, 2025, 61(5): 113-121.
[6] 苑中旭, 李理, 何凡, 杨秀, 韩东轩. Traditional Chinese medicine question answering model integrating chain-of-thought and knowledge graphs[J]. Computer Engineering and Applications, 2025, 61(4): 158-166.
[7] 李玥, 洪海蓝, 李文林, 杨涛. Applied research on building a knowledge graph of rhinitis medical cases with large language models[J]. Computer Engineering and Applications, 2025, 61(4): 167-175.
[8] 籍欣萌, 昝红英, 崔婷婷, 张坤丽. Status and challenges of applying large models in vertical domains[J]. Computer Engineering and Applications, 2025, 61(12): 1-11.
[9] 王昱婷, 陈波, 闫强, 范意兴, 余智华, 郭嘉丰. Retrieval-augmented generation question answering based on question-oriented prompt learning and multi-path reasoning[J]. Computer Engineering and Applications, 2025, 61(12): 120-128.
[10] 姚奕, 陈朝阳, 杜晓明, 姚天磊, 李青尚, 孙鸣蔚. Survey of multimodal knowledge graph construction techniques and their applications in the military domain[J]. Computer Engineering and Applications, 2024, 60(22): 18-37.
[11] 张钦彤, 王昱超, 王鹤羲, 王俊鑫, 陈海. Survey of fine-tuning techniques for large language models[J]. Computer Engineering and Applications, 2024, 60(17): 17-33.
[12] 高帅, 奚雪峰, 郑倩, 崔志明, 盛胜利. Survey of natural language interfaces for data visualization[J]. Computer Engineering and Applications, 2024, 60(15): 24-41.
[13] 于丰瑞. Survey of automated identification and extraction of cyber threat tactical and technical intelligence[J]. Computer Engineering and Applications, 2024, 60(13): 1-22.