
Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (13): 1-25. DOI: 10.3778/j.issn.1002-8331.2410-0088
Survey of Retrieval-Augmented Generation Based on Large Language Models
LIU Xueying, YUN Jing, LI Bo, SHI Xiaoguo, ZHANG Yuying
Online: 2025-07-01
Published: 2025-06-30
Abstract: Recently, intelligent agents have attracted considerable attention in industry for their ability to deliver efficient solutions to complex tasks. As one of the common paradigms for such agents, retrieval-augmented generation (RAG) combines information retrieval with content generation techniques to improve the quality of generated responses, and has gradually become a focus of research. Building on a review of RAG research in China and abroad, this survey explains the basic concepts and workflow of RAG, summarizes the current state of the technology, analyzes the strengths and weaknesses of existing RAG techniques, and organizes the available evaluation metrics, datasets, and benchmarks. Finally, it discusses the challenges RAG faces in future application scenarios and outlines directions for its future development.
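To make the retrieve-augment-generate workflow described in the abstract concrete, the following minimal Python sketch ranks a toy document corpus against a query and assembles an augmented prompt. The corpus, the bag-of-words embed() function, and the llm_generate() stub are illustrative assumptions, not any specific system covered by the survey.

# Minimal sketch of the RAG workflow (retrieve -> augment -> generate).
# The corpus, embed(), and the llm_generate() stub below are illustrative
# assumptions rather than any particular system from the survey.
import math
from collections import Counter

CORPUS = [
    "Retrieval-augmented generation combines retrieval with generation.",
    "Retrieved passages ground LLM answers in external knowledge.",
    "Vector similarity is a common way to rank candidate passages.",
]

def embed(text):
    # Toy bag-of-words "embedding"; real systems use dense neural encoders.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, k=2):
    # Retrieval step: rank the corpus by similarity to the query, keep top k.
    q = embed(query)
    ranked = sorted(CORPUS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query):
    # Augmentation step: prepend retrieved context to the user question.
    context = "\n".join(retrieve(query))
    return "Context:\n" + context + "\n\nQuestion: " + query + "\nAnswer:"

print(build_prompt("How does RAG ground LLM answers?"))
# Generation step: a full pipeline would pass this prompt to an LLM, e.g.
# answer = llm_generate(build_prompt(query))  # hypothetical call

A production pipeline keeps the same three stages but swaps in a dense encoder, a vector index over a large document store, and an LLM for the final generation step.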
LIU Xueying, YUN Jing, LI Bo, SHI Xiaoguo, ZHANG Yuying. Survey of Retrieval-Augmented Generation Based on Large Language Models[J]. Computer Engineering and Applications, 2025, 61(13): 1-25.
Related articles:
[1] 董磊, 吴福居, 史健勇, 潘龙飞. Construction and application of a multimodal knowledge graph for construction safety based on large language models[J]. Computer Engineering and Applications, 2025, 61(9): 325-333.
[2] 任海玉, 刘建平, 王健, 顾勋勋, 陈曦, 张越, 赵昌顼. Survey of intelligent question answering systems based on large language models[J]. Computer Engineering and Applications, 2025, 61(7): 1-24.
[3] 王敬凯, 秦董洪, 白凤波, 李路路, 孔令儒, 徐晨. Survey of techniques for integrating speech recognition with large language models[J]. Computer Engineering and Applications, 2025, 61(6): 53-63.
[4] 陶江垚, 奚雪峰, 盛胜利, 崔志明, 左严. Survey of structured-thought prompting for enhancing the reasoning capability of large language models[J]. Computer Engineering and Applications, 2025, 61(6): 64-83.
[5] 江双五, 张嘉玮, 华连生, 杨菁林. Implementation of a meteorological database question answering model based on large-model retrieval-augmented generation[J]. Computer Engineering and Applications, 2025, 61(5): 113-121.
[6] 苑中旭, 李理, 何凡, 杨秀, 韩东轩. Traditional Chinese medicine question answering model integrating chain-of-thought and knowledge graphs[J]. Computer Engineering and Applications, 2025, 61(4): 158-166.
[7] 李玥, 洪海蓝, 李文林, 杨涛. Applied research on building a knowledge graph of rhinitis medical cases with large language models[J]. Computer Engineering and Applications, 2025, 61(4): 167-175.
[8] 籍欣萌, 昝红英, 崔婷婷, 张坤丽. Status and challenges of applying large models in vertical domains[J]. Computer Engineering and Applications, 2025, 61(12): 1-11.
[9] 王昱婷, 陈波, 闫强, 范意兴, 余智华, 郭嘉丰. Retrieval-augmented generation question answering based on question-oriented prompt learning and multi-path reasoning[J]. Computer Engineering and Applications, 2025, 61(12): 120-128.
[10] 姚奕, 陈朝阳, 杜晓明, 姚天磊, 李青尚, 孙鸣蔚. Survey of multimodal knowledge graph construction techniques and their applications in the military domain[J]. Computer Engineering and Applications, 2024, 60(22): 18-37.
[11] 张钦彤, 王昱超, 王鹤羲, 王俊鑫, 陈海. Survey of fine-tuning techniques for large language models[J]. Computer Engineering and Applications, 2024, 60(17): 17-33.
[12] 高帅, 奚雪峰, 郑倩, 崔志明, 盛胜利. Survey of natural language interfaces for data visualization[J]. Computer Engineering and Applications, 2024, 60(15): 24-41.
[13] 于丰瑞. Survey of automated identification and extraction of cyber threat tactical and technical intelligence[J]. Computer Engineering and Applications, 2024, 60(13): 1-22.