
Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (20): 75-104. DOI: 10.3778/j.issn.1002-8331.2410-0452
• Research Hotspots and Reviews •
Survey of Feedback-Based Content and Behavior Alignment Methods for Large Language Model
ZHANG Yuying, YUN Jing, LIU Xueying, SHI Xiaoguo
Online: 2025-10-15
Published: 2025-10-15
ZHANG Yuying, YUN Jing, LIU Xueying, SHI Xiaoguo. Survey of Feedback-Based Content and Behavior Alignment Methods for Large Language Model[J]. Computer Engineering and Applications, 2025, 61(20): 75-104.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2410-0452