Computer Engineering and Applications, 2024, Vol. 60, Issue (17): 17-33. DOI: 10.3778/j.issn.1002-8331.2312-0035
• Research Hotspots and Reviews •
ZHANG Qintong, WANG Yuchao, WANG Hexi, WANG Junxin, CHEN Hai
ZHANG Qintong, WANG Yuchao, WANG Hexi, WANG Junxin, CHEN Hai. Comprehensive Review of Large Language Model Fine-Tuning[J]. Computer Engineering and Applications, 2024, 60(17): 17-33.
张钦彤, 王昱超, 王鹤羲, 王俊鑫, 陈海. 大语言模型微调技术的研究综述 (Comprehensive Review of Large Language Model Fine-Tuning)[J]. 计算机工程与应用 (Computer Engineering and Applications), 2024, 60(17): 17-33.