计算机工程与应用 ›› 2024, Vol. 60 ›› Issue (17): 17-33. DOI: 10.3778/j.issn.1002-8331.2312-0035
张钦彤,王昱超,王鹤羲,王俊鑫,陈海
ZHANG Qintong, WANG Yuchao, WANG Hexi, WANG Junxin, CHEN Hai
Online:
2024-09-01
Published:
2024-08-30
Abstract: The rise of large language models marks a new milestone in deep learning, and fine-tuning plays a key role in optimizing model performance. This paper presents a comprehensive survey of fine-tuning techniques for large language models. It reviews the development of language models across four stages (statistical language models, neural network language models, pre-trained language models, and large language models) together with the basic concepts of fine-tuning, then discusses and summarizes the principles and evolution of fine-tuning techniques in four categories: classic parameter fine-tuning, parameter-efficient fine-tuning, prompt tuning, and reinforcement learning fine-tuning, with a comparative analysis. Finally, it summarizes the current state of research and key development priorities in fine-tuning, highlights the potential research value of the field, and outlines directions for future work.
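To make the parameter-efficient category mentioned above concrete, the following is a minimal illustrative sketch (not taken from the surveyed article) of a LoRA-style low-rank adapter in PyTorch; the class name LoRALinear and the hyperparameters r and alpha are assumptions chosen for illustration, not the authors' implementation.

```python
# Illustrative sketch only: LoRA-style parameter-efficient fine-tuning.
# The frozen base weights stay fixed; only the low-rank matrices A and B train.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear with a trainable low-rank update (B @ A)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # freeze the pre-trained weights
            p.requires_grad = False
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r              # common LoRA scaling convention

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = W0 x + (alpha / r) * B A x; gradients flow only into A and B
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

# Usage sketch: wrap one projection layer and optimize only the adapter parameters.
layer = LoRALinear(nn.Linear(768, 768), r=8)
trainable = [p for p in layer.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
```

Because B is initialized to zero, the wrapped layer initially reproduces the frozen model exactly, and only a small fraction of the parameters is updated during fine-tuning (2 x 768 x 8 adapter weights versus 768 x 768 base weights in this example).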
张钦彤, 王昱超, 王鹤羲, 王俊鑫, 陈海. 大语言模型微调技术的研究综述[J]. 计算机工程与应用, 2024, 60(17): 17-33.
ZHANG Qintong, WANG Yuchao, WANG Hexi, WANG Junxin, CHEN Hai. Comprehensive Review of Large Language Model Fine-Tuning[J]. Computer Engineering and Applications, 2024, 60(17): 17-33.