Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (2): 12-21. DOI: 10.3778/j.issn.1002-8331.2209-0025
• Research Hotspots and Reviews •
LIN Lingde, LIU Na, WANG Zheng'an
Online: 2023-01-15
Published: 2023-01-15
LIN Lingde, LIU Na, WANG Zheng'an. Review of Research on Adapter and Prompt Tuning[J]. Computer Engineering and Applications, 2023, 59(2): 12-21.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2209-0025
[1] KALCHBRENNER N, GREFENSTETTE E, BLUNSOM P. A convolutional neural network for modelling sentences[C]//Annual Meeting of the Association for Computational Linguistics, 2014: 1-11.
[2] KIM Y. Convolutional neural networks for sentence classification[C]//Conference on Empirical Methods in Natural Language Processing, 2014: 1746-1751.
[3] GEHRING J, AULI M, GRANGIER D, et al. Convolutional sequence to sequence learning[C]//International Conference on Machine Learning, 2017: 1243-1252.
[4] SUTSKEVER I, VINYALS O, LE Q V. Sequence to sequence learning with neural networks[C]//Conference and Workshop on Neural Information Processing Systems, 2014: 3104-3112.
[5] LIU P F, QIU X P, HUANG X J. Recurrent neural network for text classification with multi-task learning[C]//International Joint Conferences on Artificial Intelligence, 2016: 1-7.
[6] SOCHER R, PERELYGIN A, WU J Y, et al. Recursive deep models for semantic compositionality over a sentiment treebank[C]//2013 Conference on Empirical Methods in Natural Language Processing, 2013: 1-12.
[7] TAI K S, SOCHER R, MANNING C D. Improved semantic representations from tree-structured long short-term memory networks[C]//International Joint Conference on Natural Language Processing, 2015: 1-11.
[8] MARCHEGGIANI D, BASTINGS J, TITOV I. Exploiting semantics in neural machine translation with graph convolutional networks[C]//Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018: 486-492.
[9] QIU X, SUN T, XU Y, et al. Pre-trained models for natural language processing: a survey[J]. Science China Technological Sciences, 2020, 63(10): 1872-1897.
[10] PETERS M, NEUMANN M, IYYER M, et al. Deep contextualized word representations[C]//Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018: 1-15.
[11] RADFORD A, NARASIMHAN K, SALIMANS T, et al. Improving language understanding by generative pre-training[EB/OL]. [2020-09-26]. https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf.
[12] DEVLIN J, CHANG M, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[J]. arXiv:1810.04805, 2018.
[13] YANG Z, DAI Z, YANG Y, et al. XLNet: generalized autoregressive pretraining for language understanding[C]//Proceedings of the 32nd Annual Conference on Neural Information Processing Systems, Vancouver, Dec 8-14, 2019: 5754-5764.
[14] CLARK K, LUONG M, LE Q V, et al. ELECTRA: pre-training text encoders as discriminators rather than generators[J]. arXiv:2003.10555, 2020.
[15] LAN Z, CHEN M, GOODMAN S, et al. ALBERT: a lite BERT for self-supervised learning of language representations[J]. arXiv:1909.11942, 2019.
[16] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems, 2017: 5998-6008.
[17] SMITH S, PATWARY M, NORICK B, et al. Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large-scale generative language model[J]. arXiv:2201.11990, 2022.
[18] SUN Y, WANG S, FENG S, et al. ERNIE 3.0: large-scale knowledge enhanced pre-training for language understanding and generation[J]. arXiv:2107.02137, 2021.
[19] WU S, ZHAO X, YU T, et al. Yuan 1.0: large-scale pre-trained language model in zero-shot and few-shot learning[J]. arXiv:2110.04725, 2021.
[20] YUE Z, ZHANG H, SUN Q, et al. Interventional few-shot learning[C]//Advances in Neural Information Processing Systems, 2020: 2734-2746.
[21] HAN X, ZHANG Z, DING N, et al. Pre-trained models: past, present and future[J]. AI Open, 2021: 225-250.
[22] LIU P, YUAN W, FU J, et al. Pre-train, prompt, and predict: a systematic survey of prompting methods in natural language processing[J]. arXiv:2107.13586, 2021.
[23] HOULSBY N, GIURGIU A, JASTRZEBSKI S, et al. Parameter-efficient transfer learning for NLP[C]//International Conference on Machine Learning, 2019: 2790-2799.
[24] MCCLOSKEY M, COHEN N J. Catastrophic interference in connectionist networks: the sequential learning problem[M]//Psychology of learning and motivation. [S.l.]: Academic Press, 1989: 109-165.
[25] FRENCH R M. Catastrophic forgetting in connectionist networks[J]. Trends in Cognitive Sciences, 1999, 3(4): 128-135.
[26] PFEIFFER J, KAMATH A, RÜCKLÉ A, et al. AdapterFusion: non-destructive task composition for transfer learning[J]. arXiv:2005.00247, 2020.
[27] WANG A, SINGH A, MICHAEL J, et al. GLUE: a multi-task benchmark and analysis platform for natural language understanding[J]. arXiv:1804.07461, 2018.
[28] SOCHER R, PERELYGIN A, WU J, et al. Recursive deep models for semantic compositionality over a sentiment treebank[C]//Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 2013: 1631-1642.
[29] HUANG J, TANG D, SHOU L, et al. CoSQA: 20,000+ web queries for code search and question answering[J]. arXiv:2105.13239, 2021.
[30] RÜCKLÉ A, GEIGLE G, GLOCKNER M, et al. AdapterDrop: on the efficiency of adapters in transformers[J]. arXiv:2010.11918, 2020.
[31] BAPNA A, ARIVAZHAGAN N, FIRAT O. Simple, scalable adaptation for neural machine translation[J]. arXiv:1909.08478, 2019.
[32] WANG R, TANG D, DUAN N, et al. K-Adapter: infusing knowledge into pre-trained models with adapters[J]. arXiv:2002.01808, 2020.
[33] PFEIFFER J, VULIĆ I, GUREVYCH I, et al. MAD-X: an adapter-based framework for multi-task cross-lingual transfer[J]. arXiv:2005.00052, 2020.
[34] PHILIP J, BERARD A, GALLÉ M, et al. Monolingual adapters for zero-shot neural machine translation[C]//Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020: 4465-4470.
[35] STICKLAND A C, MURRAY I. BERT and PALs: projected attention layers for efficient adaptation in multi-task learning[C]//International Conference on Machine Learning, 2019: 5986-5995.
[36] LAUSCHER A, MAJEWSKA O, RIBEIRO L F R, et al. Common sense or world knowledge? Investigating adapter-based knowledge injection into pretrained transformers[J]. arXiv:2005.11787, 2020.
[37] ÜSTÜN A, BISAZZA A, BOUMA G, et al. UDapter: language adaptation for truly universal dependency parsing[C]//Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020: 2302-2315.
[38] VIDONI M, VULIĆ I, GLAVAŠ G. Orthogonal language and task adapters in zero-shot cross-lingual transfer[J]. arXiv:2012.06460, 2020.
[39] MAHABADI R K, RUDER S, DEHGHANI M, et al. Parameter-efficient multi-task fine-tuning for transformers via shared hypernetworks[J]. arXiv:2106.04489, 2021.
[40] PETRONI F, ROCKTÄSCHEL T, RIEDEL S, et al. Language models as knowledge bases?[C]//Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019: 2463-2473.
[41] BROWN T, MANN B, RYDER N, et al. Language models are few-shot learners[C]//Advances in Neural Information Processing Systems, 2020: 1877-1901.
[42] SCHICK T, SCHÜTZE H. Exploiting cloze questions for few shot text classification and natural language inference[J]. arXiv:2001.07676, 2020.
[43] GAGE P. A new algorithm for data compression[J]. C Users Journal, 1994, 12(2): 23-38.
[44] SCHICK T, SCHÜTZE H. It's not just size that matters: small language models are also few-shot learners[J]. arXiv:2009.07118, 2020.
[45] SHIN T, RAZEGHI Y, LOGAN IV R L, et al. AutoPrompt: eliciting knowledge from language models with automatically generated prompts[J]. arXiv:2010.15980, 2020.
[46] GAO T, FISCH A, CHEN D. Making pre-trained language models better few-shot learners[J]. arXiv:2012.15723, 2020.
[47] RAFFEL C, SHAZEER N, ROBERTS A, et al. Exploring the limits of transfer learning with a unified text-to-text transformer[J]. Journal of Machine Learning Research, 2020: 1-140.
[48] SCHICK T, SCHMID H, SCHÜTZE H. Automatically identifying words that can serve as labels for few-shot text classification[J]. arXiv:2010.13641, 2020.
[49] SUN Y, ZHENG Y, HAO C, et al. NSP-BERT: a prompt-based zero-shot learner through an original pre-training task--next sentence prediction[J]. arXiv:2109.03564, 2021.
[50] LIU Y, OTT M, GOYAL N, et al. RoBERTa: a robustly optimized BERT pretraining approach[J]. arXiv:1907.11692, 2019.
[51] YUAN W, NEUBIG G, LIU P. BARTScore: evaluating generated text as text generation[C]//Advances in Neural Information Processing Systems, 2021: 27263-27277.
[52] HAVIV A, BERANT J, GLOBERSON A. BERTese: learning to speak to BERT[J]. arXiv:2103.05327, 2021.
[53] LI X L, LIANG P. Prefix-tuning: optimizing continuous prompts for generation[J]. arXiv:2101.00190, 2021.
[54] LIU X, ZHENG Y, DU Z, et al. GPT understands, too[J]. arXiv:2103.10385, 2021.
[55] ALLEN-ZHU Z, LI Y, SONG Z. A convergence theory for deep learning via over-parameterization[C]//International Conference on Machine Learning, 2019: 242-252.
[56] ZHANG N, LI L, CHEN X, et al. Differentiable prompt makes pre-trained language models better few-shot learners[J]. arXiv:2108.13161, 2021.
[57] LESTER B, AL-RFOU R, CONSTANT N. The power of scale for parameter-efficient prompt tuning[C]//2021 Conference on Empirical Methods in Natural Language Processing, 2021: 3045-3059.
[58] HAMBARDZUMYAN K, KHACHATRIAN H, MAY J. WARP: word-level adversarial reprogramming[C]//Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021: 4921-4933.
[59] ELSAYED G F, GOODFELLOW I, SOHL-DICKSTEIN J. Adversarial reprogramming of neural networks[J]. arXiv:1806.11146, 2018.
[60] LIU X, JI K, FU Y, et al. P-Tuning v2: prompt tuning can be comparable to fine-tuning universally across scales and tasks[J]. arXiv:2110.07602, 2021.