
Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (1): 80-97. DOI: 10.3778/j.issn.1002-8331.2405-0348
HUANG Shiyang, XI Xuefeng, CUI Zhiming (黄施洋, 奚雪峰, 崔志明)
Online: 2025-01-01
Published: 2024-12-30
Abstract: Natural language processing is a key step toward human-computer interaction, and Chinese natural language processing (CNLP) is an important part of it. With the development of large language model technology, CNLP has entered a new stage: Chinese large language models exhibit stronger generalization and faster task adaptation. However, compared with English large language models, Chinese large language models still fall short in logical reasoning and text comprehension. This survey introduces the advantages of graph neural networks for specific CNLP tasks and investigates the potential of quantum machine learning for CNLP. It summarizes the basic principles and technical architectures of large language models, compiles the typical datasets and evaluation metrics used in large-model evaluation tasks, and evaluates and compares the performance of current mainstream large language models on CNLP tasks. Finally, it analyzes the challenges facing CNLP and outlines future research directions, with the aim of helping to address these challenges and providing a reference for the development of new methods.
黄施洋, 奚雪峰, 崔志明. 大模型时代下的汉语自然语言处理研究与探索[J]. 计算机工程与应用, 2025, 61(1): 80-97.
HUANG Shiyang, XI Xuefeng, CUI Zhiming. Research and Exploration on Chinese Natural Language Processing in Era of Large Language Models[J]. Computer Engineering and Applications, 2025, 61(1): 80-97.