Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (20): 36-53. DOI: 10.3778/j.issn.1002-8331.2412-0070

• Research Hotspots and Reviews •

Review of Research Progress in Chinese Grammar Error Correction Technology

JU Zedong, CHENG Chunlei, YE Qing, PENG Lin, GONG Zhufan   

  1. School of Computer Science, Jiangxi University of Chinese Medicine, Nanchang 330004, China
  2. Jiangxi Provincial Key Laboratory of Chinese Medicine Artificial Intelligence, Nanchang 330004, China
  • Online: 2025-10-15 Published: 2025-10-15

中文语法纠错技术的研究进展综述

句泽东,程春雷,叶青,彭琳,龚著凡   

  1. 江西中医药大学 计算机学院,南昌 330004
  2. 江西省中医人工智能重点研究室,南昌 330004

Abstract: Chinese grammar error correction (CGEC) aims to automatically correct Chinese sentences that contain grammatical errors by using natural language processing techniques. The task improves the accuracy and readability of text and the effectiveness of information transfer, and it plays an important role in many fields, including news releases, book and periodical publishing, voice input, and medical record quality control. This paper first traces the development of CGEC technology, introduces the commonly used evaluation metrics and public datasets, and analyzes the main challenges currently facing CGEC, including the scarcity of training corpora and over-correction. With these challenges in mind, it systematically reviews traditional error correction methods. It then discusses in depth the research on and application of sequence models in grammar error correction, dividing them into two main categories: sequence-to-sequence (Seq2Seq) error correction models and sequence-to-edit (Seq2Edit) error correction models. Next, it systematically surveys the emerging approach to grammar error correction based on large language models. Taking a problem-solving perspective, the paper analyzes and summarizes the various error correction models in detail, and finally discusses the open challenges and future directions of Chinese grammar error correction.

Key words: grammar error correction, deep learning, large language models (LLMs), natural language processing technology

摘要: 中文语法错误纠正任务(Chinese grammar error correction,CGEC)旨在利用自然语言处理技术自动纠正含有语法错误的中文句子。该任务致力于提升文本的准确性与可读性,增强信息传递的效果,在新闻发布、书刊出版、语音输入、病历质控等多个领域展现出其不可或缺的重要性。回顾了中文语法纠错技术的发展脉络,介绍了常用的评价指标及公开数据集,并剖析当前中文语法纠错面临的主要挑战,包括训练库语料匮乏、过度纠正等问题。针对这些挑战,系统梳理传统的纠错方法;进而深入探讨序列模型方法在语法纠错领域的研究应用,将其主要划分为基于序列到序列(Seq2Seq)的纠错模型和基于序列到编辑(Seq2Edit)的纠错模型两大类别;系统地梳理了基于大语言模型的语法纠错新路径。立足问题解决视角,对各类纠错模型展开详尽分析与总结,最后展望了中文语法纠错面临的挑战与未来的发展方向。

关键词: 语法纠错, 深度学习, 大语言模型(LLMs), 自然语言处理技术