Computer Engineering and Applications, 2022, Vol. 58, Issue (7): 43-54. DOI: 10.3778/j.issn.1002-8331.2108-0200
孙晓东,杨东强
Published: 2022-04-01
Online: 2022-04-01
SUN Xiaodong, YANG Dongqiang
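Abstract: In recent years, treating grammatical error correction as a machine translation task has brought major progress to English grammatical error correction. For data-driven natural language processing methods, large-scale, high-quality annotated corpora are the most important resource for translation and related tasks. This survey focuses on the datasets and data augmentation methods used in English grammatical error correction. It comprehensively summarizes and analyzes the datasets, data synthesis methods, evaluation methods, and current applications in the field, and closes with a summary and outlook on how to improve the performance of English grammatical error correction models.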
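As a rough illustration of what data synthesis can look like in this setting, the sketch below injects artificial errors into clean sentences to produce (noisy source, clean target) training pairs. It is a minimal example under stated assumptions, not a method taken from the survey: the `CONFUSION` table, the three edit operations, the `rate` parameter, and the function names `corrupt` and `make_pairs` are all hypothetical choices made for illustration.

```python
import random

# Hypothetical confusion sets (articles and prepositions), for illustration only.
CONFUSION = {
    "a": ["an", "the"], "an": ["a", "the"], "the": ["a", "an"],
    "in": ["on", "at"], "on": ["in", "at"], "at": ["in", "on"],
}


def corrupt(tokens, rate=0.3, seed=None):
    """Return a noisy copy of `tokens`; each token is corrupted with probability `rate`."""
    rng = random.Random(seed)
    noisy = []
    for tok in tokens:
        if rng.random() >= rate:
            noisy.append(tok)  # keep the token unchanged
            continue
        op = rng.choice(["substitute", "delete", "duplicate"])
        if op == "substitute" and tok.lower() in CONFUSION:
            # Swap in a confusable word (e.g. a wrong article or preposition).
            noisy.append(rng.choice(CONFUSION[tok.lower()]))
        elif op == "delete":
            continue  # drop the token to simulate a missing-word error
        elif op == "duplicate":
            noisy.extend([tok, tok])  # simulate an unnecessary-word error
        else:
            noisy.append(tok)  # no confusion set available: leave as is
    return noisy


def make_pairs(sentences, rate=0.3, seed=0):
    """Build (noisy_source, clean_target) pairs from whitespace-tokenized sentences."""
    rng = random.Random(seed)
    pairs = []
    for sentence in sentences:
        noisy = corrupt(sentence.split(), rate, seed=rng.randrange(2**32))
        pairs.append((" ".join(noisy), sentence))
    return pairs


if __name__ == "__main__":
    for source, target in make_pairs(["She sat on the chair in the room"], rate=0.4):
        print(source, "->", target)
```

The clean sentence is always kept as the target, so the pairs can be fed to a translation-style model in the same way as human-annotated corpora. Random noise of this kind is only a crude approximation of real learner errors.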
孙晓东, 杨东强. 数据增广策略在英语语法纠错中的应用综述[J]. 计算机工程与应用, 2022, 58(7): 43-54.
SUN Xiaodong, YANG Dongqiang. Review of Application of Data Augmentation Strategy in English Grammar Error Correction[J]. Computer Engineering and Applications, 2022, 58(7): 43-54.