Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (22): 36-56. DOI: 10.3778/j.issn.1002-8331.2211-0358

• Research Hotspots and Reviews •

Survey of Open Source Natural Language Processing Tools

LIAO Chunlin, ZHANG Hongjun, LIAO Xianglin, CHENG Kai, LI Dashuo, WANG Hang   

  1. Institute of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007, China
  • Online: 2023-11-15 Published: 2023-11-15

Abstract: Natural language processing (NLP) tools are integrated functional components that implement the subtasks of natural language processing and provide effective support for text processing and text analysis. Many such tools now exist; they differ in how well they support each subtask, and some are suitable only for specialized text domains, which makes it difficult to choose among them. First, the subtasks supported by the tools are divided, according to processing order, into auxiliary tasks, basic tasks, and application tasks, and each category is introduced. Twenty-three open source NLP tools from China and abroad, including LTP, NLPIR, and OpenNLP, are then selected and compared in terms of their calling methods, supported programming languages, and other aspects, and the characteristics of each tool are summarized. Next, the implementation principles behind the tools' subtasks are classified into rule-based, statistical, neural network, and combined methods and then organized and analyzed, and the shortcomings of current tools are discussed. Finally, the future development of NLP tools is considered from the perspectives of multimodal fusion, cognitive intelligence, model compression, and efficient computing.

Key words: natural language processing tools, text processing, text analysis
