Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (15): 42-61. DOI: 10.3778/j.issn.1002-8331.2103-0278

Survey for Uyghur Morphological Analysis

LIU Chang, Abudukelimu·Abulizi, YAO Dengfeng, Halidanmu·Abudukelimu   

  1. Department of Information Management, Xinjiang University of Finance and Economics, Urumqi 830012, China
  2. Institute of Silk Road Economy and Management, Xinjiang University of Finance and Economics, Urumqi 830012, China
  3. Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China
  • Online: 2021-08-01 Published: 2021-07-26

维吾尔语形态分析研究综述

刘畅,阿布都克力木·阿布力孜,姚登峰,哈里旦木·阿布都克里木   

  1. 新疆财经大学 信息管理学院,乌鲁木齐 830012
  2. 新疆财经大学 丝路经济与管理研究院,乌鲁木齐 830012
  3. 北京联合大学 北京市信息服务工程重点实验室,北京 100101

Abstract:

Uyghur is a morphologically rich, agglutinative language whose processing suffers from data sparsity. Its language processing technology lags well behind that of widely studied languages such as English and Chinese, and does not yet meet the development needs of Xinjiang. Morphological analysis is an important part of natural language processing, and research on Uyghur morphological analysis is significant for advancing Uyghur language information processing technology. This paper outlines Uyghur grammar and describes the state of research on Uyghur natural language processing, morphological analysis, and the related basic resources. It divides common methods into five categories (rule-based, dictionary-based, statistics-based, deep learning-based, and hybrid) and analyzes the advantages and disadvantages of each. It then introduces follow-up research on Uyghur morphological analysis, draws on advanced lexical analysis methods, summarizes the challenges and opportunities facing Uyghur morphological analysis, and discusses its future development trends.

Key words: Uyghur, natural language processing, morphological analysis, phonetic restoration, stemming, morphological segmentation

摘要:

维吾尔语具有形态丰富性、黏着性和数据稀疏性等特点,处理技术和英汉等热门语言有着较大差距并且未能满足新疆地区发展需求。形态分析是自然语言处理的重要组成部分,研究维吾尔语形态分析对于推动维吾尔语信息处理技术发展有着重要意义。简述了维吾尔语语法,描述了维吾尔语自然语言处理、形态分析及其相关基本资源研究现状,将常见方法分为基于规则、基于词典、基于统计、基于深度学习和基于混合5大类并分析了各种方法的优劣,介绍了维吾尔语形态分析后续研究,借鉴了先进的词法分析方法,总结了维吾尔语形态分析面临的挑战和机遇,并对其未来发展趋势进行展望。

关键词: 维吾尔语, 自然语言处理, 形态分析, 音变还原, 词干提取, 形态切分