Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (11): 98-104. DOI: 10.3778/j.issn.1002-8331.2201-0087

• Pattern Recognition and Artificial Intelligence •

Meta-Learning Method of Uyghur Morphological Segmentation

ZHANG Yuning, LI Wenzhuo, Abudukelimu Halidanmu, Abulizi Abudukelimu   

  1. School of Information Management, Xinjiang University of Finance and Economics, Urumqi 830012, China
  • Online: 2023-06-01 Published: 2023-06-01

维吾尔语形态切分的元学习方法

张雨宁,李文卓,哈里旦木·阿布都克里木,阿布都克力木·阿布力孜   

  1. 新疆财经大学 信息管理学院,乌鲁木齐 830012

Abstract: Deep learning has dramatically improved the accuracy of Uyghur morphological segmentation, but it requires large amounts of data. Meta-learning can effectively reduce a model's reliance on data volume by learning from previous tasks, and it is widely used in low-resource domains. This paper therefore proposes a meta-learning method for Uyghur morphological segmentation. By training on previous tasks, the method obtains a set of parameters that can adapt quickly to new tasks, and thus generalises quickly to them. The experiments first construct N pseudo-meta-learning tasks based on data similarity, which partitions the data into meta-learning support sets and query sets. The Uyghur data are then encoded with the Transformer encoder. Finally, the meta-learning method is used to perform Uyghur morphological segmentation in few-shot settings. Experimental results show that the meta-learning method outperforms the pre-trained model on the few-shot Uyghur morphological segmentation task, effectively avoiding overfitting and mitigating the impact of data sparsity on the model.

Key words: meta-learning, morphological segmentation, Uyghur

摘要: 随着深度学习的发展,维吾尔语形态切分的准确率得到了大幅提升,但对数据量的需求较高,而元学习方法通过对以往任务的学习,有效缓解了模型对数据量的依赖,在低资源领域应用广泛。因此提出维吾尔语形态切分的元学习方法,该方法主要通过对以往任务的训练,获得一组具有快速适应新任务能力的参数,从而在新任务上实现快速泛化。实验根据数据的相似度构建N个伪元学习任务,完成元学习支撑集和查询集的划分,使用Transformer的编码器对维吾尔语数据进行编码,采用元学习方法实现对少样本环境下的维吾尔语形态切分。实验结果表明,在维吾尔语形态切分的少样本任务中元学习方法优于预训练模型,有效避免了模型的过拟合,缓解了数据稀疏性对模型的影响。

关键词: 元学习, 形态切分, 维吾尔语
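
The abstract states that N pseudo-meta-learning tasks are built from data similarity and split into support and query sets, but it does not say which similarity measure is used or how the split is made. The sketch below is one possible reading, not the authors' implementation. It assumes character-bigram Jaccard similarity, randomly chosen anchor words, and a fixed support/query size. The function names (`char_bigrams`, `jaccard`, `build_pseudo_tasks`) and the romanized toy words are hypothetical and serve only as illustration.

```python
import random
from typing import Dict, List, Set, Tuple

# (surface word, segmented form); toy romanized examples, for illustration only
Example = Tuple[str, str]


def char_bigrams(word: str) -> Set[str]:
    """Character bigrams of a word, padded with '#' at both ends."""
    padded = f"#{word}#"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of two words' bigram sets (an assumed measure)."""
    x, y = char_bigrams(a), char_bigrams(b)
    return len(x & y) / len(x | y)


def build_pseudo_tasks(
    data: List[Example],
    n_tasks: int,
    k_support: int,
    k_query: int,
    seed: int = 0,
) -> List[Dict[str, List[Example]]]:
    """Build n_tasks pseudo-tasks, each grouped around a random anchor word.

    For each anchor, the k_support + k_query most similar examples are kept.
    The closest ones form the support set and the rest form the query set.
    """
    rng = random.Random(seed)
    anchors = rng.sample(data, n_tasks)
    tasks = []
    for anchor in anchors:
        ranked = sorted(
            data, key=lambda ex: jaccard(anchor[0], ex[0]), reverse=True
        )
        chosen = ranked[: k_support + k_query]
        tasks.append(
            {"support": chosen[:k_support], "query": chosen[k_support:]}
        )
    return tasks


if __name__ == "__main__":
    toy_data: List[Example] = [
        ("kitablar", "kitab+lar"),
        ("kitabim", "kitab+im"),
        ("kitabtin", "kitab+tin"),
        ("mektepler", "mektep+ler"),
        ("mekteptin", "mektep+tin"),
    ]
    for task in build_pseudo_tasks(toy_data, n_tasks=2, k_support=2, k_query=1):
        print(task)
```

In a meta-learning setup of this kind, each task's support set would typically drive the fast adaptation step and its query set would drive the meta-update. How the paper actually uses the two sets is not specified in the abstract.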