On language model construction for LVCSR in Kazakh

Computer Engineering and Applications, 2016, Vol. 52, Issue 24: 178-181.

Dawel Abilhayer, Nurmemet Yolwas, LIU Yan

College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China

Online: 2016-12-15   Published: 2016-12-20

Abstract

A good language model not only reduces the search space during speech recognition but also improves recognition accuracy. The N-gram statistical language model is one of the most widely used language models. Starting from the collection and processing of text, this paper introduces the techniques for constructing a Kazakh language model and, on this basis, implements a baseline Kazakh continuous speech recognition system. Word-based and syllable-based 3-gram language models are trained separately, and the two models are evaluated by perplexity and by continuous speech recognition experiments.
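The abstract compares word-based and syllable-based 3-gram models by perplexity, PPL = exp(-(1/N) Σ_i log P(w_i | w_{i-2}, w_{i-1})), where N is the number of predicted tokens in the test text. The Python sketch below illustrates that comparison under stated assumptions; it is not the authors' implementation. The abstract does not name a smoothing method, so the model here uses simple linear interpolation of trigram, bigram and add-one unigram estimates with fixed, hypothetical weights. Text is assumed to be in a Latin transliteration, `syllabify` is a hypothetical heuristic (each syllable carries one vowel and starts with at most one consonant) rather than the paper's segmentation rule, and the toy corpus consists of made-up token strings.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"
# Hypothetical vowel set for a Latin transliteration of Kazakh (an assumption).
VOWELS = set("aäeıioöuūü")


def syllabify(word):
    """Heuristic split: one vowel per syllable, at most one onset consonant.

    An illustrative assumption, not the segmentation rule used in the paper.
    """
    vowel_pos = [i for i, ch in enumerate(word) if ch in VOWELS]
    cuts = []
    for nxt in vowel_pos[1:]:
        # The consonant just before the next vowel opens the next syllable;
        # two adjacent vowels are split between them.
        cuts.append(nxt - 1 if word[nxt - 1] not in VOWELS else nxt)
    pieces, start = [], 0
    for cut in cuts:
        pieces.append(word[start:cut])
        start = cut
    pieces.append(word[start:])
    return pieces


def to_syllables(sentence):
    """Turn a whitespace-tokenized sentence into a flat syllable sequence."""
    return [syl for word in sentence.split() for syl in syllabify(word)]


class TrigramLM:
    """Interpolated 3-gram model; a simplified stand-in for real smoothing."""

    def __init__(self, sentences, lambdas=(0.6, 0.3, 0.1)):
        self.l3, self.l2, self.l1 = lambdas  # hypothetical fixed weights
        self.tri, self.bi, self.uni = Counter(), Counter(), Counter()
        self.ctx2, self.ctx1 = Counter(), Counter()  # context counts
        for tokens in sentences:
            seq = [BOS, BOS] + list(tokens) + [EOS]
            for i in range(2, len(seq)):
                u, v, w = seq[i - 2], seq[i - 1], seq[i]
                self.tri[(u, v, w)] += 1
                self.ctx2[(u, v)] += 1
                self.bi[(v, w)] += 1
                self.ctx1[v] += 1
                self.uni[w] += 1
        self.total = sum(self.uni.values())
        self.vocab_size = len(self.uni) + 1  # +1 reserves mass for unseen tokens

    def prob(self, u, v, w):
        """Interpolated P(w | u, v); unseen contexts reuse the lower order."""
        p1 = (self.uni[w] + 1) / (self.total + self.vocab_size)  # add-one
        p2 = self.bi[(v, w)] / self.ctx1[v] if self.ctx1[v] else p1
        p3 = self.tri[(u, v, w)] / self.ctx2[(u, v)] if self.ctx2[(u, v)] else p2
        return self.l3 * p3 + self.l2 * p2 + self.l1 * p1

    def perplexity(self, sentences):
        """exp of the average negative log-probability per predicted token."""
        log_sum, n = 0.0, 0
        for tokens in sentences:
            seq = [BOS, BOS] + list(tokens) + [EOS]
            for i in range(2, len(seq)):
                log_sum += math.log(self.prob(seq[i - 2], seq[i - 1], seq[i]))
                n += 1
        return math.exp(-log_sum / n)


# Made-up toy corpus in a hypothetical transliteration.
corpus = ["balalar mektepke bardi", "men mektepke bardim", "balalar kitap aladi"]
test = ["men kitap aladi"]

word_lm = TrigramLM([s.split() for s in corpus])
syl_lm = TrigramLM([to_syllables(s) for s in corpus])

print(f"word PPL: {word_lm.perplexity([s.split() for s in test]):.2f}")
print(f"syllable PPL: {syl_lm.perplexity([to_syllables(s) for s in test]):.2f}")
```

Per-token perplexities of the two models are not directly comparable, because the syllable model predicts more tokens for the same text; dividing the total log-probability by the number of words rather than syllables puts both models on a common scale.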

Key words: Kazakh language, language model, Automatic Speech Recognition (ASR), corpus creation, text processing

Dawel Abilhayer, Nurmemet Yolwas, LIU Yan. On language model construction for LVCSR in Kazakh[J]. Computer Engineering and Applications, 2016, 52(24): 178-181.
