Research on technologies of HTK based Uyghur continuous phoneme recognition

Abstract

Abstract: This paper presents an HTK (Hidden Markov Model Toolkit) based baseline system for Uyghur continuous phoneme recognition and addresses several of its language-dependent key technologies. Based on the characteristics of the Uyghur language, a text corpus is designed for language modeling and speech corpus construction, and large-scale speech data is recorded for training a phoneme-based Uyghur acoustic model. Recognition rates obtained with different N-gram language models are reported. Recognition rates for the 32 Uyghur phonemes are tabulated, and the frequently confused phonemes are listed and their possible causes analyzed. Finally, research directions for improving the baseline system are suggested.

Key words: Uyghur language, acoustic model, language model, Uyghur phoneme, Hidden Markov Model Toolkit (HTK)

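Chinese abstract: With the goal of building a baseline platform for Uyghur continuous phoneme recognition, this work is the first to study several key technologies in the language-dependent stages of such a system on top of HTK (the Hidden Markov Model Toolkit). Drawing on the linguistic characteristics of Uyghur, a base text was designed for both language model construction and speech corpus construction. A relatively large-scale speech corpus was recorded according to specific technical specifications. With the phoneme chosen as the basic unit, a Uyghur acoustic model was trained. Under a letter-based N-gram language model, recognition results mapping spoken sentences to letter-sequence sentences were obtained. Recognition rates for the 32 Uyghur phonemes were computed, and the easily confused phonemes are given together with an analysis of their causes, laying a foundation for further improving the recognition rate.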


Mirigul ABDURSUL, Mijit ABLIMIT, Akbar PATTAR, Askar HAMDULLA. Research on technologies of HTK based Uyghur continuous phoneme recognition[J]. Computer Engineering and Applications, 2013, 49(22): 150-154.

米日古力·阿布都热素，米吉提·阿不力米提，艾克白尔·帕塔尔，艾斯卡尔·艾木都拉. 基于HTK的维吾尔语连续音素识别技术研究[J]. 计算机工程与应用, 2013, 49(22): 150-154.

