Design and research of Tibetan spoken speech corpus

doi:10.3778/j.issn.1002-8331.1702-0269

Computer Engineering and Applications, 2018, Vol. 54, Issue (13): 231-235. DOI: 10.3778/j.issn.1002-8331.1702-0269

HUANG Xiaohui1,2, LI Jing1, MA Rui2,3

1. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, China
2. Department of Engineering, PLA University of Foreign Languages, Luoyang, Henan 471003, China
3. Institute of Tibetology, Minzu University of China, Beijing 100081, China

Online: 2018-07-01 Published: 2018-07-17

Abstract

Based on an analysis of construction methods for conventional speech corpora, and taking into account the requirements of natural spoken speech recognition research and the basic characteristics of natural spoken Tibetan, this paper designs a construction scheme and a corresponding annotation standard for a spoken speech corpus suited to Tibetan speech recognition. Following this scheme, a 50-hour spoken speech corpus of Lhasa Tibetan is built with five layers of annotation: phonemes, semi-syllables, syllables, Tibetan characters, and sentences. Statistical results show that the corpus retains the natural properties of spoken speech while providing balanced coverage of commonly used speech modeling units such as phonemes and semi-syllables, and thus offers reliable data support for research on speech recognition based on Tibetan spoken speech data.

Key words: speech corpus, spoken speech, speech recognition, annotation standard, Lhasa Tibetan

HUANG Xiaohui, LI Jing, MA Rui. Design and research of Tibetan spoken speech corpus[J]. Computer Engineering and Applications, 2018, 54(13): 231-235.
