Design and implementation of Tibetan continuous speech corpus

doi:10.3778/j.issn.1002-8331.2010.13.069

Computer Engineering and Applications, 2010, Vol. 46, Issue (13): 233-235.

LI Yong-hong¹, YU Hong-zhi¹, KONG Jiang-ping²

1. Key Lab of China’s National Linguistic Information Technology, Northwest University for Nationalities, Lanzhou 730030, China
2. Center for Chinese Linguistics, Department of Chinese Language and Literature, Peking University, Beijing 100871, China

Received: 2008-10-30 Revised: 2008-12-04 Online: 2010-05-01 Published: 2010-05-01
Corresponding author: LI Yong-hong

Abstract

Abstract: Taking the Xiahe dialect of Tibetan as the research object, a triphone-based Tibetan continuous speech corpus is built. First, a text corpus of 100,000 Tibetan sentences is collected and transcribed into the International Phonetic Alphabet (IPA) according to the actual pronunciation of the Xiahe dialect. Next, the triphone juncture structures of the Xiahe dialect are summarized, and a Tibetan text-processing platform is used to analyze in detail their combination types and their frequencies in the original text corpus. Finally, the design of the speech corpus material jointly considers the coverage and sparseness of triphones and class-triphones; a corpus extraction algorithm is designed and implemented, so that the material is selected automatically.
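The abstract does not spell out the extraction algorithm, so the following is only a minimal sketch of one common way such a selection can work: a greedy pass that repeatedly picks the sentence adding the most not-yet-covered triphones. The function names, the `1 / frequency` weighting that favours rare triphones, and the toy phone symbols are all assumptions made for illustration; they are not taken from the paper, and class-triphones are not modeled here.

```python
from collections import Counter


def triphones(phones):
    """Return the triphones (consecutive 3-phone windows) of one phone sequence."""
    return [tuple(phones[i:i + 3]) for i in range(len(phones) - 2)]


def triphone_counts(corpus):
    """Count how often each triphone occurs across the whole text corpus."""
    counts = Counter()
    for phones in corpus:
        counts.update(triphones(phones))
    return counts


def greedy_select(corpus, max_sentences):
    """Greedily pick sentences that add the most not-yet-covered triphones.

    Each new triphone is weighted by 1 / corpus frequency so that rare
    (sparse) triphones are favoured; this weighting is an assumption.
    Returns the selected sentence indices and the set of covered triphones.
    """
    counts = triphone_counts(corpus)
    units = [set(triphones(p)) for p in corpus]
    covered, chosen = set(), []
    remaining = set(range(len(corpus)))
    while remaining and len(chosen) < max_sentences:
        def gain(i):
            # Weighted number of triphones sentence i would newly cover.
            return sum(1.0 / counts[t] for t in units[i] - covered)
        # Ties are broken in favour of the lowest sentence index.
        best = max(sorted(remaining), key=gain)
        if gain(best) == 0:
            break  # nothing left to gain: every triphone is covered
        chosen.append(best)
        covered |= units[best]
        remaining.remove(best)
    return chosen, covered


# Toy corpus: each sentence is a list of placeholder phone symbols,
# not a real IPA transcription of the Xiahe dialect.
corpus = [
    ["a", "b", "c", "d"],
    ["a", "b", "c"],
    ["x", "y", "z"],
    ["b", "c", "d", "e"],
]
chosen, covered = greedy_select(corpus, max_sentences=10)
coverage = len(covered) / len(triphone_counts(corpus))
print(chosen, coverage)
```

On this toy input the loop stops as soon as every triphone in the corpus is covered, so the redundant second sentence is never selected. A real design would also have to decide how class-triphones contribute to the gain, which the abstract leaves unspecified.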

Key words: Tibetan, triphone, speech corpus, Greedy algorithm

CLC number: TN912.34

LI Yong-hong, YU Hong-zhi, KONG Jiang-ping. Design and implementation of Tibetan continuous speech corpus[J]. Computer Engineering and Applications, 2010, 46(13): 233-235.
