DTZH1505: Large Scale Open Source Mandarin Speech Corpus
WANG Dong, WANG Liyuan, WANG Daliang, QI Hongwei
1. College of Information Engineering, Xizang Minzu University, Xianyang, Shaanxi 712082, China
2. Datatang (Beijing) Technology Co., Ltd., Beijing 100192, China
WANG Dong, WANG Liyuan, WANG Daliang, QI Hongwei. DTZH1505: Large Scale Open Source Mandarin Speech Corpus[J]. Computer Engineering and Applications, 2022, 58(11): 295-301.