Computer Engineering and Applications, 2012, Vol. 48, Issue 11, pp. 71-74.


Research and design of a parallel speech recognition system

WANG Shuo, LIU Wen   

  1. IBM China Research Laboratory, Beijing 100083, China
  Online: 2012-04-11; Published: 2012-04-16

Abstract: Processing large volumes of speech data is an important problem in speech recognition applications. Replacing traditional single-machine processing with parallel computing raises its own difficulties: if parallel scheduling is poorly controlled, the partial results are merged in the wrong order; if the data is split at unsuitable points, semantic coherence is lost and recognition accuracy drops; and the time spent transmitting file segments over the network must also be taken into account. To address these problems, this paper proposes a speech recognition system based on Hadoop. It uses the Hadoop Distributed File System (HDFS) and MapReduce to handle file-segment transfer and parallel scheduling control, and applies silence detection to split files at appropriate points. Experiments demonstrate the effectiveness of the system.

Key words: speech recognition, parallel computing, Hadoop, MapReduce, silence detection
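The abstract does not say which silence detection algorithm the system uses. As a minimal sketch, assuming a simple short-time energy threshold (a common technique, not necessarily the authors' method), the code below splits a mono signal only inside sufficiently long silent stretches, so no segment boundary cuts through speech. The function names and the parameters `frame_len`, `threshold`, and `min_silent_frames` are hypothetical.

```python
from typing import List, Sequence, Tuple


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame (a short last frame is kept)."""
    energies = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / len(frame))
    return energies


def split_on_silence(
    samples: Sequence[float],
    frame_len: int = 160,          # hypothetical: 10 ms at 16 kHz
    threshold: float = 1e-4,       # hypothetical energy threshold for "silent"
    min_silent_frames: int = 3,    # hypothetical minimum pause length, in frames
) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges whose boundaries fall only inside silent runs."""
    energies = frame_energies(samples, frame_len)
    cuts: List[int] = []
    run_start = None
    # A sentinel "loud" frame at the end closes any trailing silent run.
    for i, e in enumerate(energies + [float("inf")]):
        if e < threshold:
            if run_start is None:
                run_start = i  # a silent run begins at frame i
        else:
            if run_start is not None and i - run_start >= min_silent_frames:
                # Cut at the middle of the pause, away from the neighbouring speech.
                cuts.append(((run_start + i) // 2) * frame_len)
            run_start = None
    bounds = [0] + [c for c in cuts if 0 < c < len(samples)] + [len(samples)]
    return list(zip(bounds[:-1], bounds[1:]))
```

For example, `split_on_silence([0.5] * 800 + [0.0] * 800 + [0.5] * 800)` returns `[(0, 1120), (1120, 2400)]`: the single cut lands in the middle of the pause. Cutting inside pauses rather than at fixed offsets is what keeps words intact across segments; in practice the threshold would need to be calibrated to the recording's noise floor.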

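The abstract attributes merge errors to poorly controlled parallel scheduling. One common way to avoid them in a MapReduce-style job is to key every partial transcript by its segment index and sort by that key before concatenating, so the result follows the audio order no matter which worker finishes first. Hadoop performs such a sort on intermediate keys during its shuffle phase. The sketch below illustrates the idea in plain Python rather than with Hadoop's Java API: `recognize` is a hypothetical stand-in for a real decoder, and the thread pool only simulates parallel map tasks.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List, Tuple


def recognize(segment_id: int, audio: str) -> str:
    """Hypothetical recognizer stub; a real system would decode the audio segment."""
    return audio.upper()


def map_task(segment_id: int, audio: str) -> Tuple[int, str]:
    # Emit a (key, value) pair keyed by segment index, as a Mapper would.
    return segment_id, recognize(segment_id, audio)


def reduce_task(pairs: List[Tuple[int, str]]) -> str:
    # Sort by segment index so the output follows the original audio order,
    # independent of the order in which the map tasks completed.
    return " ".join(text for _, text in sorted(pairs, key=lambda p: p[0]))


def transcribe(segments: List[str], workers: int = 4) -> str:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(map_task, i, seg) for i, seg in enumerate(segments)]
        # as_completed yields futures in completion order, which is arbitrary.
        pairs = [f.result() for f in as_completed(futures)]
    return reduce_task(pairs)
```

For instance, `transcribe(["hello", "parallel", "world"])` returns `"HELLO PARALLEL WORLD"` even if the segments finish out of order, because ordering is decided by the key rather than by completion time.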