Computer Engineering and Applications, 2015, Vol. 51, Issue (4): 222-225.

Speaking rate differences based chief speakers detection in press conferences recordings

WU Wei, LI Yanxiong, WANG Zili, CHEN Zhuyun   

  1. School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510640, China
  • Online: 2015-02-15 Published: 2015-02-04

基于语速差异的新闻发布会中首要说话人检测

吴  伟,李艳雄,王梓里,陈祝允   

  1. 华南理工大学 电子与信息学院,广州 510640

Abstract: In press conferences, chief speakers (e.g., politicians) generally give impromptu answers to questions that journalists have prepared in advance. The speaking rate of chief speakers is therefore usually slow, while that of other speakers (e.g., journalists, interpreters) is comparatively fast. Based on this difference in speaking rate between the two kinds of speakers, a sliding window is used to extract speech segments from the continuous audio stream, and the speaking rate of each segment is estimated to obtain a speaking-rate curve. Local minima of this curve are then located to determine the change points between the two kinds of speakers. Finally, speech segments that lie between two adjacent change points and whose speaking rate is below a rate threshold are labeled as chief speakers' speech. The experimental results show that the proposed method performs better than traditional methods.
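
The decision stage described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the per-window speaking rates have already been estimated (the abstract does not say how), and it makes three further assumptions that are not stated in the source: a change point is a strict local minimum of the rate curve, the start and end of the stream also count as boundaries, and a segment is compared with the threshold through its mean rate. The names `local_minima` and `detect_chief_segments` are hypothetical.

```python
from typing import List, Tuple


def local_minima(rates: List[float]) -> List[int]:
    """Return indices whose rate is strictly lower than both neighbours."""
    return [
        i
        for i in range(1, len(rates) - 1)
        if rates[i] < rates[i - 1] and rates[i] < rates[i + 1]
    ]


def detect_chief_segments(
    rates: List[float], threshold: float
) -> List[Tuple[int, int]]:
    """Return (start, end) window-index ranges judged to be chief-speaker speech.

    rates     : speaking rate of each sliding window (assumed precomputed)
    threshold : rate threshold; segments with a lower mean rate are kept
    Ranges are half-open: windows start .. end-1 belong to the segment.
    """
    if not rates:
        return []
    # Assumption: stream boundaries are treated as change points too.
    points = [0] + local_minima(rates) + [len(rates)]
    segments = []
    for start, end in zip(points, points[1:]):
        # Assumption: a segment is summarised by its mean rate.
        mean_rate = sum(rates[start:end]) / (end - start)
        if mean_rate < threshold:
            segments.append((start, end))
    return segments


if __name__ == "__main__":
    # Made-up rate curve, purely for illustration.
    curve = [5.0, 4.0, 2.0, 2.5, 3.0, 1.5, 5.0, 6.0]
    print(detect_chief_segments(curve, threshold=3.0))
```

With the made-up curve above, the local minima fall at window indices 2 and 5, so only the segment covering windows 2 to 4 (mean rate 2.5) falls below the threshold of 3.0.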

Key words: press conference recordings, speaking rate differences, chief speakers, speaker segmentation, speaker clustering

摘要: 新闻发布会中,首要说话人(例如政府要员)通常要即兴回答记者事先准备好的问题。因而首要说话人语速一般很慢,而其他说话人(例如记者、翻译等)语速则相对较快。基于两者的语速差异,采用一个滑动窗从连续语音流中截取语音段,再估计各音段语速得到一条语速曲线,然后寻找语速曲线中的局部最小值进而得到两类说话人的改变点,最后将语速低于门限且在两相邻改变点之间的语音段判为首要说话人语音,从而实现首要说话人检测。实验结果表明,与传统方法相比,基于语速差异的方法获得了更好的性能。

关键词: 新闻发布会语音, 语速差异, 首要说话人, 说话人分割, 说话人聚类