Computer Engineering and Applications (计算机工程与应用) ›› 2017, Vol. 53 ›› Issue (3): 149-153. DOI: 10.3778/j.issn.1002-8331.1504-0255

• Pattern Recognition and Artificial Intelligence •

Generating initial clusters for speaker clustering (说话人聚类的初始类生成方法)

LAI Songxuan (赖松轩), LI Yanxiong (李艳雄)

  1. School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510640, China
  • Online: 2017-02-01 Published: 2017-05-11

Abstract: In current speaker clustering systems, each speech segment produced by speaker segmentation is used directly as an initial cluster, and clustering this large number of segments is computationally expensive. To reduce the computational load, this paper proposes a method for generating initial clusters for speaker clustering. Features are first extracted from the segmented speech, and the centroid of each segment's features is computed. The segments are then "pre-clustered" under a relaxed stopping criterion, combining hierarchical clustering with the Bayesian information criterion, to produce the initial clusters. Compared with clustering the segments directly, the proposed method reduces computation time by 40.04% with no loss in clustering performance, and by more than 60.03% when a slight loss in clustering performance is acceptable.

Key words: hierarchical clustering, Bayesian information criterion, speaker clustering, initial clusters, speech signal processing
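
The pre-clustering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, because the abstract gives no formulas. The sketch assumes the commonly used ΔBIC merge test with one full-covariance Gaussian per cluster, and it assumes that the candidate pair at each step is the one with the closest feature centroids. The names `delta_bic`, `pre_cluster`, and the penalty weight `lam` are hypothetical. Here `lam` stands in for the relaxed stopping criterion, whose exact form is not stated.

```python
import numpy as np


def delta_bic(x, y, lam=1.0):
    """Delta-BIC for merging frame sets x and y (rows are feature frames).

    Assumed form: one full-covariance Gaussian per cluster.
    A negative value favours merging (same speaker).
    """
    n1, n2 = len(x), len(y)
    n, d = n1 + n2, x.shape[1]

    def logdet(z):
        # Small ridge keeps the covariance estimate non-singular.
        cov = np.cov(z, rowvar=False, bias=True) + 1e-6 * np.eye(d)
        return np.linalg.slogdet(cov)[1]

    penalty = 0.5 * lam * (d + d * (d + 1) / 2) * np.log(n)
    gain = 0.5 * (n * logdet(np.vstack([x, y]))
                  - n1 * logdet(x) - n2 * logdet(y))
    return gain - penalty


def pre_cluster(segments, lam=1.0):
    """Greedy agglomerative pre-clustering of speech segments.

    Each segment is an (n_frames, n_dims) array. At every step the two
    clusters with the nearest centroids are tested with delta-BIC;
    clustering stops at the first pair that fails the test.
    """
    clusters = [np.asarray(s, dtype=float) for s in segments]
    while len(clusters) > 1:
        cents = np.array([c.mean(axis=0) for c in clusters])
        dist = np.linalg.norm(cents[:, None] - cents[None, :], axis=-1)
        np.fill_diagonal(dist, np.inf)
        i, j = np.unravel_index(np.argmin(dist), dist.shape)
        if delta_bic(clusters[i], clusters[j], lam) >= 0:
            break  # assumed stopping rule: nearest pair looks like two speakers
        merged = np.vstack([clusters[i], clusters[j]])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```

The returned clusters would then serve as the initial clusters for the full speaker clustering stage. Selecting candidates by centroid distance keeps each step cheap, since the covariance-based test is evaluated for only one pair per iteration.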