Computer Engineering and Applications, 2017, Vol. 53, Issue (3): 149-153. DOI: 10.3778/j.issn.1002-8331.1504-0255

Generating initial clusters for speaker clustering

LAI Songxuan, LI Yanxiong

School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510640, China

Online: 2017-02-01 Published: 2017-05-11

Abstract

Abstract: State-of-the-art speaker clustering treats each speech segment produced by speaker segmentation as an initial cluster, and clustering this large number of segments directly is computationally expensive. To reduce this cost, this paper proposes a method for generating initial clusters for speaker clustering. First, features are extracted from the speech segments and the centroids of these features are computed. The initial clusters are then generated by "pre-clustering" the segments with a hierarchical clustering algorithm combined with the Bayesian information criterion (BIC), under a loose stopping criterion. Experiments show that speaker clustering on the initial clusters generated by the proposed method is faster than speaker clustering on the segments obtained directly from speaker segmentation. Computation time is reduced by 40.04% with no loss in clustering performance, and by more than 60.03% when a slight loss in clustering performance is allowed.

Key words: hierarchical clustering, Bayesian information criterion, speaker clustering, initial clusters, speech signal processing
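
The pre-clustering idea summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the distance measure, the Gaussian model, or the exact form of the loose stopping criterion, so everything below is an assumption made for illustration: clusters are paired by Euclidean distance between feature centroids, each cluster is modelled by one full-covariance Gaussian, and the hypothetical penalty weight `lam` in the standard delta-BIC merge test stands in for the stopping criterion. The names `delta_bic` and `pre_cluster` are not from the paper.

```python
import numpy as np


def delta_bic(x1, x2, lam=1.0):
    """Delta-BIC for merging two sets of feature frames (rows = frames).

    Each set is modelled by one full-covariance Gaussian (an assumption).
    A negative value favours one shared model, i.e. merging the two sets.
    """
    x = np.vstack([x1, x2])
    n1, n2, n = len(x1), len(x2), len(x)
    d = x.shape[1]

    def logdet(frames):
        # Small diagonal term keeps the covariance well conditioned.
        cov = np.cov(frames, rowvar=False) + 1e-6 * np.eye(d)
        return np.linalg.slogdet(cov)[1]

    # Model-complexity penalty: d mean terms plus d(d+1)/2 covariance terms.
    penalty = 0.5 * lam * (d + d * (d + 1) / 2) * np.log(n)
    return 0.5 * (n * logdet(x) - n1 * logdet(x1) - n2 * logdet(x2)) - penalty


def pre_cluster(segments, lam=1.0):
    """Greedy agglomerative pre-clustering; returns lists of segment indices."""
    clusters = [[i] for i in range(len(segments))]
    frames = [np.asarray(s, dtype=float) for s in segments]
    while len(clusters) > 1:
        centroids = np.array([f.mean(axis=0) for f in frames])
        # Closest pair of clusters by Euclidean distance between centroids.
        dist = np.linalg.norm(centroids[:, None] - centroids[None, :], axis=-1)
        np.fill_diagonal(dist, np.inf)
        i, j = np.unravel_index(np.argmin(dist), dist.shape)
        i, j = min(i, j), max(i, j)
        # Stop as soon as BIC no longer supports merging the closest pair.
        if delta_bic(frames[i], frames[j], lam) >= 0:
            break
        clusters[i] += clusters.pop(j)
        frames[i] = np.vstack([frames[i], frames.pop(j)])
    return clusters
```

In this sketch, `segments` is a list of 2-D arrays, one per speech segment, with one feature vector per row, and every segment is assumed to contain more frames than feature dimensions. The returned groups of segment indices would then serve as the initial clusters passed to a full speaker clustering stage. Computing centroids is cheap, so the costlier BIC test runs only once per candidate merge, which is one plausible reading of how centroids help cut the computation.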

LAI Songxuan, LI Yanxiong. Generating initial clusters for speaker clustering[J]. Computer Engineering and Applications, 2017, 53(3): 149-153.
