Generating initial clusters for speaker clustering

doi:10.3778/j.issn.1002-8331.1504-0255

Abstract

State-of-the-art speaker clustering uses each speech segment produced by speaker segmentation directly as an initial cluster, which leads to a heavy computational load. To reduce this load, this paper proposes an algorithm for generating initial clusters for speaker clustering. First, features are extracted from the speech segments and the centroid of each segment's features is calculated. The initial clusters are then generated by pre-clustering these centroids using hierarchical clustering and the Bayesian information criterion under a relaxed stopping criterion. Experiments show that speaker clustering on the initial clusters generated by the proposed method is faster than speaker clustering on the speech segments obtained directly from speaker segmentation: computation time is reduced by about 40.04% with no loss in clustering performance, and by more than 60.03% with a slight loss in clustering performance.

Key words: hierarchical clustering, Bayesian information criterion, speaker clustering, initial clusters, speech signal processing
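
The pre-clustering idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything the abstract leaves open is an assumption here: the function names `delta_bic` and `generate_initial_clusters`, the use of Euclidean distance between centroids to pick the next pair to merge, the full-covariance Gaussian form of the BIC difference, the penalty weight `lam`, and the hypothetical `threshold` parameter that stands in for the stopping criterion. The data are synthetic feature frames, not real speech features.

```python
import numpy as np


def delta_bic(x, y, lam=1.0):
    """BIC difference between modelling x and y with two full-covariance
    Gaussians versus one. Positive values favour keeping them separate."""
    n_x, n_y = len(x), len(y)
    n = n_x + n_y
    d = x.shape[1]

    def logdet(a):
        # Maximum-likelihood covariance, lightly regularised for stability.
        cov = np.cov(a, rowvar=False, bias=True) + 1e-6 * np.eye(d)
        return np.linalg.slogdet(cov)[1]

    # Penalty for the extra mean vector and covariance matrix.
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    merged = np.vstack([x, y])
    gain = 0.5 * (n * logdet(merged) - n_x * logdet(x) - n_y * logdet(y))
    return gain - penalty


def generate_initial_clusters(segments, lam=1.0, threshold=0.0):
    """Agglomerative pre-clustering of segments (assumed procedure).

    Each segment is an array of shape (frames, dims). The pair with the
    closest centroids is merged repeatedly until its BIC difference
    exceeds `threshold` (a hypothetical knob for the stopping criterion).
    """
    clusters = [np.asarray(s, dtype=float) for s in segments]
    centroids = [c.mean(axis=0) for c in clusters]
    while len(clusters) > 1:
        # Find the pair of clusters whose centroids are closest.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dist = np.linalg.norm(centroids[i] - centroids[j])
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        # Only this one pair is tested with BIC, which keeps the cost low.
        if delta_bic(clusters[i], clusters[j], lam) > threshold:
            break
        merged = np.vstack([clusters[i], clusters[j]])
        clusters[i] = merged
        centroids[i] = merged.mean(axis=0)
        del clusters[j], centroids[j]
    return clusters


# Synthetic demo: two "speakers", four segments each, 3-dimensional frames.
rng = np.random.default_rng(0)
segments = [rng.normal(0.0, 1.0, size=(200, 3)) for _ in range(4)]
segments += [rng.normal(10.0, 1.0, size=(200, 3)) for _ in range(4)]
initial = generate_initial_clusters(segments)
print(len(segments), "segments ->", len(initial), "initial clusters")
```

In this sketch the saving comes from comparing cheap centroid distances for every pair and evaluating the BIC difference only for the closest pair. The subsequent speaker clustering then starts from fewer, larger clusters than the raw segments.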

LAI Songxuan, LI Yanxiong. Generating initial clusters for speaker clustering[J]. Computer Engineering and Applications, 2017, 53(3): 149-153.
