[1] ANGUERA X, BOZONNET S, EVANS N, et al. Speaker diarization: a review of recent research[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2012, 20(2): 356-370.
[2] GISH H, SIU M H, ROHLICEK R. Segregation of speakers for speech recognition and speaker identification[C]//Proceedings of the 1991 International Conference on Acoustics, Speech, and Signal Processing, 1991: 873-876.
[3] SIU M H, YU G, GISH H. An unsupervised, sequential learning algorithm for the segmentation of speech waveforms with multiple speakers[C]//Proceedings of the 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing, 1992: 189-192.
[4] ROHLICEK J R, AYUSO D, BATES M, et al. Gisting conversational speech[C]//Proceedings of the 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing, 1992: 113-116.
[5] LING J W, LU W, LIU Q S, et al. Speaker diarization using EHMM and CLR[J]. Journal of Chinese Computer Systems, 2012, 33(6): 1389-1392.
[6] CHEN S, GOPALAKRISHNAN P. Speaker, environment and channel change detection and clustering via the Bayesian information criterion[C]//Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop, 1998, 8: 127-132.
[7] CHUNG J S, HUH J, NAGRANI A, et al. Spot the conversation: speaker diarisation in the wild[J]. arXiv:2007.01216, 2020.
[8] XU Z J, ZHANG T H. Parkinson voiceprint recognition based on weighted deep full sequence convolutional neural network[J]. Journal of Chinese Computer Systems, 2020, 41(12): 2683-2688.
[9] LANDINI F, GLEMBEK O, MATĚJKA P, et al. Analysis of the BUT diarization system for VoxConverse challenge[C]//Proceedings of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing, 2021: 5819-5823.
[10] VARIANI E, LEI X, MCDERMOTT E, et al. Deep neural networks for small footprint text-dependent speaker verification[C]//Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing, 2014: 4052-4056.
[11] HEIGOLD G, MORENO I, BENGIO S, et al. End-to-end text-dependent speaker verification[C]//Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016: 5115-5119.
[12] WANG Q, DOWNEY C, WAN L, et al. Speaker diarization with LSTM[C]//Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018: 5239-5243.
[13] LI Q, KREYSSIG F L, ZHANG C, et al. Discriminative neural clustering for speaker diarisation[C]//Proceedings of the 2021 IEEE Spoken Language Technology Workshop, 2021: 574-581.
[14] PARK T J, HAN K J, KUMAR M, et al. Auto-tuning spectral clustering for speaker diarization using normalized maximum eigengap[J]. IEEE Signal Processing Letters, 2019, 27: 381-385.
[15] KENNY P, REYNOLDS D, CASTALDO F. Diarization of telephone conversations using factor analysis[J]. IEEE Journal of Selected Topics in Signal Processing, 2010, 4(6): 1059-1070.
[16] LANDINI F, PROFANT J, DIEZ M, et al. Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks[J]. Computer Speech & Language, 2022, 71: 101254.
[17] RAJ D, GARCIA-PERERA L P, HUANG Z, et al. DOVER-Lap: a method for combining overlap-aware diarization outputs[C]//Proceedings of the 2021 IEEE Spoken Language Technology Workshop, 2021: 881-888.
[18] RYANT N, SINGH P, KRISHNAMOHAN V, et al. The third DIHARD diarization challenge[J]. arXiv:2012.01477, 2020.
[19] CARLETTA J. Unleashing the killer corpus: experiences in creating the multi-everything AMI meeting corpus[J]. Language Resources and Evaluation, 2007, 41(2): 181-190.
[20] OTTERSON S, OSTENDORF M. Efficient use of overlap information in speaker diarization[C]//Proceedings of the 2007 IEEE Workshop on Automatic Speech Recognition & Understanding, 2007: 683-686.
[21] BREDIN H, YIN R, CORIA J M, et al. Pyannote.audio: neural building blocks for speaker diarization[C]//Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020: 7124-7128.
[22] LONG H, ZHANG L P, SHAO Y B, et al. Multi-task convolution network with speaker feature constraint for speech enhancement[J]. Journal of Chinese Computer Systems, 2021, 42(10): 2178-2183.
[23] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.
[24] NAGRANI A, CHUNG J S, XIE W, et al. VoxCeleb: large-scale speaker verification in the wild[J]. Computer Speech & Language, 2020, 60: 101027.
[25] SNYDER D, CHEN G, POVEY D. MUSAN: a music, speech, and noise corpus[J]. arXiv:1510.08484, 2015.
[26] KO T, PEDDINTI V, POVEY D, et al. A study on data augmentation of reverberant speech for robust speech recognition[C]//Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017: 5220-5224.
[27] WANG W, QIN X, CHENG M, et al. The DKU-SMIIP diarization system for the VoxCeleb speaker recognition challenge 2022[C]//Proceedings of the VoxSRC Workshop, 2022.
[28] CAI Q, HONG G, YE Z, et al. The Kriston AI system for the VoxCeleb speaker recognition challenge 2022[J]. arXiv:2209.11433, 2022.
[29] PARK D, YU Y, PARK K W, et al. GIST-AiTeR system for the diarization task of the 2022 VoxCeleb speaker recognition challenge[J]. arXiv:2209.10357, 2022.
[30] TEVISSEN Y, BOUDY J, PETITPONT F. The Newsbridge-Telecom SudParis VoxCeleb speaker recognition challenge 2022 system description[J]. arXiv:2301.07491, 2023.
[31] CHOI J H, JEOUNG Y R, KYUNG J, et al. HYU submission for the VoxCeleb speaker recognition challenge 2022[Z]. Hanyang University, Department of Electronic Engineering, 2022.