[1] HANSEN J H L, HASAN T. Speaker recognition by machines and humans: a tutorial review[J]. IEEE Signal Processing Magazine, 2015, 32(6): 74-99.
[2] FARRÚS M. Voice disguise in automatic speaker recognition[J]. ACM Computing Surveys, 2018, 51(4): 1-22.
[3] TAN X, QIN T, SOONG F, et al. A survey on neural speech synthesis[J]. arXiv:2106.15561, 2021.
[4] WANG C Y, CHEN S Y, WU Y, et al. Neural codec language models are zero-shot text to speech synthesizers[J]. arXiv:2301.02111, 2023.
[5] LIU X C, WANG X, SAHIDULLAH M, et al. ASVspoof 2021: towards spoofed and deepfake speech detection in the wild[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023, 31: 2507-2522.
[6] YI J Y, TAO J H, FU R B, et al. ADD 2023: the second audio deepfake detection challenge[J]. arXiv:2305.13774, 2023.
[7] KÜNZEL H J. Effects of voice disguise on speaking fundamental frequency[J]. The International Journal of Speech, Language and the Law, 2000, 7(2): 149-179.
[8] KÜNZEL H J, GONZALEZ-RODRIGUEZ J, ORTEGA-GARCÍA J. Effect of voice disguise on the performance of a forensic automatic speaker recognition system[C]//Proceedings of the ODYSSEY04-The Speaker and Language Recognition Workshop, 2004.
[9] KAJAREKAR S S, BRATT H, SHRIBERG E, et al. A study of intentional voice modifications for evading automatic speaker recognition[C]//Proceedings of the 2006 IEEE Odyssey-The Speaker and Language Recognition Workshop. Piscataway: IEEE, 2006: 1-6.
[10] ZHANG C L, TAN T J. Voice disguise and automatic speaker recognition[J]. Forensic Science International, 2008, 175(2/3): 118-122.
[11] TAN T J. The effect of voice disguise on Automatic Speaker Recognition[C]//Proceedings of the 2010 3rd International Congress on Image and Signal Processing. Piscataway: IEEE, 2010: 3538-3541.
[12] GONZÁLEZ HAUTAMÄKI R, SAHIDULLAH M, HAUTAMÄKI V, et al. Acoustical and perceptual study of voice disguise by age modification in speaker verification[J]. Speech Communication, 2017, 95: 1-15.
[13] ZHENG L L, LI J K, SUN M, et al. When automatic voice disguise meets automatic speaker verification[J]. IEEE Transactions on Information Forensics and Security, 2021, 16: 824-837.
[14] BROWN A, HUH J, NAGRANI A, et al. Playing a part: speaker verification at the movies[C]//Proceedings of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2021: 6174-6178.
[15] NAGRANI A, CHUNG J S, ZISSERMAN A. VoxCeleb: a large-scale speaker identification dataset[J]. arXiv:1706.08612, 2017.
[16] CHUNG J S, NAGRANI A, ZISSERMAN A. VoxCeleb2: deep speaker recognition[J]. arXiv:1806.05622, 2018.
[17] FAN Y, KANG J W, LI L T, et al. CN-celeb: a challenging Chinese speaker recognition dataset[C]//Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2020: 7604-7608.
[18] SERENGIL S I, OZPINAR A. HyperExtended LightFace: a facial attribute analysis framework[C]//Proceedings of the 2021 International Conference on Engineering and Emerging Technologies. Piscataway: IEEE, 2022: 1-4.
[19] DENG J K, GUO J, VERVERAS E, et al. RetinaFace: single-shot multi-level face localisation in the wild[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 5202-5211.
[20] TAO R J, PAN Z X, DAS R K, et al. Is someone speaking?: exploring long-term temporal features for audio-visual active speaker detection[C]//Proceedings of the 29th ACM International Conference on Multimedia. New York: ACM, 2021: 3927-3935.
[21] BROMLEY J, BENTZ J W, BOTTOU L, et al. Signature verification using a “Siamese” time delay neural network[C]//Advances in Pattern Recognition Systems Using Neural Network Technologies, 1994: 25-44.
[22] LIN Y K, QIN X Y, CUI H H, et al. Laugh betrays you? learning robust speaker representation from speech containing non-verbal fragments[J]. arXiv:2210.16028, 2022.
[23] DENG J K, GUO J, XUE N N, et al. ArcFace: additive angular margin loss for deep face recognition[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 4685-4694.
[24] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778.
[25] SNYDER D, CHEN G G, POVEY D. MUSAN: a music, speech, and noise corpus[J]. arXiv:1510.08484, 2015.
[26] KO T, PEDDINTI V, POVEY D, et al. A study on data augmentation of reverberant speech for robust speech recognition[C]//Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2017: 5220-5224.