Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (14): 232-241. DOI: 10.3778/j.issn.1002-8331.2203-0228

• Network, Communication and Security •

Research on Voiceprint Adversarial Detection of Improved Xception Network

LI Shuo, GU Yijun, TAN Hao, PENG Shufan   

  1. College of Information and Cyber Security, People’s Public Security University of China, Beijing 100038, China
  2. Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510006, China
  • Online: 2023-07-15 Published: 2023-07-15

改进Xception网络的声纹对抗检测研究

李烁,顾益军,谭昊,彭舒凡   

  1. 中国人民公安大学 信息网络安全学院,北京 100038
  2. 广州大学 网络空间先进技术研究院,广州 510006

Abstract: In recent years, adversarial attacks against speaker recognition models have attracted widespread attention and pose a serious threat to the security of speaker recognition systems. To address the excessive parameter size and poor robustness of existing voiceprint adversarial sample detection methods, a voiceprint adversarial sample detection model, e_Xception, is proposed. It takes Xception as the backbone network and embeds efficient channel attention (ECA) modules to fully extract speech features. By reasonably reducing the width of the network, a lightweight model, e_halfXception, is designed that reduces the number of parameters while maintaining high accuracy. Finally, a high-frequency masking speech data augmentation strategy, HF-Mask, is proposed to improve the model’s generalization. Experimental results show that the model achieves high accuracy in detecting six types of adversarial samples (FGSM, BIM, PGD, MI-FGSM, C&W and FAKEBOB) and outperforms other detection methods. The robustness of the model is also studied under unknown attack algorithms, unknown target models, and unknown perturbation degrees, validating its generalization ability.

Key words: speaker recognition, adversarial attack, adversarial detection, XceptionNet, data augmentation, robustness
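
For readers unfamiliar with the efficient channel attention (ECA) module mentioned in the abstract, the following is a minimal pure-Python sketch of ECA as it is generally described in the literature: global average pooling per channel, a small 1-D convolution across neighbouring channels, a sigmoid gate, and channel-wise rescaling. It is an illustration only, not the authors' implementation. The function names `eca_kernel_size` and `eca`, the nested-list feature-map layout, and the externally supplied `weights` are assumptions made for this example; the abstract does not state where or how the modules are embedded in Xception.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1-D kernel size commonly used with ECA:
    the integer part of (log2(C) + b) / gamma, bumped to the next odd number."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def eca(feature_map, weights):
    """Apply channel attention to a feature map.

    feature_map: list of C channels, each an H x W list of lists of floats.
    weights: odd-length 1-D kernel shared across all channels
             (learned in a real network; passed in here as an assumption).
    Returns (rescaled feature map, per-channel gates).
    """
    k = len(weights)
    assert k % 2 == 1, "kernel size must be odd"
    pad = k // 2

    # Squeeze: global average pooling gives one scalar per channel.
    pooled = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]

    # Local cross-channel interaction: zero-padded 1-D convolution
    # over the channel axis, followed by a sigmoid gate.
    padded = [0.0] * pad + pooled + [0.0] * pad
    gates = []
    for c in range(len(pooled)):
        s = sum(weights[j] * padded[c + j] for j in range(k))
        gates.append(1.0 / (1.0 + math.exp(-s)))

    # Excite: rescale every value in a channel by that channel's gate.
    scaled = [
        [[v * g for v in row] for row in ch]
        for ch, g in zip(feature_map, gates)
    ]
    return scaled, gates
```

Because the only learned parameters are the k kernel weights, a module of this kind adds very few parameters per insertion, which is consistent with the abstract's emphasis on keeping the model lightweight.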

摘要: 近年来,针对说话人识别模型的对抗攻击引起了广泛的关注,对说话人识别系统的安全构成了严重的威胁。为了解决现有的声纹对抗样本检测方法参数量过大、鲁棒性差的问题,提出一个声纹对抗样本检测模型e_Xception,该模型以Xception为主干网络,嵌入高效通道注意力(efficient channel attention,ECA)模块,充分提取语音特征。通过合理减少网络模型的宽度,设计了一个轻量级网络模型e_halfXception,减少参数量的同时,仍保持较高的精度。提出一种高频掩码的语音数据增强策略HF-Mask,提高模型的泛化性。实验结果证明,在对FGSM、BIM、PGD、MI-FGSM、C&W、FAKEBOB六种对抗样本的检测中,取得了较高的准确率,优于其他检测方法,并对模型开展了未知攻击算法、未知目标模型、未知扰动程度的鲁棒性研究,验证了模型的泛化能力。

关键词: 说话人识别, 对抗攻击, 对抗检测, Xception网络, 数据增强, 鲁棒性