Computer Engineering and Applications, 2022, Vol. 58, Issue (14): 1-15. DOI: 10.3778/j.issn.1002-8331.2202-0196
LI Kezi (李克资), XU Yang (徐洋), ZHANG Sicong (张思聪), YAN Jiale (闫嘉乐)
Online: 2022-07-15
Published: 2022-07-15
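Abstract: Speech recognition is an important mode of human-computer interaction. With the continued development of deep learning, automatic speech recognition (ASR) systems based on deep learning have made significant progress. However, carefully crafted audio adversarial examples can cause neural-network-based ASR systems to produce incorrect output, which creates security risks for applications built on them. Improving the security of neural-network-based ASR systems therefore requires studying both attacks that use audio adversarial examples and defenses against them. To that end, this paper analyzes and summarizes the current state of research on adversarial example generation and defense techniques, and describes the challenges facing adversarial attack and defense techniques for ASR systems together with approaches to addressing them.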
LI Kezi, XU Yang, ZHANG Sicong, YAN Jiale. Survey on Adversarial Example Attack and Defense Technology for Automatic Speech Recognition (自动语音辨识对抗攻击和防御技术综述)[J]. Computer Engineering and Applications, 2022, 58(14): 1-15.