Survey on Test Input Selection and Metrics for Deep Neural Networks

doi:10.3778/j.issn.1002-8331.2307-0382

Abstract

As deep neural networks are widely deployed across many fields, testing and evaluating them to ensure their safety has become particularly important. When the test dataset is large and labeling is costly, test input selection methods select and prioritize test samples to improve testing efficiency and test coverage. To give an in-depth view of research progress in test input selection for deep neural networks, this paper systematically reviews 91 academic papers in related fields published over the past five years. First, it introduces the basic concepts and workflow of deep neural network testing, including the construction of deep learning systems, the selection of test inputs, and the evaluation of test results. Second, it summarizes and analyzes the applicable scenarios and shortcomings of the various metrics and test input selection methods, as well as the connections among them. Finally, it points out the current challenges and opportunities in test input selection and evaluation for deep neural networks.

Key words: deep neural network testing, test input metrics, test input selection, test input prioritization

YAN Hong, YANG Fengyu, ZHONG Yihui, XIONG Yu, CHEN Yu’an. Survey on Test Input Selection and Metrics for Deep Neural Networks[J]. Computer Engineering and Applications, 2024, 60(6): 27-42.

