Fuzzing for Unmanned Aerial Vehicle System Based on Reinforcement Learning

doi:10.3778/j.issn.1002-8331.2401-0125

Abstract

Abstract: Existing fuzzing methods for unmanned aerial vehicles (UAVs) generate weakly targeted test cases, and their mutation process is random and blind. To address these problems, this paper proposes RLPGFuzz, a reinforcement learning-based fuzzing method for UAV systems whose main purpose is to verify whether a UAV complies with its existing safety policies. The safety policies that the UAV should obey are formalized as metric temporal logic formulas, and the input space is determined. Taking the UAV's motion state into account, the conventional fuzzing process is modeled as a Markov decision process. Building on an importance-weighted reinforcement learning algorithm, a new probability sampling method is designed that uses the weights to improve the utilization efficiency of sample data. The policy network selects mutation actions according to the reward, driving the UAV into behavior that violates a safety policy and thereby exposing anomalies or vulnerabilities. To validate the method, RLPGFuzz is used for fuzzing on the ArduPilot and PX4 simulation platforms. Compared with current mainstream methods, RLPGFuzz shortens the time to discover an anomaly by 20%, increases the number of effective mutations by 30%, and shortens the command sequences that violate a safety policy by 10%. The simulation results indicate that RLPGFuzz is more efficient and finds a more comprehensive range of vulnerability types.

Key words: fuzzing, reinforcement learning, unmanned aerial vehicle, safety policy
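
The abstract states that sampling probabilities are derived from importance weights but does not give the formula. The following is a minimal sketch of one way such weight-proportional sampling could look, not the authors' implementation. It assumes each stored transition keeps the log-probability of its mutation action under the behavior policy and under the current policy, that the weight is the truncated ratio min(rho_bar, pi/mu), and that sampling probability is proportional to that weight. The function names, `rho_bar`, and `eps` are all hypothetical.

```python
import math
import random


def truncated_importance_weights(target_logp, behavior_logp, rho_bar=1.0):
    """Return min(rho_bar, pi/mu) for each stored transition.

    target_logp and behavior_logp hold log-probabilities of the same
    action under the current policy (pi) and the behavior policy (mu).
    The truncation level rho_bar is an assumed hyperparameter.
    """
    return [
        min(rho_bar, math.exp(t - b))
        for t, b in zip(target_logp, behavior_logp)
    ]


def sampling_probabilities(weights, eps=1e-6):
    """Normalize weights into a distribution over the replay buffer.

    eps (assumed) keeps every transition reachable even if its weight is 0.
    """
    shifted = [w + eps for w in weights]
    total = sum(shifted)
    return [w / total for w in shifted]


def sample_batch(buffer, probs, batch_size, rng):
    """Draw batch_size transitions with replacement, proportional to probs."""
    return rng.choices(buffer, weights=probs, k=batch_size)


if __name__ == "__main__":
    # Three stored mutation steps (placeholder labels, not real commands).
    buffer = ["step_0", "step_1", "step_2"]
    target_logp = [math.log(0.5), math.log(0.2), math.log(0.1)]
    behavior_logp = [math.log(0.5), math.log(0.4), math.log(0.4)]

    weights = truncated_importance_weights(target_logp, behavior_logp)
    probs = sampling_probabilities(weights)
    print(sample_batch(buffer, probs, batch_size=4, rng=random.Random(0)))
```

Under these assumptions, transitions whose actions the current policy still considers likely are replayed more often than stale ones, which is one plausible reading of "improving the utilization efficiency of sample data".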

YU Zhenhua (于振华), YANG Wenjian (杨文建), LI Xiteng (李西滕), CONG Xuya (丛旭亚). Fuzzing for Unmanned Aerial Vehicle System Based on Reinforcement Learning[J]. Computer Engineering and Applications (计算机工程与应用), 2024, 60(21): 89-98.
