Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (21): 89-98. DOI: 10.3778/j.issn.1002-8331.2401-0125

• Theory, Research and Development •

Fuzzing for Unmanned Aerial Vehicle System Based on Reinforcement Learning

YU Zhenhua, YANG Wenjian, LI Xiteng, CONG Xuya   

  1. College of Computer Science and Technology, Xi’an University of Science and Technology, Xi’an 710054, China
  • Online: 2024-11-01 Published: 2024-10-25

基于强化学习的无人机系统模糊测试方法研究

于振华,杨文建,李西滕,丛旭亚   

  1. 西安科技大学 计算机科学与技术学院,西安 710054

Abstract: Existing fuzz testing methods for unmanned aerial vehicles (UAVs) generate poorly targeted test cases and mutate inputs in a random, blind manner. To address these problems, this paper proposes RLPGFuzz, a reinforcement learning-based fuzz testing method for UAV systems whose main purpose is to verify whether a UAV complies with its existing safety policies. The safety policies that the UAV should obey are formalized as metric temporal logic formulas, and the input space is determined. The traditional fuzz testing process is then modeled as a Markov decision process that incorporates the UAV's motion state. Building on an importance-weighted reinforcement learning algorithm, a new probability sampling method is designed from the weights to improve the utilization efficiency of sample data. The policy network selects mutation actions according to the reward, driving the UAV into behavior that violates the safety policies and thereby exposing anomalies or vulnerabilities. To validate the method, RLPGFuzz is used to fuzz the ArduPilot and PX4 simulation platforms. Compared with current mainstream methods, RLPGFuzz reduces the time to discover anomalies by 20%, increases the number of effective mutations by 30%, and shortens the command sequences that violate the safety policies by 10%. The simulation results indicate that RLPGFuzz is more efficient and discovers a more comprehensive range of vulnerability types.

Key words: fuzzing, reinforcement learning, unmanned aerial vehicle system, safety policy
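
The abstract mentions a probability sampling method built on the weights of an importance-weighted reinforcement learning algorithm, but it does not give the formula. As a rough illustration only, the sketch below assumes the simplest reading: each stored transition is drawn with probability proportional to its weight. The function names, the uniform fallback, and the use of plain Python lists are assumptions made for this example and are not taken from RLPGFuzz.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sampling_probabilities(weights: Sequence[float]) -> List[float]:
    """Normalize non-negative weights into a probability distribution.

    Assumption: probability is proportional to weight. The paper's actual
    sampling rule may differ.
    """
    if not weights:
        raise ValueError("weights must be non-empty")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        # Every weight is zero: fall back to uniform sampling.
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def sample_batch(
    transitions: Sequence[T],
    weights: Sequence[float],
    batch_size: int,
    rng: Optional[random.Random] = None,
) -> List[T]:
    """Draw batch_size transitions with replacement, favoring high weights."""
    if len(transitions) != len(weights):
        raise ValueError("transitions and weights must have the same length")
    rng = rng or random.Random()
    probs = sampling_probabilities(weights)
    return rng.choices(transitions, weights=probs, k=batch_size)
```

Under this assumption, transitions with larger weights are replayed more often, which is one plausible way that stored samples could be used more efficiently than under uniform sampling.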

摘要: 针对现有面向无人机的模糊测试方法生成测试用例针对性弱、变异过程存在随机性和盲目性的问题,提出一种基于强化学习的无人机系统模糊测试方法——RLPGFuzz,主要用于验证无人机是否遵守现有的安全策略。使用度量时态逻辑公式将无人机应当遵守的安全策略进行形式化描述,并确定输入空间。结合无人机的运动状态,将传统模糊测试过程建模为马尔可夫决策过程。在重要性加权强化学习算法的基础上,利用权重设计了一种新的概率采样方法,来提高样本数据的使用效率。策略网络根据奖励选择变异动作,引发无人机发生不符合安全策略的行为从而产生异常或漏洞。为了验证方法有效性,使用RLPGFuzz在仿真平台ArduPilot和PX4进行模糊测试。和当前主流方法相比,RLPGFuzz发现异常的时间缩短了20%,有效变异次数提升了30%,违反安全策略的命令序列缩短了10%。仿真结果表明,RLPGFuzz效率更高,发现漏洞类型更全面。

关键词: 模糊测试, 强化学习, 无人机系统, 安全策略