Computer Engineering and Applications, 2024, Vol. 60, Issue (20): 284-292. DOI: 10.3778/j.issn.1002-8331.2307-0199

• Network, Communication and Security •

Intelligent Fuzzing Technology Based on Combination Model of Multiple Reinforcement Learning Algorithms

XU Aidong, XU Peiming, SHANG Jin, SUN Qindong   

  1. China Southern Power Grid CSG Electric Power Research Institute, Guangzhou 510663, China
  2. Guangdong Provincial Key Laboratory of Power System Network Security, Guangzhou 510663, China
  3. School of Cyber Security, Xi’an Jiaotong University, Xi’an 710049, China
  • Online: 2024-10-15 Published: 2024-10-15

Abstract: With the development of Internet of Things (IoT) technology, IoT smart terminals have become widespread. The firmware of these terminals currently contains a large number of security vulnerabilities, and detecting them manually is highly impractical. The prevailing approach is intelligent fuzzing based on genetic algorithms, which tests the target firmware automatically with randomly mutated data. To address the low efficiency of existing genetic-algorithm-based fuzzing, this paper proposes an intelligent fuzzing model that combines multiple reinforcement learning algorithms. The model uses reinforcement learning to optimize the fuzzer's mutation operator selection strategy, improving code coverage by intelligently choosing different mutation operators for different test cases. Comparative experiments on the LAVA dataset evaluate the performance of the DDQN, DDPG, TRPO, and PPO algorithms within the model and compare them with traditional fuzzing methods. The results show that, in the fuzzing environment, the performance of the different algorithms varies significantly across target programs, and that the reinforcement-learning-based fuzzing method clearly outperforms traditional fuzzing, demonstrating the usability and effectiveness of the proposed model.

Key words: Internet of Things terminal, reinforcement learning, fuzzing, vulnerability discovery
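
The core idea in the abstract, learning which mutation operator to apply and rewarding choices that increase code coverage, can be illustrated with a minimal sketch. This is not the paper's model. It swaps in a plain epsilon-greedy value estimate in place of DDQN, DDPG, TRPO, or PPO, and it ignores the test case when choosing an operator, whereas the paper selects operators per test case. The operators, the `toy_target` function standing in for an instrumented program, and all parameter names are hypothetical and exist only for illustration.

```python
import random

# Hypothetical mutation operators (illustrative, not the paper's operator set).
def bit_flip(data: bytes, rng: random.Random) -> bytes:
    if not data:
        return data
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] ^= 1 << rng.randrange(8)
    return bytes(buf)

def byte_replace(data: bytes, rng: random.Random) -> bytes:
    if not data:
        return data
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)

def byte_insert(data: bytes, rng: random.Random) -> bytes:
    buf = bytearray(data)
    buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    return bytes(buf)

OPERATORS = [bit_flip, byte_replace, byte_insert]

def toy_target(data: bytes) -> set:
    """Assumed stand-in for an instrumented program: returns branch ids hit."""
    covered = {0}
    if len(data) > 4:
        covered.add(1)
    if data[:1] == b"F":
        covered.add(2)
        if data[1:2] == b"U":
            covered.add(3)
    if any(b > 0x7F for b in data):
        covered.add(4)
    return covered

def fuzz(seed: bytes, steps: int = 2000, epsilon: float = 0.1, rng_seed: int = 0):
    rng = random.Random(rng_seed)
    values = [0.0] * len(OPERATORS)   # running mean reward per operator
    counts = [0] * len(OPERATORS)     # how often each operator was chosen
    corpus = [seed]
    total_coverage = set(toy_target(seed))
    for _ in range(steps):
        parent = rng.choice(corpus)
        # Epsilon-greedy choice: explore at random, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            action = rng.randrange(len(OPERATORS))
        else:
            action = max(range(len(OPERATORS)), key=values.__getitem__)
        child = OPERATORS[action](parent, rng)
        # Reward is the number of branches not seen before.
        new_branches = toy_target(child) - total_coverage
        reward = float(len(new_branches))
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
        if new_branches:
            total_coverage |= new_branches
            corpus.append(child)  # keep inputs that reached new code
    return total_coverage, values, counts

if __name__ == "__main__":
    coverage, values, counts = fuzz(b"AAAA")
    print("branches covered:", sorted(coverage))
    print("operator value estimates:", values)
    print("operator selection counts:", counts)
```

A state-dependent agent of the kind the abstract describes would replace the `values` list with a policy or value network that takes features of `parent` as input. The coverage-based reward and the corpus update could stay as they are.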