Computer Engineering and Applications, 2026, Vol. 62, Issue (8): 168-175. DOI: 10.3778/j.issn.1002-8331.2503-0329

• Pattern Recognition and Artificial Intelligence •

Robot Deep Reinforcement Learning Navigation in Multi-Corner Scenarios

CAO Qingyue1,2, WANG Yadong1,2, WANG Qing1,2, ZHANG Yujia1,2, YANG Yuan1,2+

  1. School of Instrument Science and Engineering, Southeast University, Nanjing 210096, China
  2. Ministry of Education Key Laboratory of Micro-Inertial Instrument and Advanced Navigation Technology, Nanjing 210096, China
  + Corresponding author. E-mail: yangyuancsi@163.com
  • Received: 2025-03-27  Revised: 2025-05-09  Online: 2026-04-15  Published: 2026-04-15
  • Supported by:
    National Natural Science Foundation of China (42074039); Science and Technology Project of CCCC Construction Group (中交建筑集团) (8522008786).

Abstract: Reinforcement learning-based navigation suffers from low training efficiency, poor stability, and degraded performance when the target is occluded in scenarios with multiple dead-end corners. To address these problems, a deep reinforcement learning navigation method that integrates expert experience with a hybrid reward mechanism is proposed. High-quality expert experience is filtered and used to pretrain a behavior cloning model that initializes the policy, improving training efficiency. A dense reward function with corner avoidance constraints is constructed to balance target traction and corner avoidance. Discounted returns are normalized to reduce the return variance across trajectories and improve training stability. Simulation experiments show that the proposed method achieves a 91.3% navigation success rate in tests with random start and goal points, and a 95% success rate with the shortest navigation time in tests with fixed start and goal points. The results indicate that the method can flexibly adjust its target traction and corner avoidance strategies and effectively improve autonomous robot navigation in multi-corner scenarios.

Key words: deep reinforcement learning (DRL), autonomous navigation, target occlusion, corner avoidance
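
The abstract mentions normalizing discounted returns to reduce return variance across trajectories. The paper's exact formulation is not given on this page, so the following is only a minimal sketch of one common form of the idea: compute the discounted return at each step of a trajectory, then standardize to zero mean and unit variance. The function names, the discount factor `gamma=0.99`, the stabilizer `eps`, and the choice to normalize within a single trajectory (rather than across a batch) are all assumptions made for illustration.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Discounted return G_t = r_t + gamma * G_{t+1} for each step of one trajectory."""
    rewards = np.asarray(rewards, dtype=np.float64)
    returns = np.zeros_like(rewards)
    running = 0.0
    # Accumulate backwards so each step reuses the return of the step after it.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def normalized_discounted_returns(rewards, gamma=0.99, eps=1e-8):
    """Standardize the discounted returns of one trajectory (assumed scheme)."""
    returns = discounted_returns(rewards, gamma)
    # eps (hypothetical value) avoids division by zero for constant returns.
    return (returns - returns.mean()) / (returns.std() + eps)


if __name__ == "__main__":
    print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))
    print(normalized_discounted_returns([1.0, 1.0, 1.0], gamma=0.5))
```

Because the standardized returns have a comparable scale regardless of trajectory length or raw reward magnitude, gradient updates driven by them tend to vary less between trajectories, which is the stabilizing effect the abstract describes.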