RISE-D3QN-Based Path Planning for Multi-UAV Data Collection

doi:10.3778/j.issn.1002-8331.2306-0311

Abstract

Abstract: Unmanned aerial vehicle (UAV)-assisted Internet of things (IoT) data collection is an efficient and promising approach. This paper addresses the optimization of resource allocation in path planning by refining the energy consumption model and considering three metrics: the amount of collected data, time efficiency, and energy efficiency. The problem is formulated as a distributed partially observable Markov decision process (POMDP), and a novel deep reinforcement learning algorithm called RISE (Rényi state entropy)-D3QN (dueling double deep Q network) is proposed. It combines intrinsic rewards, prioritized experience replay, and a soft-max exploration strategy, enabling path planning for UAV swarms while adapting to changes in UAV battery capacity and in the locations, data volumes, and number of IoT devices. Simulation results demonstrate that, compared with the traditional D3QN and DQN algorithms, the proposed approach significantly increases the amount of data collected from IoT devices while reducing UAV flight time and energy consumption, all while ensuring UAV safety during flight.

Key words: unmanned aerial vehicles (UAVs), path planning, deep reinforcement learning, multi-agent, Internet of things (IoT), data collection
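
The abstract names several standard building blocks without giving implementation details. As a minimal sketch of three of them (the dueling aggregation of value and advantage, the double Q-learning target, and soft-max action selection), the following plain-Python functions may help fix ideas. All function names, the temperature, and the discount factor are illustrative assumptions and not the authors' implementation; the RISE intrinsic reward and prioritized experience replay are deliberately left out.

```python
import math
import random


def dueling_q(value, advantages):
    """Combine a state value V(s) and per-action advantages A(s, a) into Q(s, a).

    Uses the common mean-subtracted form: Q = V + A - mean(A).
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def softmax_probs(q_values, temperature=1.0):
    """Soft-max (Boltzmann) action probabilities computed from Q-values."""
    q_max = max(q_values)  # shift by the maximum for numerical stability
    exps = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(q_values, temperature=1.0, rng=random):
    """Draw one action index according to the soft-max probabilities."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def double_q_target(reward, q_online_next, q_target_next, gamma=0.99, done=False):
    """Double Q-learning target.

    The online network's Q-values choose the next action; the target
    network's Q-values evaluate that chosen action.
    """
    if done:
        return reward
    best = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best]
```

In this sketch a lower temperature makes action selection greedier, while a higher one spreads probability more evenly across actions; the value appropriate for the UAV setting is not stated in the abstract.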

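Abstract (translated from the Chinese): UAV-assisted IoT data collection is an efficient and promising approach. To address the optimization of resource allocation in path planning, the energy consumption model is refined and three metrics are considered: data volume, time efficiency, and energy efficiency. The problem is modeled as a distributed partially observable Markov decision process, and a deep reinforcement learning algorithm is proposed. Specifically, the normalized model is divided into four specific UAV energy consumption models. Based on a discrete-action offline deep reinforcement learning architecture, a new RISE (Rényi state entropy)-D3QN (dueling double deep Q network) algorithm is proposed that combines intrinsic rewards, prioritized experience replay, and a soft-max exploration strategy, and that can plan the paths of a UAV swarm while UAV battery capacity and the locations, data volumes, and number of IoT devices change. Simulation results show that, compared with the traditional D3QN and DQN algorithms, the approach increases the amount of data the UAVs collect from IoT devices while ensuring safe flight and, with data collection as the primary objective, reduces UAV flight time and energy consumption.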

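Key words (translated from the Chinese): unmanned aerial vehicle, path planning, deep reinforcement learning, multi-agent, Internet of things, data collection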

HUANG Zefeng, LI Tao. RISE-D3QN-Based Path Planning for Multi-UAV Data Collection[J]. Computer Engineering and Applications, 2024, 60(20): 328-338.

黄泽丰, 李涛. RISE-D3QN驱动的多无人机数据采集路径规划[J]. 计算机工程与应用, 2024, 60(20): 328-338.
