Journal of Astronautics ›› 2021, Vol. 42 ›› Issue (6): 757-765. doi: 10.3873/j.issn.1000-1328.2021.06.009

• Guidance, Navigation, Control and Electronics •

Multi-UAV Cooperative Task Decision-Making Based on MADDPG

李波,越凯强,甘志刚,高佩忻   

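  1. School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710114
  • Received: 2020-07-17 Revised: 2020-11-04 Online: 2021-06-15 Published: 2021-07-23
  • Supported by:
    National Natural Science Foundation of China (61573285, 62003267); Natural Science Foundation of Shaanxi Province (2020JQ-220); Aeronautical Science Foundation of China (2017ZC53021); Open Fund of the Key Laboratory of Data Link Technology (CLDL-20182101)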

Multi-UAV Cooperative Autonomous Navigation Based on Multi-Agent Deep Deterministic Policy Gradient

LI Bo, YUE Kai-qiang, GAN Zhi-gang, GAO Pei-xin

  1. School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710114, China
  • Received: 2020-07-17 Revised: 2020-11-04 Online: 2021-06-15 Published: 2021-07-23

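Abstract: Traditional optimization algorithms struggle to produce the desired result within a short time in multi-UAV task decision-making. To address this problem, a multi-agent deep deterministic policy gradient (MADDPG) algorithm for UAVs is proposed on the basis of deep reinforcement learning. The UAVs are allowed to use global information during learning but only local information when applying decisions, and the model structure of the MADDPG algorithm is designed in terms of network structure, state space, action space, and reward function. Finally, simulation experiments and a comparison with the deep deterministic policy gradient (DDPG) algorithm verify that the proposed MADDPG algorithm greatly increases learning speed while maintaining accuracy, making up for the shortcomings of traditional reinforcement learning algorithms in the multi-agent domain.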


Key words: UAV, Task decision-making, Deep reinforcement learning, Policy gradient, Multi-agent

Abstract: Traditional optimization algorithms struggle to reach the desired result within a short time in multi-UAV (unmanned aerial vehicle) task decision-making. To address this problem, this paper proposes a multi-agent deep deterministic policy gradient (MADDPG) algorithm based on deep reinforcement learning. The UAVs are allowed to use global information during learning but only local information when making decisions in application, and the model structure of the MADDPG algorithm is designed on that basis. Finally, simulation experiments and a comparison with the deep deterministic policy gradient (DDPG) algorithm verify that the proposed MADDPG algorithm greatly improves learning speed while maintaining accuracy, and makes up for the shortcomings of traditional reinforcement learning algorithms in multi-agent settings.
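
The split between global information in learning and local information in decision-making can be illustrated with a minimal Python sketch. It is not the authors' model: the linear maps stand in for the actor and critic networks, the class names `LinearActor` and `CentralizedCritic` are hypothetical, the dimensions (3 UAVs, 4 observation features, 2 action components) are assumptions chosen for illustration, and the training update itself is omitted.

```python
import random


class LinearActor:
    """Decentralized actor: maps ONE UAV's local observation to a bounded action.

    A linear map is an assumed stand-in for the actor network.
    """

    def __init__(self, obs_dim, act_dim, rng):
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(obs_dim)]
                        for _ in range(act_dim)]

    def act(self, local_obs):
        # Weighted sum per action component, clipped to [-1, 1].
        raw = [sum(w * o for w, o in zip(row, local_obs)) for row in self.weights]
        return [max(-1.0, min(1.0, x)) for x in raw]


class CentralizedCritic:
    """Centralized critic: scores the joint observations and actions of ALL UAVs.

    It is only needed during learning; execution uses the actors alone.
    """

    def __init__(self, n_agents, obs_dim, act_dim, rng):
        self.input_dim = n_agents * (obs_dim + act_dim)
        self.weights = [rng.uniform(-0.1, 0.1) for _ in range(self.input_dim)]

    def q_value(self, all_obs, all_actions):
        # Flatten every agent's observation and action into one joint vector.
        joint = [x for obs in all_obs for x in obs]
        joint += [a for act in all_actions for a in act]
        if len(joint) != self.input_dim:
            raise ValueError("critic needs observations and actions of all agents")
        return sum(w * x for w, x in zip(self.weights, joint))


# Hypothetical sizes, chosen only for this illustration.
N_AGENTS, OBS_DIM, ACT_DIM = 3, 4, 2
rng = random.Random(0)

actors = [LinearActor(OBS_DIM, ACT_DIM, rng) for _ in range(N_AGENTS)]
critics = [CentralizedCritic(N_AGENTS, OBS_DIM, ACT_DIM, rng) for _ in range(N_AGENTS)]

# Each UAV observes only its own local state.
observations = [[rng.uniform(-1.0, 1.0) for _ in range(OBS_DIM)]
                for _ in range(N_AGENTS)]

# Execution: every actor decides from its local observation only.
actions = [actor.act(obs) for actor, obs in zip(actors, observations)]

# Learning: every critic evaluates the global (joint) information.
q_values = [critic.q_value(observations, actions) for critic in critics]
```

In this sketch each UAV has its own critic, as in the general MADDPG formulation. Once learning is finished the critics can be discarded, so a deployed UAV needs only its own actor and its own local observation.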


Key words: UAV, Task decision-making, Deep reinforcement learning, Policy gradient, Multi-agent

CLC number: