Journal of Astronautics (宇航学报) ›› 2021, Vol. 42 ›› Issue (10): 1293-1304. doi: 10.3873/j.issn.1000-1328.2021.10.010

• Guidance, Navigation, Control and Electronics •

Integrated Guidance and Control for Missile Using Deep Reinforcement Learning

PEI Pei, HE Shaoming, WANG Jiang, LIN Defu

  1. School of Aerospace Engineering, Beijing Institute of Technology, Beijing 100081, China; 2. Beijing Key Laboratory of UAV Autonomous Control, Beijing Institute of Technology, Beijing 100081, China
  • Received: 2020-10-19 Revised: 2021-01-20 Online: 2021-10-15 Published: 2021-10-15

Abstract: This paper proposes an integrated guidance and control (IGC) algorithm for missiles based on deep reinforcement learning. Unlike conventional IGC algorithms, and unlike approaches that design the guidance loop and the control loop separately, the proposed algorithm uses deep reinforcement learning to train an agent that generates the fin deflection command directly from the missile's observations in order to intercept the target accurately. The guidance and control problem is first formulated as a Markov decision process so that reinforcement learning theory applies. A reward function is then shaped heuristically to trade off guidance accuracy, energy consumption, and interception time. Finally, the deep deterministic policy gradient algorithm is used to solve the resulting reinforcement learning problem, yielding a policy that maps the observed states to a fin deflection command. Extensive numerical simulations validate the effectiveness and robustness of the proposed algorithm.
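The abstract names three competing objectives for the reward function but does not give its form on this page. The following is a minimal sketch of how such a shaped reward could look, not the authors' reward. It assumes a planar engagement and uses a zero-effort-miss proxy for accuracy, the squared fin deflection for energy, and a constant per-step penalty for time. The names `RewardWeights`, `zero_effort_miss`, and `step_reward`, the weight values, and the terminal hit bonus are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # All values are illustrative assumptions, not taken from the paper.
    accuracy: float = 1.0    # penalty per metre of predicted miss
    energy: float = 0.1      # penalty per rad^2 of fin deflection
    time: float = 0.01       # constant penalty per simulation step
    hit_bonus: float = 10.0  # terminal bonus inside the hit radius


def zero_effort_miss(rel_pos, rel_vel):
    """Predicted miss distance if neither vehicle manoeuvres from now on.

    rel_pos, rel_vel: target-minus-missile position (m) and velocity (m/s)
    as 2-D tuples. Uses the constant-velocity closest-approach formula.
    """
    rx, ry = rel_pos
    vx, vy = rel_vel
    speed_sq = vx * vx + vy * vy
    if speed_sq == 0.0:
        return math.hypot(rx, ry)  # no relative motion: range stays fixed
    # Time of closest approach, clamped to zero when the range is opening.
    t_go = max(0.0, -(rx * vx + ry * vy) / speed_sq)
    return math.hypot(rx + vx * t_go, ry + vy * t_go)


def step_reward(rel_pos, rel_vel, fin_deflection, terminal_miss=None,
                weights=RewardWeights(), hit_radius=1.0):
    """Per-step reward; pass terminal_miss (m) only on the final step."""
    reward = -weights.accuracy * zero_effort_miss(rel_pos, rel_vel)
    reward -= weights.energy * fin_deflection ** 2  # energy consumption term
    reward -= weights.time                          # interception time term
    if terminal_miss is not None and terminal_miss <= hit_radius:
        reward += weights.hit_bonus                 # assumed terminal bonus
    return reward
```

Under these assumptions the reward falls as the predicted miss, the control effort, or the number of elapsed steps grows, so changing the weights shifts the tradeoff that a deep deterministic policy gradient agent would learn.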


Key words: Integrated guidance and control, Deep reinforcement learning, Deep deterministic policy gradient, Zero effort miss, Heuristic learning

CLC number: