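Cite this article: HE Ming, ZHANG Bin, LIU Qiang, et al. Prioritized experience selection mechanism for the MADDPG algorithm[J]. Control and Decision, 2021, 36(1): 68-74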
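Prioritized experience selection mechanism for the MADDPG algorithm
HE Ming1, ZHANG Bin1, LIU Qiang2, CHEN Xi-liang1, YANG Cheng1
(1. College of Command and Control Engineering, The Army Engineering University of PLA, Nanjing 210007; 2. Naval Command College, Nanjing 210000)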
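Abstract:
The multi-agent deep deterministic policy gradient (MADDPG) algorithm suffers from low training efficiency and slow convergence. To address this, a prioritized experience selection mechanism for MADDPG is studied and the PES-MADDPG algorithm is proposed. First, the model and training method of the MADDPG algorithm are analyzed. Then, the multi-agent experience buffer pool is improved: a priority evaluation function is designed based on the error of the policy evaluation (critic) function and on how often each experience has been drawn for training, and the priority serves as the selection probability for the learning samples used to train the neural networks. Finally, six groups of comparative experiments are conducted in two types of environments, cooperative navigation and competitive confrontation. The results show that the prioritized experience selection mechanism speeds up MADDPG training, that the trained agents perform better, and that the mechanism is also applicable to some extent to multi-agent training controlled by the deep deterministic policy gradient (DDPG) algorithm.
Keywords:  multi-agent  deep reinforcement learning  MADDPG  prioritized experience selection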
DOI:10.13195/j.kzyjc.2019.0834
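Classification number: TP273
Funding: National Key Research and Development Program of China (2018YFC0806900, 2016YFC0800606, 2016YFC0800310); Natural Science Foundation of Jiangsu Province (BK20161469); Key Research and Development Program of Jiangsu Province (BE2016904, BE2017616, BE2018754); China Postdoctoral Science Foundation (2018M633757).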
Multi-agent deep deterministic policy gradient algorithm via prioritized experience selected method
HE Ming1, ZHANG Bin1, LIU Qiang2, CHEN Xi-liang1, YANG Cheng1
(1. College of Command and Control Engineering, The Army Engineering University of PLA, Nanjing 210007, China; 2. Naval Command College, Nanjing 210000, China)
Abstract:
To address the low training efficiency and slow convergence of the multi-agent deep deterministic policy gradient (MADDPG) algorithm, a prioritized experience selection mechanism for MADDPG is studied and the PES-MADDPG algorithm is proposed. First, the model and training method of the MADDPG algorithm are analyzed. Then, the multi-agent experience buffer pool is improved: a priority evaluation function is designed based on the error of the critic function and the training frequency of each experience, and the priority is used as the selection probability when drawing learning samples to train the neural networks. Finally, six groups of comparative experiments are conducted in cooperative navigation and competitive environments. The experimental results show that the prioritized experience selection mechanism improves the training speed of the MADDPG algorithm and that the trained agents perform better. The mechanism is also applicable, to some extent, to the training of multiple agents controlled by the deep deterministic policy gradient (DDPG) algorithm.
Key words:  multi-agent  deep reinforcement learning  MADDPG  prioritized experience selected method
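
The abstract names the two inputs to the priority evaluation function (critic error and how often an experience has been used for training) but does not give the formula. The following Python sketch is therefore only an illustration under assumed choices, not the PES-MADDPG implementation: the class name `PrioritizedBuffer`, the exponents `alpha` and `beta`, the constant `eps`, and the form priority = (|error| + eps)^alpha / (1 + count)^beta are all hypothetical. In a multi-agent setting each stored transition would typically be the joint tuple of all agents' observations, actions, rewards and next observations.

```python
import numpy as np


class PrioritizedBuffer:
    """Toy buffer: priority rises with critic error and falls with reuse.

    The priority formula is an assumption made for illustration only.
    """

    def __init__(self, capacity, alpha=0.6, beta=0.5, eps=1e-3, seed=0):
        self.capacity = capacity
        self.alpha = alpha  # hypothetical weight on the critic error
        self.beta = beta    # hypothetical penalty on repeated sampling
        self.eps = eps      # keeps every priority strictly positive
        self.data, self.errors, self.counts = [], [], []
        self.pos = 0        # next slot to overwrite once the buffer is full
        self.rng = np.random.default_rng(seed)

    def add(self, transition, error=None):
        # Without an error estimate, reuse the current maximum so that
        # new experience is likely to be drawn at least once.
        if error is None:
            error = max(self.errors, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.errors.append(abs(float(error)))
            self.counts.append(0)
        else:
            self.data[self.pos] = transition
            self.errors[self.pos] = abs(float(error))
            self.counts[self.pos] = 0
        self.pos = (self.pos + 1) % self.capacity

    def probabilities(self):
        err = np.asarray(self.errors, dtype=float)
        cnt = np.asarray(self.counts, dtype=float)
        prio = (err + self.eps) ** self.alpha / (1.0 + cnt) ** self.beta
        return prio / prio.sum()  # priority used as selection probability

    def sample(self, batch_size):
        # Draw indices with replacement according to the priorities.
        idx = self.rng.choice(len(self.data), size=batch_size,
                              p=self.probabilities())
        for i in idx:
            self.counts[i] += 1  # record that the sample was used
        return idx, [self.data[i] for i in idx]

    def update_errors(self, idx, new_errors):
        # Call after a critic update with the fresh per-sample errors.
        for i, e in zip(idx, new_errors):
            self.errors[i] = abs(float(e))
```

A training loop would call `sample`, update the critic on the returned batch, and then pass the new per-sample critic errors to `update_errors`. Dividing by the usage count is just one simple way to keep a few high-error experiences from dominating every batch.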
