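Cite this article: CHEN Liang, LIANG Chen, ZHANG Jing-yi, et al. A multi-agent reinforcement learning algorithm based on improved DDPG in Actor-Critic framework[J]. Control and Decision, 2021, 36(1): 75-82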
A multi-agent reinforcement learning algorithm based on improved DDPG in Actor-Critic framework
CHEN Liang, LIANG Chen, ZHANG Jing-yi, LIU Yun-ting
(College of Automation and Electrical Engineering, Shenyang Ligong University, Shenyang 110159, China)
Abstract:
Real-world artificial intelligence (AI) applications often require multiple agents to work together, and effective communication and coordination among artificial agents is an indispensable step toward artificial general intelligence. This paper uses a self-developed virtual environment for police training as the test scenario, in which the assigned tasks require squads of agents of different unit types to cooperate with or compete against one another. To keep communication both effective and scalable, the paper proposes a mixed deep deterministic policy gradient (Mi-DDPG) algorithm. First, a bidirectional recurrent neural network (BRNN) is added to the Actor network as an information-exchange layer for agents of the same type. Then, information about agents of other types is added to the Critic network so that a multi-agent cooperation strategy can be learned. In addition, to ease the training burden, a framework of centralized training with decentralized execution is adopted, and the Q function in the Critic network is modularized. In experiments across different scenarios, Mi-DDPG is compared with other algorithms and shows clear improvements in convergence speed and task completion, indicating potential value for real-world applications.
Key words:  reinforcement learning  deep learning  multi-agent  RNN  DDPG  Actor-Critic
DOI:10.13195/j.kzyjc.2019.0787
CLC number: TP181
Funding: National Key Research and Development Program of China (2017YFC0821004, 2017YFC0821001); Natural Science Foundation of Liaoning Province (20170540788); Basic Scientific Research Project of the Education Department of Liaoning Province (LG201707).
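The abstract describes the architecture only in outline, so the following is a minimal illustrative sketch rather than the authors' implementation. It shows one way a bidirectional recurrent pass over same-type agents can act as a communication layer in the Actor, and how a centralized Critic can score the joint observations and actions together with information about agents of other types. All names (`BRNNActor`, `centralized_q`), the dimensions, the random untrained weights, and the linear form of the critic are assumptions made for illustration; the modularized Q function mentioned in the abstract is not covered.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    """Small random weights; a trained network would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class BRNNActor:
    """Hypothetical actor: a bidirectional RNN unrolled over agents, not time."""

    def __init__(self, obs_dim, hid_dim, act_dim, seed=0):
        rng = random.Random(seed)
        self.hid_dim = hid_dim
        self.W_in_f = rand_matrix(hid_dim, obs_dim, rng)   # forward input weights
        self.W_h_f = rand_matrix(hid_dim, hid_dim, rng)    # forward recurrent weights
        self.W_in_b = rand_matrix(hid_dim, obs_dim, rng)   # backward input weights
        self.W_h_b = rand_matrix(hid_dim, hid_dim, rng)    # backward recurrent weights
        self.W_out = rand_matrix(act_dim, 2 * hid_dim, rng)

    def _scan(self, observations, W_in, W_h):
        """Run a plain tanh RNN along the given sequence of agent observations."""
        h = [0.0] * self.hid_dim
        states = []
        for obs in observations:
            pre = [a + b for a, b in zip(matvec(W_in, obs), matvec(W_h, h))]
            h = [math.tanh(p) for p in pre]
            states.append(h)
        return states

    def act(self, observations):
        """Return one bounded action per agent, conditioned on all teammates."""
        fwd = self._scan(observations, self.W_in_f, self.W_h_f)
        bwd = self._scan(observations[::-1], self.W_in_b, self.W_h_b)[::-1]
        return [[math.tanh(v) for v in matvec(self.W_out, f + b)]
                for f, b in zip(fwd, bwd)]


def centralized_q(weights, team_obs, team_actions, other_info):
    """Linear stand-in for a centralized critic over joint inputs."""
    features = ([x for obs in team_obs for x in obs]
                + [x for act in team_actions for x in act]
                + list(other_info))
    return sum(w * f for w, f in zip(weights, features))


# Three same-type agents, each with a 3-dimensional observation.
actor = BRNNActor(obs_dim=3, hid_dim=4, act_dim=2)
team_obs = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4], [0.5, 0.5, -0.2]]
actions = actor.act(team_obs)

# During training only, the critic also sees information about other agent types.
other_info = [1.0, 0.0]
critic_weights = [0.1] * (3 * 3 + 3 * 2 + len(other_info))
q_value = centralized_q(critic_weights, team_obs, actions, other_info)
```

Because the hidden state is passed in both directions along the agent sequence, every agent's action depends on every teammate's observation, and the same weights apply regardless of team size. At execution time only the actor is needed, which is the sense of centralized training with decentralized execution in this sketch.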
