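Cite this article: XUE Han, SHAO Zhe-ping, FANG Qiong-lin, et al. Reinforcement learning based fractional gradient descent RBF neural network control of inverted pendulum[J]. Control and Decision, 2021, 36(1): 125-134.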
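Reinforcement learning based fractional gradient descent RBF neural network control of inverted pendulum
XUE Han, SHAO Zhe-ping, FANG Qiong-lin, LIU Xiao-jia
(Institute of Navigation, Jimei University, Xiamen 361021, Fujian, China)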
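Abstract:
To improve the control performance of reinforcement learning, a reinforcement learning algorithm based on a fractional-order gradient descent radial basis function (RBF) neural network is proposed. An evaluation (critic) neural network and an action (actor) neural network form the reinforcement learning system, which exploits the memory and association capabilities of neural networks to learn to control the inverted pendulum; control accuracy improves and the error tends to zero until learning succeeds. The stability of the closed-loop system is proved. Physical experiments on an inverted pendulum show that when the fractional order is larger, the differential effect is more pronounced and the angular velocity and velocity are controlled better, so their mean squared error (MSE) and mean absolute error (MAE) are smaller; when the fractional order is smaller, the integral effect is more pronounced and the tilt angle and displacement are controlled better, so their MSE and MAE are smaller. Simulation results show that the proposed algorithm has good dynamic response, small overshoot, short settling time, high accuracy and good generalization. It outperforms both the RBF neural network based reinforcement learning algorithm and the traditional reinforcement learning algorithm, effectively speeds up the convergence of gradient descent, and improves control performance. When a moderate disturbance is introduced, the proposed algorithm quickly self-adjusts and returns to a stable state, and the robustness and dynamic performance of the controller meet practical requirements.
Key words: reinforcement learning; radial basis function (RBF) neural network; inverted pendulum; fractional order; gradient descent; neural network control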
DOI:10.13195/j.kzyjc.2019.0816
CLC number: TP18
Funding: National Natural Science Foundation of China (51579114); Natural Science Foundation of Fujian Province (2018J05085).
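The abstract does not state the fractional gradient rule, so the following is only a minimal Python sketch under explicit assumptions, not the authors' method. It assumes a Caputo-type fractional derivative of order `alpha` whose lower terminal is the previous iterate, truncated to the first term of its series, which is one variant used in the fractional gradient descent literature. For brevity it trains only the output weights of a Gaussian RBF network with fixed, hypothetical centers and width on a supervised regression task, rather than inside the actor-critic loop, and reports the MSE and MAE mentioned in the abstract. All function names (`fractional_step`, `train`, etc.) and numeric settings are hypothetical.

```python
import math


def rbf_features(x, centers, width):
    # Gaussian hidden layer for a scalar input: h_j = exp(-(x - c_j)^2 / (2 * width^2)).
    return [math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centers]


def rbf_output(weights, feats):
    # Linear output layer: y = sum_j w_j * h_j.
    return sum(w * h for w, h in zip(weights, feats))


def fractional_step(w, w_prev, grad, lr, alpha, eps=1e-6):
    # Assumption (not from the paper): Caputo-type step of order alpha in (0, 1],
    # lower terminal = previous iterate, first series term only:
    #   w_next = w - lr * grad * (|w - w_prev| + eps)^(1 - alpha) / Gamma(2 - alpha)
    # With alpha = 1 the scale factor is 1, i.e. ordinary gradient descent.
    scale = (abs(w - w_prev) + eps) ** (1.0 - alpha) / math.gamma(2.0 - alpha)
    return w - lr * grad * scale


def errors(weights, data, centers, width):
    # Prediction errors (target - output) over a data set.
    return [t - rbf_output(weights, rbf_features(x, centers, width)) for x, t in data]


def mse(errs):
    # Mean squared error.
    return sum(e * e for e in errs) / len(errs)


def mae(errs):
    # Mean absolute error.
    return sum(abs(e) for e in errs) / len(errs)


def train(data, centers, width, alpha, lr=0.5, epochs=200):
    weights = [0.0] * len(centers)
    # Hypothetical initial "previous" weights, so the first fractional step is not
    # scaled down to roughly eps ** (1 - alpha).
    prev = [0.1] * len(centers)
    for _ in range(epochs):
        for x, target in data:
            h = rbf_features(x, centers, width)
            err = target - rbf_output(weights, h)
            # For E = 0.5 * err^2, dE/dw_j = -err * h_j.
            new = [fractional_step(w, p, -err * hj, lr, alpha)
                   for w, p, hj in zip(weights, prev, h)]
            prev, weights = weights, new
    return weights


# Usage: fit sin(x) on [-2, 2] with five fixed centers (all values hypothetical).
data = [(k / 4.0, math.sin(k / 4.0)) for k in range(-8, 9)]
centers = [-2.0, -1.0, 0.0, 1.0, 2.0]
width = 0.8
for alpha in (0.7, 1.0):
    e = errors(train(data, centers, width, alpha), data, centers, width)
    print(f"alpha={alpha}: MSE={mse(e):.2e}, MAE={mae(e):.2e}")
```

With `alpha = 1` the scale factor equals 1 and the update reduces to ordinary gradient descent; with `alpha < 1` each weight's step is rescaled by the size of its most recent change, and the small `eps` keeps the step from vanishing when a weight has not moved.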
