Multi-objective Vehicle Following Decision Algorithm Based on Reinforcement Learning
Author:
Affiliation:

School of Information Science and Technology, Southwest Jiaotong University

Author Biography:

Corresponding Author:

CLC Number:

TP273

Fund Project:

Open Project of the State Key Laboratory of CAD&CG, Zhejiang University (A1923); Chengdu Science and Technology Project (2015-HM01-00050-SF)




Abstract:

To meet the comfort requirements of the car-following mode of adaptive cruise control (ACC) systems while also accounting for vehicle safety and driving efficiency, a new multi-objective vehicle-following decision algorithm based on the Deep Deterministic Policy Gradient (DDPG) algorithm is proposed, addressing the poor generalization and poor comfort of existing algorithms. Based on the relative longitudinal kinematics of the following vehicle and the lead vehicle, a Markov Decision Process (MDP) model of the car-following process is established, and an efficient, comfortable, and safe following decision algorithm is designed. To speed up model convergence, the storage scheme and sampling strategy for the experience samples of the DDPG algorithm are improved. To handle the multi-objective structure of the car-following task, the reward function is designed in a modular fashion. Finally, tests in a simulation environment show that the algorithm still completes the following task successfully when the test environment differs from the training environment, and that its performance exceeds that of existing following algorithms.
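Since the abstract only names the modular reward design without giving its form, the following minimal Python sketch illustrates one way a car-following reward could be decomposed into safety, efficiency, and comfort modules and combined as a weighted sum. It is not the authors' code: FollowState and every weight and threshold below are illustrative assumptions, not values from the paper.

```python
from dataclasses import dataclass


@dataclass
class FollowState:
    gap: float         # longitudinal gap to the lead vehicle (m)
    rel_speed: float   # lead speed minus follower speed (m/s)
    accel: float       # follower acceleration at this step (m/s^2)
    prev_accel: float  # follower acceleration at the previous step (m/s^2)


def safety_reward(s: FollowState, min_gap: float = 5.0) -> float:
    # Hard penalty for dangerously small gaps (threshold is assumed).
    return -10.0 if s.gap < min_gap else 0.0


def efficiency_reward(s: FollowState, desired_gap: float = 20.0) -> float:
    # Reward tracking a desired gap and matching the lead vehicle's speed
    # (both the desired gap and the 0.1 coefficient are assumed).
    return -abs(s.gap - desired_gap) / desired_gap - 0.1 * abs(s.rel_speed)


def comfort_reward(s: FollowState, dt: float = 0.1) -> float:
    # Penalize jerk, a common proxy for ride comfort in ACC work.
    jerk = (s.accel - s.prev_accel) / dt
    return -abs(jerk) / 10.0


def total_reward(s: FollowState,
                 w_safe: float = 1.0,
                 w_eff: float = 0.5,
                 w_comf: float = 0.5) -> float:
    # Weighted sum of the three modules; the weights are placeholders.
    return (w_safe * safety_reward(s)
            + w_eff * efficiency_reward(s)
            + w_comf * comfort_reward(s))


# Example: a smooth state near the desired gap scores close to zero.
print(total_reward(FollowState(gap=19.0, rel_speed=0.0,
                               accel=0.2, prev_accel=0.1)))
```

In a DDPG training loop, total_reward would supply the scalar reward attached to each transition pushed into the replay buffer; the improved experience storage and sampling strategy mentioned in the abstract is not specified there, so it is not sketched.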

History
  • Received: 2020-04-15
  • Revised: 2020-06-08
  • Accepted: 2020-06-12
  • Online Publication Date:
  • Publication Date: