School of Information Science and Technology, Southwest Jiaotong University
Supported by the Open Project of the State Key Laboratory of CAD&CG, Zhejiang University (A1923), and the Chengdu Science and Technology Project (2015-HM01-00050-SF)
To meet the comfort requirements of the car-following mode of an adaptive cruise control system while also accounting for vehicle safety and driving efficiency, a new multi-objective car-following decision algorithm is proposed based on the Deep Deterministic Policy Gradient (DDPG) algorithm, addressing the poor generalization and comfort of existing algorithms. Based on the mutual longitudinal kinematics of the following vehicle and the lead vehicle, a Markov Decision Process (MDP) model of the car-following process is established, and an efficient, comfortable, and safe car-following decision algorithm is designed. To accelerate model convergence, the storage scheme and sampling strategy for the experience samples of the DDPG algorithm are improved. To address the multi-objective structure of the car-following task, the reward function is given a modular design. Finally, the algorithm is tested in a simulation environment: even when the test environment differs from the training environment, the following task is still completed successfully, and performance exceeds that of existing following algorithms.
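The modular reward design mentioned above can be illustrated with a minimal sketch. All function names, weights, and thresholds here are hypothetical placeholders for exposition; the paper's actual reward terms and coefficients are not specified in this excerpt. The sketch shows the general pattern: separate reward modules for safety, efficiency, and comfort, combined as a weighted sum.

```python
# Illustrative modular reward for a car-following MDP.
# All weights and thresholds below are assumed values, not the paper's.

def safety_reward(gap: float, min_gap: float = 2.0) -> float:
    """Large penalty when the inter-vehicle gap (m) is dangerously small."""
    return -10.0 if gap < min_gap else 0.0

def efficiency_reward(gap: float, desired_gap: float) -> float:
    """Penalize deviation from the desired following distance (m)."""
    return -abs(gap - desired_gap) / desired_gap

def comfort_reward(jerk: float, max_jerk: float = 3.0) -> float:
    """Penalize large jerk (m/s^3), capped at -1, to keep the ride smooth."""
    return -min(abs(jerk) / max_jerk, 1.0)

def total_reward(gap: float, desired_gap: float, jerk: float,
                 weights: tuple = (1.0, 0.5, 0.5)) -> float:
    """Weighted sum of the modular reward terms."""
    w_safe, w_eff, w_comf = weights
    return (w_safe * safety_reward(gap)
            + w_eff * efficiency_reward(gap, desired_gap)
            + w_comf * comfort_reward(jerk))
```

A design of this shape lets each objective be tuned or ablated independently, which matches the paper's stated motivation of balancing safety, efficiency, and comfort in one scalar reward.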