TY - GEN
T1 - Leaky PPO
T2 - 2024 International Joint Conference on Neural Networks, IJCNN 2024
AU - Han, Xinchen
AU - Afifi, Hossam
AU - Moungla, Hassine
AU - Marot, Michel
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024/1/1
Y1 - 2024/1/1
N2 - Interest in applying Reinforcement Learning (RL) to Autonomous Vehicles (AVs) is expanding rapidly. Proximal Policy Optimization (PPO), a well-known RL algorithm with two versions, is simple to implement and highly general. In this paper, we first analyze the issues in each of the original PPO versions: the asymmetric penalty in the Adaptive KL Penalty Coefficient PPO version, and the gradient loss and pessimistic estimate in the Clipped PPO version. To address these issues, we propose three improved PPO algorithms: Adaptive JS Penalty Coefficient PPO, Leaky PPO, and Parametric PPO. To validate the effectiveness of the proposed algorithms, we generated three autonomous driving scenarios in the MetaDrive simulator. Experimental results demonstrate that Leaky PPO outperforms the other five PPO variants in various autonomous driving simulation scenarios. Furthermore, we demonstrate that Leaky PPO outperforms other popular RL algorithms and achieves state-of-the-art performance.
AB - Interest in applying Reinforcement Learning (RL) to Autonomous Vehicles (AVs) is expanding rapidly. Proximal Policy Optimization (PPO), a well-known RL algorithm with two versions, is simple to implement and highly general. In this paper, we first analyze the issues in each of the original PPO versions: the asymmetric penalty in the Adaptive KL Penalty Coefficient PPO version, and the gradient loss and pessimistic estimate in the Clipped PPO version. To address these issues, we propose three improved PPO algorithms: Adaptive JS Penalty Coefficient PPO, Leaky PPO, and Parametric PPO. To validate the effectiveness of the proposed algorithms, we generated three autonomous driving scenarios in the MetaDrive simulator. Experimental results demonstrate that Leaky PPO outperforms the other five PPO variants in various autonomous driving simulation scenarios. Furthermore, we demonstrate that Leaky PPO outperforms other popular RL algorithms and achieves state-of-the-art performance.
KW - Autonomous Vehicles
KW - Proximal Policy Optimization
KW - Reinforcement Learning
UR - https://www.scopus.com/pages/publications/85205011469
U2 - 10.1109/IJCNN60899.2024.10650450
DO - 10.1109/IJCNN60899.2024.10650450
M3 - Conference contribution
AN - SCOPUS:85205011469
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - 2024 International Joint Conference on Neural Networks, IJCNN 2024 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 30 June 2024 through 5 July 2024
ER -