ELAPSE: Expand Latent Action Projection Space for policy optimization in Offline Reinforcement Learning

Research output: Contribution to journal › Article › Peer-reviewed

Abstract

Offline Reinforcement Learning (RL) is dedicated to optimizing a policy from static datasets. Most offline RL algorithms constrain the learned policy to stay close to the behavior policy to avoid extrapolation errors caused by out-of-distribution actions. Conditional Variational AutoEncoders (CVAE), which enable approximation of the underlying behavior policy distribution from the dataset, have therefore been integrated into recent offline RL methods. However, the CVAE model suffers from an inherent posterior collapse issue. In this paper, we first define the latent action projection space collapse issue in offline RL. We then theoretically explain how this collapse can deteriorate policy optimization and experimentally visualize the phenomenon. To address this, we propose a novel and easy-to-implement algorithm that expands the latent action projection space through Batch Normalization. We call this algorithm Expand Latent Action Projection SpacE (ELAPSE); it balances the trade-off between the CVAE constraint and policy optimization and avoids the unnecessary perturbations present in previous works. Experiments show that ELAPSE effectively improves offline RL performance and achieves state-of-the-art results on the D4RL benchmarks. Our implementation is available at https://github.com/Mr-XcHan/ELAPSE.

Original language: English
Article number: 129665
Journal: Neurocomputing
Volume: 631
DOIs
Status: Published - 28 May 2025
