ELAPSE: Expand Latent Action Projection Space for policy optimization in Offline Reinforcement Learning

Research output: Contribution to journal › Article › peer-review

Abstract

Offline Reinforcement Learning (RL) is dedicated to optimizing a policy from a static dataset. Most offline RL algorithms constrain the learned policy to stay close to the behavior policy in order to avoid extrapolation errors caused by out-of-distribution actions. Conditional Variational AutoEncoders (CVAEs), which can approximate the underlying behavior-policy distribution from the dataset, have therefore been integrated into recent offline RL methods. However, the CVAE model suffers from an inherent posterior collapse issue. In this paper, we first define the latent action projection space collapse issue in offline RL. We then explain theoretically how this collapse degrades policy optimization and visualize the phenomenon experimentally. We therefore propose a novel and easy-to-implement algorithm that expands the latent action projection space through Batch Normalization. We call this algorithm Expand Latent Action Projection SpacE (ELAPSE); it balances the trade-off between the CVAE constraint and policy optimization, and avoids the unnecessary perturbations present in previous works. Experiments show that ELAPSE effectively improves offline RL performance and achieves state-of-the-art results on the D4RL benchmarks. Our implementation is available at https://github.com/Mr-XcHan/ELAPSE.
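The abstract's core idea — applying Batch Normalization to keep the CVAE's latent action projection space from collapsing — can be illustrated with a minimal sketch. The sketch below is an assumption-laden illustration, not the paper's actual architecture: all layer sizes, names, and the placement of the `BatchNorm1d` on the latent mean are hypothetical, chosen to show how normalizing the posterior mean fixes its per-dimension scale so the KL term cannot shrink every mean toward zero.

```python
import torch
import torch.nn as nn

class CVAEEncoder(nn.Module):
    """Illustrative CVAE encoder for offline RL (hypothetical; ELAPSE's
    real architecture may differ). BatchNorm on the latent mean keeps
    the latent action projection space from collapsing to the prior."""

    def __init__(self, state_dim, action_dim, latent_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mean_head = nn.Linear(hidden, latent_dim)
        # Normalizing the posterior mean fixes its batch-wise scale, so
        # the KL regularizer cannot drive all means to zero (collapse).
        self.mean_bn = nn.BatchNorm1d(latent_dim)
        self.log_std_head = nn.Linear(hidden, latent_dim)

    def forward(self, state, action):
        h = self.net(torch.cat([state, action], dim=-1))
        mean = self.mean_bn(self.mean_head(h))
        log_std = self.log_std_head(h).clamp(-4.0, 4.0)
        return mean, log_std

# Usage: encode a batch of (state, action) pairs from an offline dataset.
enc = CVAEEncoder(state_dim=11, action_dim=3, latent_dim=2)
states, actions = torch.randn(32, 11), torch.randn(32, 3)
mean, log_std = enc(states, actions)
```

In training mode, `BatchNorm1d` standardizes each latent dimension across the batch, so the posterior means retain a non-degenerate spread regardless of the KL pressure — the property the abstract attributes to expanding the projection space.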

Original language: English
Article number: 129665
Journal: Neurocomputing
Volume: 631
DOIs
Publication status: Published - 28 May 2025

Keywords

  • Conditional Variational AutoEncoders
  • Latent action space
  • Offline Reinforcement Learning
  • Policy optimization
