State Prediction for Offline Reinforcement Learning via Sequence-to-Sequence Modeling

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Recent offline reinforcement learning methods often frame the problem as a sequence modeling task, employing a decoder-only architecture to process states, actions, and a single scalar value representing the sum of future rewards (i.e., returns). However, the distinct characteristics of these modalities, such as the non-smoothness of action sequences and the scalar nature of returns, may hinder effective modeling and optimization when using a shared architecture. In this work, we propose a divide-and-conquer strategy, the Reward-Guided Decision Translator (RGDT), that leverages an encoder-decoder architecture by casting offline reinforcement learning as a sequence-to-sequence modeling problem. Our approach foregoes action prediction in favor of next-state prediction, mitigating the challenges posed by the non-smoothness of action sequences. Furthermore, our formulation enables direct conditioning of state generation on sequences of future returns, providing a more informative signal for the model. By disentangling the processing of different modalities, our approach addresses the limitations of shared decoder-only architectures. Empirical results demonstrate that our method significantly outperforms existing generative sequence modeling techniques and matches or surpasses state-of-the-art methods across a range of continuous control tasks from the D4RL benchmark.
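The sequence-to-sequence framing described in the abstract can be illustrated with a small data-preparation sketch: each offline trajectory is turned into a (source, target) pair in which the source carries the sequence of future returns (returns-to-go) alongside past states, and the target is the shifted next-state sequence. This is a minimal sketch under assumed conventions; the function names and the exact tokenization are illustrative, not taken from the paper.

```python
def returns_to_go(rewards):
    """Suffix sums of rewards: rtg[t] = sum(rewards[t:])."""
    rtg, total = [], 0.0
    for r in reversed(rewards):
        total += r
        rtg.append(total)
    return rtg[::-1]

def make_seq2seq_pair(states, rewards):
    """Build a (source, target) pair for an encoder-decoder:
    source = (return-to-go, state) tokens up to time T-1,
    target = the next-state sequence (states shifted by one)."""
    rtg = returns_to_go(rewards)          # one return-to-go per transition
    source = list(zip(rtg, states[:-1]))  # condition on future returns + past states
    target = states[1:]                   # supervise next-state prediction
    return source, target

# Toy 1-D trajectory: 4 states, 3 transitions with reward 1.0 each
states = [0.0, 0.5, 1.0, 1.5]
rewards = [1.0, 1.0, 1.0]
src, tgt = make_seq2seq_pair(states, rewards)
```

Here `src` pairs each past state with the full sum of rewards still to come from that step, so the decoder's state generation is conditioned on a return *sequence* rather than a single scalar return, mirroring the conditioning described above.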

Original language: English
Title of host publication: 35th IEEE International Workshop on Machine Learning for Signal Processing
Subtitle of host publication: Signal Processing in the Age of Large Language Models, MLSP 2025
Publisher: IEEE Computer Society
ISBN (Electronic): 9798331570293
DOIs
Publication status: Published - 1 Jan 2025
Event: 35th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2025 - Istanbul, Turkey
Duration: 31 Aug 2025 - 3 Sept 2025

Publication series

Name: IEEE International Workshop on Machine Learning for Signal Processing, MLSP
ISSN (Print): 2161-0363
ISSN (Electronic): 2161-0371

Conference

Conference: 35th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2025
Country/Territory: Turkey
City: Istanbul
Period: 31/08/25 - 3/09/25

Keywords

  • Offline Reinforcement Learning
  • Sequence Modeling
  • Transformer Architecture
