
When Can Sequence Modelling Approaches Recover the Target Policy in Offline Reinforcement Learning? A Statistical Analysis

  • International University of Rabat
  • Institut Polytechnique de Paris
  • Mohammed VI Polytechnic University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We present a theoretical analysis of sample complexity for learning the target policy in offline reinforcement learning (RL) using sequence modeling approaches. Our main theorem establishes bounds on the minimum required number of high-return samples. We identify distinct small-data and large-data regimes, characterized by a critical transition point, and reveal a potential trade-off between context coverage breadth and sampling depth. These findings offer insights into efficient data collection strategies and algorithm design for offline RL.

Original language: English
Title of host publication: 2025 33rd European Signal Processing Conference, EUSIPCO 2025 - Proceedings
Publisher: European Signal Processing Conference, EUSIPCO
Pages: 1692-1696
Number of pages: 5
ISBN (Electronic): 9789464593624
DOIs
Publication status: Published - 1 Jan 2025
Event: 33rd European Signal Processing Conference, EUSIPCO 2025 - Palermo, Italy
Duration: 8 Sept 2025 – 12 Sept 2025

Publication series

Name: European Signal Processing Conference
ISSN (Print): 2219-5491

Conference

Conference: 33rd European Signal Processing Conference, EUSIPCO 2025
Country/Territory: Italy
City: Palermo
Period: 8/09/25 – 12/09/25

Keywords

  • Offline Reinforcement Learning
  • Sample Complexity Analysis
  • Sequence Modelling
