TY - GEN
T1 - When Can Sequence Modelling Approaches Recover the Target Policy in Offline Reinforcement Learning? A Statistical Analysis
AU - Ghanem, Abdelghani
AU - Ciblat, Philippe
AU - Ghogho, Mounir
N1 - Publisher Copyright:
© 2025 European Signal Processing Conference, EUSIPCO. All rights reserved.
PY - 2025/1/1
Y1 - 2025/1/1
N2 - We present a theoretical analysis of sample complexity for learning the target policy in offline reinforcement learning (RL) using sequence modeling approaches. Our main theorem establishes bounds on the minimum required number of high-return samples. We identify distinct small-data and large-data regimes, characterized by a critical transition point, and reveal a potential trade-off between context coverage breadth and sampling depth. These findings offer insights into efficient data collection strategies and algorithm design for offline RL.
AB - We present a theoretical analysis of sample complexity for learning the target policy in offline reinforcement learning (RL) using sequence modeling approaches. Our main theorem establishes bounds on the minimum required number of high-return samples. We identify distinct small-data and large-data regimes, characterized by a critical transition point, and reveal a potential trade-off between context coverage breadth and sampling depth. These findings offer insights into efficient data collection strategies and algorithm design for offline RL.
KW - Offline Reinforcement Learning
KW - Sample Complexity Analysis
KW - Sequence Modelling
UR - https://www.scopus.com/pages/publications/105029885067
U2 - 10.23919/EUSIPCO63237.2025.11226382
DO - 10.23919/EUSIPCO63237.2025.11226382
M3 - Conference contribution
AN - SCOPUS:105029885067
T3 - European Signal Processing Conference
SP - 1692
EP - 1696
BT - 2025 33rd European Signal Processing Conference, EUSIPCO 2025 - Proceedings
PB - European Signal Processing Conference, EUSIPCO
T2 - 33rd European Signal Processing Conference, EUSIPCO 2025
Y2 - 8 September 2025 through 12 September 2025
ER -