TY - JOUR
T1 - Near-Optimal Distributionally Robust Reinforcement Learning with General Lp Norms
AU - Clavier, Pierre
AU - Shi, Laixi
AU - Le Pennec, Erwan
AU - Mazumdar, Eric
AU - Wierman, Adam
AU - Geist, Matthieu
N1 - Publisher Copyright:
© 2024 Neural information processing systems foundation. All rights reserved.
PY - 2024/1/1
Y1 - 2024/1/1
N2 - To address the challenges of the sim-to-real gap and sample efficiency in reinforcement learning (RL), this work studies distributionally robust Markov decision processes (RMDPs), which optimize the worst-case performance when the deployed environment lies within an uncertainty set around some nominal MDP. Despite recent efforts, the sample complexity of RMDPs has remained largely undetermined. While the statistical implications of distributional robustness in RL have been explored in some specific cases, the generalizability of the existing findings remains unclear, especially in comparison to standard RL. Assuming access to a generative model that samples from the nominal MDP, we examine the sample complexity of RMDPs using a class of generalized Lp norms as the 'distance' function for the uncertainty set, under the two commonly adopted sa-rectangular and s-rectangular conditions. Our results imply that RMDPs can be more sample-efficient to solve than standard MDPs using generalized Lp norms in both the sa- and s-rectangular cases, potentially inspiring more empirical research. We provide a near-optimal upper bound and a matching minimax lower bound for the sa-rectangular scenarios. For s-rectangular cases, we improve the state-of-the-art upper bound and also derive a lower bound using the L∞ norm that verifies the tightness.
AB - To address the challenges of the sim-to-real gap and sample efficiency in reinforcement learning (RL), this work studies distributionally robust Markov decision processes (RMDPs), which optimize the worst-case performance when the deployed environment lies within an uncertainty set around some nominal MDP. Despite recent efforts, the sample complexity of RMDPs has remained largely undetermined. While the statistical implications of distributional robustness in RL have been explored in some specific cases, the generalizability of the existing findings remains unclear, especially in comparison to standard RL. Assuming access to a generative model that samples from the nominal MDP, we examine the sample complexity of RMDPs using a class of generalized Lp norms as the 'distance' function for the uncertainty set, under the two commonly adopted sa-rectangular and s-rectangular conditions. Our results imply that RMDPs can be more sample-efficient to solve than standard MDPs using generalized Lp norms in both the sa- and s-rectangular cases, potentially inspiring more empirical research. We provide a near-optimal upper bound and a matching minimax lower bound for the sa-rectangular scenarios. For s-rectangular cases, we improve the state-of-the-art upper bound and also derive a lower bound using the L∞ norm that verifies the tightness.
UR - https://www.scopus.com/pages/publications/105000537172
M3 - Conference article
AN - SCOPUS:105000537172
SN - 1049-5258
VL - 37
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 38th Conference on Neural Information Processing Systems, NeurIPS 2024
Y2 - 9 December 2024 through 15 December 2024
ER -