The Value of Reward Lookahead in Reinforcement Learning

  • ENSAE

Research output: Contribution to journal, Conference article, Peer-reviewed

Abstract

In reinforcement learning (RL), agents sequentially interact with changing environments while aiming to maximize the obtained rewards. Usually, rewards are observed only after acting, and so the goal is to maximize the expected cumulative reward. Yet, in many practical settings, reward information is observed in advance - prices are observed before performing transactions; nearby traffic information is partially known; and goals are oftentimes given to agents prior to the interaction. In this work, we aim to quantifiably analyze the value of such future reward information through the lens of competitive analysis. In particular, we measure the ratio between the value of standard RL agents and that of agents with partial future-reward lookahead. We characterize the worst-case reward distribution and derive exact ratios for the worst-case reward expectations. Surprisingly, the resulting ratios relate to known quantities in offline RL and reward-free exploration. We further provide tight bounds for the ratio given the worst-case dynamics. Our results cover the full spectrum between observing the immediate rewards before acting to observing all the rewards before the interaction starts.

Original language: English
Journal: Advances in Neural Information Processing Systems
Volume: 37
Status: Published - 1 Jan 2024
Externally published: Yes
Event: 38th Conference on Neural Information Processing Systems, NeurIPS 2024 - Vancouver, Canada
Duration: 9 Dec 2024 to 15 Dec 2024
