Fast Rates for Maximum Entropy Exploration

  • Daniil Tiapkin
  • Denis Belomestny
  • Daniele Calandriello
  • Éric Moulines
  • Rémi Munos
  • Alexey Naumov
  • Pierre Perrault
  • Yunhao Tang
  • Michal Valko
  • Pierre Ménard

Research output: Contribution to journal › Conference article › peer-review

Abstract

We address the challenge of exploration in reinforcement learning (RL) when the agent operates in an unknown environment with sparse or no rewards. In this work, we study two types of the maximum entropy exploration problem. The first type is visitation entropy maximization, previously considered by Hazan et al. (2019) in the discounted setting. For this type of exploration, we propose a game-theoretic algorithm that has Õ(H³S²A/ε²) sample complexity, thus improving the ε-dependence of existing results, where S is the number of states, A is the number of actions, H is the episode length, and ε is the desired accuracy. The second type of entropy we study is the trajectory entropy. This objective function is closely related to entropy-regularized MDPs, and we propose a simple algorithm that has a sample complexity of order Õ(poly(S, A, H)/ε). Interestingly, this is the first theoretical result in the RL literature that establishes the potential statistical advantage of regularized MDPs for exploration. Finally, we apply the developed regularization techniques to reduce the sample complexity of visitation entropy maximization to Õ(H²SA/ε²), yielding a statistical separation between maximum entropy exploration and reward-free exploration.
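To make the first objective concrete, the sketch below evaluates a visitation entropy for a known tabular, finite-horizon MDP. It is only an illustration, not the algorithm from the paper: the abstract does not spell out the objective, so the definition used here (the sum over steps h of the Shannon entropy of the state-action visitation distribution d_h) is an assumption, and the function name `visitation_entropy` and the toy MDP are hypothetical.

```python
import math


def visitation_entropy(mu0, P, pi):
    """Sum over steps h of the Shannon entropy of d_h(s, a).

    Assumed (hypothetical) data layout:
      mu0[s]        -- initial state distribution
      P[h][s][a][t] -- probability of moving from s to t under action a at step h
      pi[h][s][a]   -- probability that the policy picks action a in state s at step h
    """
    horizon = len(pi)
    n_states = len(mu0)
    state_dist = list(mu0)
    total = 0.0
    for h in range(horizon):
        n_actions = len(pi[h][0])
        # d_h(s, a) = Pr(s_h = s) * pi_h(a | s)
        d = [[state_dist[s] * pi[h][s][a] for a in range(n_actions)]
             for s in range(n_states)]
        # Shannon entropy of d_h, with the convention 0 * log 0 = 0
        total += -sum(x * math.log(x) for row in d for x in row if x > 0)
        if h + 1 < horizon:
            # Push the state distribution one step forward through the dynamics
            next_dist = [0.0] * n_states
            for s in range(n_states):
                for a in range(n_actions):
                    for t in range(n_states):
                        next_dist[t] += d[s][a] * P[h][s][a][t]
            state_dist = next_dist
    return total


# Toy MDP (illustrative only): 2 states, 2 actions, horizon 2.
# Taking action a moves deterministically to state a.
step = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
P = [step, step]
mu0 = [1.0, 0.0]
uniform = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(2)]
greedy = [[[1.0, 0.0], [1.0, 0.0]] for _ in range(2)]

ve_uniform = visitation_entropy(mu0, P, uniform)
ve_greedy = visitation_entropy(mu0, P, greedy)
```

In this toy example the uniform policy spreads its visits over both state-action pairs of the start state at the first step (entropy log 2) and over all four pairs at the second step (entropy log 4), so `ve_uniform` equals log 8, whereas the deterministic policy always occupies a single pair and has visitation entropy 0.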

Original language: English
Pages (from-to): 34161-34221
Number of pages: 61
Journal: Proceedings of Machine Learning Research
Volume: 202
Publication status: Published - 1 Jan 2023
Externally published: Yes
Event: 40th International Conference on Machine Learning, ICML 2023 - Honolulu, United States
Duration: 23 Jul 2023 to 29 Jul 2023
