
DEMONSTRATION-REGULARIZED RL

  • Daniil Tiapkin
  • Denis Belomestny
  • Daniele Calandriello
  • Éric Moulines
  • Remi Munos
  • Alexey Naumov
  • Pierre Perrault
  • Michal Valko
  • Pierre Ménard
  • École Polytechnique
  • National Research University
  • University of Duisburg-Essen
  • DeepMind Technologies Limited
  • Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)
  • IDEMIA
  • Ecole Normale Supérieure de Lyon

Research output: Contribution to conference › Paper › peer-review

Abstract

Incorporating expert demonstrations has empirically helped to improve the sample efficiency of reinforcement learning (RL). This paper quantifies theoretically to what extent this extra information reduces RL's sample complexity. In particular, we study demonstration-regularized reinforcement learning, which leverages expert demonstrations through KL-regularization toward a policy learned by behavior cloning. Our findings reveal that using N^E expert demonstrations enables the identification of an optimal policy at a sample complexity of order Õ(Poly(S, A, H)/(ε² N^E)) in finite and Õ(Poly(d, H)/(ε² N^E)) in linear Markov decision processes (MDPs), where ε is the target precision, H the horizon, A the number of actions, S the number of states in the finite case, and d the dimension of the feature space in the linear case. As a by-product, we provide tight convergence guarantees for the behavior cloning procedure under general assumptions on the policy classes. Additionally, we establish that demonstration-regularized methods are provably efficient for reinforcement learning from human feedback (RLHF). In this respect, we provide theoretical evidence showing the benefits of KL-regularization for RLHF in tabular and linear MDPs. Interestingly, we avoid pessimism injection by employing computationally feasible regularization to handle reward estimation uncertainty, thus setting our approach apart from prior works.
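To make the idea of KL-regularization toward a behavior-cloned policy concrete, here is a minimal tabular sketch. It is an illustration under simplifying assumptions, not the algorithm analyzed in the paper: the transition kernel `P` and reward `r` are taken as known, behavior cloning is a smoothed count estimate, and the names `behavior_cloning`, `kl_regularized_planning`, `smoothing`, and `lam` are hypothetical. It relies on the standard closed form for a KL-regularized step, where the optimal policy is proportional to `pi_bc * exp(Q / lam)`.

```python
import numpy as np


def behavior_cloning(demos, S, A, H, smoothing=1.0):
    """Smoothed count estimate pi_bc[h, s, a] from (h, s, a) demonstration triples.

    Assumption: additive smoothing keeps every action probability positive,
    so the KL term below stays finite.
    """
    counts = np.full((H, S, A), float(smoothing))
    for h, s, a in demos:
        counts[h, s, a] += 1.0
    return counts / counts.sum(axis=2, keepdims=True)


def kl_regularized_planning(P, r, pi_bc, lam):
    """Backward induction for max_pi E[sum_h r_h] - lam * KL(pi || pi_bc).

    P:     transitions, shape (H, S, A, S)
    r:     rewards, shape (H, S, A)
    pi_bc: reference policy, shape (H, S, A), strictly positive
    lam:   regularization strength (> 0)
    Returns the regularized policy (H, S, A) and the value at step 0 (S,).
    """
    H, S, A = r.shape
    V = np.zeros(S)
    pi = np.zeros((H, S, A))
    for h in reversed(range(H)):
        Q = r[h] + P[h] @ V                      # shape (S, A)
        logits = np.log(pi_bc[h]) + Q / lam      # log of pi_bc * exp(Q / lam)
        m = logits.max(axis=1, keepdims=True)    # shift for numerical stability
        w = np.exp(logits - m)
        Z = w.sum(axis=1, keepdims=True)
        pi[h] = w / Z                            # pi proportional to pi_bc * exp(Q / lam)
        V = lam * (m[:, 0] + np.log(Z[:, 0]))    # lam * logsumexp(logits)
    return pi, V
```

A large `lam` keeps the learned policy close to the behavior-cloned one, while a small `lam` lets it move toward the greedy policy for the given rewards. In an actual RL setting `P` would be unknown and explored online, which this sketch deliberately leaves out.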

Original language: English
Publication status: Published - 1 Jan 2024
Externally published: Yes
Event: 12th International Conference on Learning Representations, ICLR 2024 - Hybrid, Vienna, Austria
Duration: 7 May 2024 to 11 May 2024

Conference

Conference: 12th International Conference on Learning Representations, ICLR 2024
Country/Territory: Austria
City: Hybrid, Vienna
Period: 7/05/24 to 11/05/24
