
Optimal Thompson Sampling strategies for support-aware CVaR bandits

  • Dorian Baudry
  • Romain Gautron
  • Emilie Kaufmann
  • Odalric Ambrym Maillard

Research output: Chapter in a book, report, anthology, or collection; Conference contribution; Peer-reviewed

Abstract

In this paper we study a multi-armed bandit problem in which the quality of each arm is measured by the Conditional Value at Risk (CVaR) at some level α of the reward distribution. While existing works in this setting mainly focus on Upper Confidence Bound algorithms, we introduce a new Thompson Sampling approach for CVaR bandits on bounded rewards that is flexible enough to solve a variety of problems grounded in physical resources. Building on recent work by Riou and Honda (2020), we introduce B-CVTS for continuous bounded rewards and M-CVTS for multinomial distributions. On the theoretical side, we provide a non-trivial extension of their analysis that makes it possible to bound the CVaR regret of these two strategies. Strikingly, our results show that these strategies are the first to provably achieve asymptotic optimality in CVaR bandits, matching the corresponding asymptotic lower bounds for this setting. Further, we illustrate empirically the benefit of Thompson Sampling approaches both in a realistic environment simulating a use-case in agriculture and on various synthetic examples.
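
For readers unfamiliar with the risk measure, the following is a minimal sketch of an empirical CVaR computation. It assumes one common convention for rewards, CVaR_α(X) = sup_x { x − E[(x − X)^+] / α }, under which the CVaR at level α is the average of the worst α-fraction of outcomes and α = 1 recovers the mean. The abstract does not state which convention the paper uses, and the function name `empirical_cvar` is hypothetical, so this should be read as an illustration only, not as the authors' implementation.

```python
def empirical_cvar(samples, alpha):
    """Empirical CVaR at level alpha of a list of rewards.

    Assumed convention (not taken from the abstract):
        CVaR_alpha = sup_x { x - E[(x - X)^+] / alpha }
    For an empirical distribution the objective is concave and
    piecewise linear in x, so the supremum is reached at one of
    the observed values and a scan over the samples is enough.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    if not samples:
        raise ValueError("samples must be non-empty")

    n = len(samples)
    best = float("-inf")
    for x in samples:
        # Average shortfall of the sample below the candidate threshold x.
        shortfall = sum(max(x - s, 0.0) for s in samples) / n
        best = max(best, x - shortfall / alpha)
    return best


# Hypothetical rewards observed on one arm.
rewards = [0.0, 1.0, 2.0, 3.0]
cvar_half = empirical_cvar(rewards, 0.5)   # average of the worst half
cvar_full = empirical_cvar(rewards, 1.0)   # plain mean
```

Under this convention a small α makes the criterion risk-averse, since only the lowest rewards of an arm contribute to its score.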

Original language: English
Title: Proceedings of the 38th International Conference on Machine Learning, ICML 2021
Publisher: ML Research Press
Pages: 716-726
Number of pages: 11
ISBN (Electronic): 9781713845065
Status: Published - 1 Jan 2021
Externally published: Yes
Event: 38th International Conference on Machine Learning, ICML 2021 - Virtual, Online
Duration: 18 Jul 2021 to 24 Jul 2021

Publication series

Name: Proceedings of Machine Learning Research
Volume: 139
ISSN (Electronic): 2640-3498

Conference

Conference: 38th International Conference on Machine Learning, ICML 2021
City: Virtual, Online
Period: 18/07/21 to 24/07/21
