Passer à la navigation principale Passer à la recherche Passer au contenu principal

Profitable Bandits

  • CNRS LTCI
  • CNRS UMR 5669, 'Unité de Mathématiques Pures et Appliquées' and project-team Inria NUMED, Ecole Normale Supérieure de Lyon

Résultats de recherche: Contribution à un journalArticle de conférenceRevue par des pairs

Résumé

Originally motivated by default risk management applications, this paper investigates a novel problem, referred to as the profitable bandit problem here. At each step, an agent chooses a subset of the K ≥ 1 possible actions. For each action chosen, she then respectively pays and receives the sum of a random number of costs and rewards. Her objective is to maximize her cumulated profit. We adapt and study three well-known strategies in this purpose, that were proved to be most efficient in other settings: kl-UCB, Bayes-UCB and Thompson Sampling. For each of them, we prove a finite time regret bound which, together with a lower bound we obtain as well, establishes asymptotic optimality in some cases. Our goal is also to compare these three strategies from a theoretical and empirical perspective both at the same time. We give simple, self-contained proofs that emphasize their similarities, as well as their differences. While both Bayesian strategies are automatically adapted to the geometry of information, the numerical experiments carried out show a slight advantage for Thompson Sampling in practice.

langue originaleAnglais
Pages (de - à)694-709
Nombre de pages16
journalProceedings of Machine Learning Research
Volume95
étatPublié - 1 janv. 2018
Modification externeOui
Evénement10th Asian Conference on Machine Learning, ACML 2018 - Beijing, Chine
Durée: 14 nov. 201816 nov. 2018

Empreinte digitale

Examiner les sujets de recherche de « Profitable Bandits ». Ensemble, ils forment une empreinte digitale unique.

Contient cette citation