TY - JOUR
T1 - Improved Optimistic Algorithms for Logistic Bandits
AU - Faury, Louis
AU - Abeille, Marc
AU - Calauzènes, Clèment
AU - Fercoq, Olivier
N1 - Publisher Copyright:
© 2020 by the author(s).
PY - 2020/1/1
Y1 - 2020/1/1
N2 - The generalized linear bandit framework has at tracted a lot of attention in recent years by extending the well-understood linear setting and allowing to model richer reward structures. It notably covers the logistic model, widely used when rewards are binary. For logistic bandits, the frequentistregret guarantees of existing algorithms are (Formula present), where is a problem dependent constant. Unfortunately, can be arbitrarily large as it scales exponentially with the size of the decision set. This may lead to significantly loose regret bounds and poor empirical performance. In this work, we study the logistic bandit with a focus on the prohibitive dependencies introduced by. We propose a new optimistic algorithm based on a finer examination of the non-linearities of the reward function. We show that it enjoys a (Formula present) regret with no de pendency in, but for a second order term. Our analysis is based on a new tail-inequality for self normalized martingales, of independent interest.
AB - The generalized linear bandit framework has at tracted a lot of attention in recent years by extending the well-understood linear setting and allowing to model richer reward structures. It notably covers the logistic model, widely used when rewards are binary. For logistic bandits, the frequentistregret guarantees of existing algorithms are (Formula present), where is a problem dependent constant. Unfortunately, can be arbitrarily large as it scales exponentially with the size of the decision set. This may lead to significantly loose regret bounds and poor empirical performance. In this work, we study the logistic bandit with a focus on the prohibitive dependencies introduced by. We propose a new optimistic algorithm based on a finer examination of the non-linearities of the reward function. We show that it enjoys a (Formula present) regret with no de pendency in, but for a second order term. Our analysis is based on a new tail-inequality for self normalized martingales, of independent interest.
UR - https://www.scopus.com/pages/publications/105021944564
M3 - Conference article
AN - SCOPUS:105021944564
SN - 2640-3498
VL - 119
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
T2 - 37th International Conference on Machine Learning, ICML 2020
Y2 - 13 July 2020 through 18 July 2020
ER -