Bounded regret in stochastic multi-armed bandits

Research output: Contribution to journalConference articlepeer-review

Abstract

We study the stochastic multi-armed bandit problem when one knows the value μ(*) of an optimal arm, as a well as a positive lower bound on the smallest positive gap Δ. We propose a new randomized policy that attains a regret uniformly bounded over time in this setting. We also prove several lower bounds, which show in particular that bounded regret is not possible if one only knows Δ, and bounded regret of order 1/Δ is not possible if one only knows μ(*).

Original languageEnglish
Pages (from-to)122-134
Number of pages13
JournalJournal of Machine Learning Research
Volume30
Publication statusPublished - 1 Jan 2013
Externally publishedYes
Event26th Conference on Learning Theory, COLT 2013 - Princeton, NJ, United States
Duration: 12 Jun 201314 Jun 2013

Fingerprint

Dive into the research topics of 'Bounded regret in stochastic multi-armed bandits'. Together they form a unique fingerprint.

Cite this