Gap-free bounds for stochastic multi-armed bandit

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

We consider the stochastic multi-armed bandit problem with unknown horizon. We present a randomized decision strategy which is based on updating a probability distribution through a stochastic mirror descent/exponentiated gradient type algorithm. We consider separately two assumptions: nonnegative losses or arbitrary losses with an exponential moment condition. We prove optimal (up to logarithmic factors) gap-free bounds on the excess risk of the average over time of the instantaneous losses induced by the choice of a specific action.

Original languageEnglish
Title of host publicationProceedings of the 17th World Congress, International Federation of Automatic Control, IFAC
Edition1 PART 1
DOIs
Publication statusPublished - 1 Dec 2008
Event17th World Congress, International Federation of Automatic Control, IFAC - Seoul, Korea, Republic of
Duration: 6 Jul 200811 Jul 2008

Publication series

NameIFAC Proceedings Volumes (IFAC-PapersOnline)
Number1 PART 1
Volume17
ISSN (Print)1474-6670

Conference

Conference17th World Congress, International Federation of Automatic Control, IFAC
Country/TerritoryKorea, Republic of
CitySeoul
Period6/07/0811/07/08

Keywords

  • Learning theory
  • Randomized methods
  • Stochastic control

Fingerprint

Dive into the research topics of 'Gap-free bounds for stochastic multi-armed bandit'. Together they form a unique fingerprint.

Cite this