Regret bounds for Narendra-Shapiro bandit algorithms

Sébastien Gadat, Fabien Panloup, Sofiane Saadane

Research output: Contribution to journal › Article › peer-review

Abstract

Narendra-Shapiro (NS) algorithms are bandit-type algorithms developed in the 1960s. NS algorithms have been studied in depth in the infinite-horizon setting, but few non-asymptotic results exist for this type of bandit algorithm. In this paper, we carry out a non-asymptotic study of the regret and address the following question: are Narendra-Shapiro bandit algorithms competitive from this point of view? In our main result, we obtain uniform explicit bounds on the regret of (over-)penalized NS algorithms. We also extend to the multi-armed case some convergence properties of penalized NS algorithms towards a stationary Piecewise Deterministic Markov Process (PDMP). Finally, we establish new sharp mixing bounds for these processes.
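To make the object of study concrete, here is a minimal sketch of the classical two-armed Narendra-Shapiro scheme that the paper's penalized variants build on. This is an illustrative simulation, not the authors' construction: the function name, step size `gamma`, and reward probabilities are assumptions, and the (over-)penalized versions analyzed in the paper additionally shrink the sampling probability after failures, which is omitted here.

```python
import random

def ns_two_armed(p_rewards, n_steps, gamma=0.01, seed=0):
    """Simulate a crude two-armed Narendra-Shapiro bandit scheme.

    `p` is the probability of playing arm 0. On a successful pull,
    probability mass is shifted toward the arm just played; failures
    leave `p` unchanged. (The penalized variants studied in the paper
    also penalize `p` after a failure.)
    """
    rng = random.Random(seed)
    p = 0.5  # start with no preference between the two arms
    for _ in range(n_steps):
        arm = 0 if rng.random() < p else 1
        reward = rng.random() < p_rewards[arm]  # Bernoulli reward
        if reward:
            if arm == 0:
                p += gamma * (1.0 - p)  # reinforce arm 0
            else:
                p -= gamma * p          # reinforce arm 1
    return p

# With arm 0 clearly better, p typically drifts toward 1.
final_p = ns_two_armed([0.8, 0.2], n_steps=20_000)
```

The regret of such a scheme over a horizon of `n_steps` is the expected shortfall in cumulative reward relative to always playing the best arm; the paper's contribution is to bound this quantity non-asymptotically for the over-penalized variant.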

Original language: English
Pages (from-to): 886-926
Number of pages: 41
Journal: Stochastics
Volume: 90
Issue number: 6
DOIs
Publication status: Published - 18 Aug 2018
Externally published: Yes

Keywords

  • Regret
  • piecewise deterministic Markov processes
  • stochastic bandit algorithms
