Regularized contextual bandits

Research output: Contribution to conference › Paper › peer-review

Abstract

We consider the stochastic contextual bandit problem with additional regularization. The motivation comes from problems where the agent's policy must stay close to a baseline policy known to perform well on the task. To tackle this problem we use a nonparametric model and propose an algorithm that splits the context space into bins and simultaneously, and independently, solves a regularized multi-armed bandit instance on each bin. We derive slow and fast rates of convergence, depending on the unknown complexity of the problem. We also consider a new relevant margin condition to obtain problem-independent convergence rates, yielding intermediate rates that interpolate between the aforementioned slow and fast rates.
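To make the algorithmic idea concrete, the following is a minimal sketch, not the paper's algorithm: a one-dimensional context space is split into uniform bins, each bin runs its own independent UCB-style bandit, and regularization toward the baseline is modeled crudely by mixing each bin's choice with a fixed baseline policy. All names, the mixing scheme, and the constants are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

class BinBandit:
    """A simple UCB1-style bandit used independently inside each context bin."""

    def __init__(self, n_arms):
        self.counts = np.zeros(n_arms)
        self.sums = np.zeros(n_arms)

    def select(self, t):
        # Play each arm once, then pick the arm with the largest UCB index.
        untried = np.where(self.counts == 0)[0]
        if untried.size:
            return int(untried[0])
        means = self.sums / self.counts
        ucb = means + np.sqrt(2 * np.log(t + 1) / self.counts)
        return int(np.argmax(ucb))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward

def run(T=5000, n_bins=4, n_arms=3, mix=0.3):
    # Per-bin expected rewards, unknown to the learner (hypothetical setup).
    means = rng.uniform(0, 1, size=(n_bins, n_arms))
    baseline = np.full(n_arms, 1.0 / n_arms)  # uniform baseline policy
    bandits = [BinBandit(n_arms) for _ in range(n_bins)]
    total = 0.0
    for t in range(T):
        x = rng.uniform()                     # context drawn from [0, 1)
        b = min(int(x * n_bins), n_bins - 1)  # index of the context's bin
        if rng.uniform() < mix:
            # Crude stand-in for regularization: follow the baseline policy.
            arm = int(rng.choice(n_arms, p=baseline))
        else:
            arm = bandits[b].select(t)
        reward = float(rng.uniform() < means[b, arm])  # Bernoulli reward
        bandits[b].update(arm, reward)
        total += reward
    return total / T

print(round(run(), 3))
```

Because the bins are handled independently, each `BinBandit` only ever sees the rounds whose context falls in its bin, which is what lets per-bin instances be solved simultaneously; the nonparametric rates in the paper come from trading off the number of bins against the samples available per bin.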

Original language: English
Publication status: Published - 1 Jan 2019
Externally published: Yes
Event: 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019 - Naha, Japan
Duration: 16 Apr 2019 – 18 Apr 2019

Conference

Conference: 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019
Country/Territory: Japan
City: Naha
Period: 16/04/19 – 18/04/19
