Regularized Contextual Bandits

Xavier Fontaine, Quentin Berthet, Vianney Perchet

Research output: Contribution to journal › Conference article › peer-review

Abstract

We consider the stochastic contextual bandit problem with additional regularization. The motivation comes from problems where the policy of the agent must be close to some baseline policy known to perform well on the task. To tackle this problem we use a nonparametric model and propose an algorithm splitting the context space into bins, solving simultaneously, and independently, regularized multi-armed bandit instances on each bin. We derive slow and fast rates of convergence, depending on the unknown complexity of the problem. We also consider a new relevant margin condition to get problem-independent convergence rates, yielding intermediate rates interpolating between the aforementioned slow and fast rates.
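The binning scheme described in the abstract can be illustrated with a minimal sketch: discretize a one-dimensional context space into equal-width bins and run an independent bandit instance in each. This is not the authors' algorithm; the UCB-style index and the baseline-weighted bonus (`reg_strength * baseline[a]`) below are illustrative assumptions standing in for the paper's regularization toward a baseline policy.

```python
import math
import random

class BinBandit:
    """One regularized multi-armed bandit instance for a single context bin.

    The added bonus proportional to a baseline distribution is a hypothetical
    stand-in for the paper's baseline-closeness regularization.
    """

    def __init__(self, n_arms, baseline, reg_strength):
        self.n_arms = n_arms
        self.baseline = baseline        # baseline probability per arm (assumed known)
        self.reg = reg_strength         # weight of the illustrative regularization term
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms
        self.t = 0

    def select_arm(self):
        self.t += 1
        # Play each arm once before using the index.
        for a in range(self.n_arms):
            if self.counts[a] == 0:
                return a
        # UCB exploration bonus plus a bonus for arms favored by the baseline.
        def score(a):
            bonus = math.sqrt(2 * math.log(self.t) / self.counts[a])
            return self.means[a] + bonus + self.reg * self.baseline[a]
        return max(range(self.n_arms), key=score)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


class BinnedContextualBandit:
    """Splits the context space [0, 1) into equal bins, one bandit per bin."""

    def __init__(self, n_bins, n_arms, baseline, reg_strength):
        self.n_bins = n_bins
        self.bandits = [BinBandit(n_arms, baseline, reg_strength)
                        for _ in range(n_bins)]

    def bin_index(self, context):
        return min(int(context * self.n_bins), self.n_bins - 1)

    def select_arm(self, context):
        return self.bandits[self.bin_index(context)].select_arm()

    def update(self, context, arm, reward):
        self.bandits[self.bin_index(context)].update(arm, reward)
```

Because each bin holds its own statistics, the instances run independently, mirroring the "simultaneously and independently" decomposition in the abstract; the nonparametric rates in the paper come from choosing the number of bins as a function of the horizon and smoothness, which this sketch leaves fixed.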

Original language: English
Pages (from-to): 2144-2153
Number of pages: 10
Journal: Proceedings of Machine Learning Research
Volume: 89
Publication status: Published - 1 Jan 2019
Externally published: Yes
Event: 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019 - Naha, Japan
Duration: 16 Apr 2019 - 18 Apr 2019
