Skip to main navigation Skip to search Skip to main content

Statistically Efficient, Polynomial-Time Algorithms for Combinatorial Semi-Bandits

Research output: Contribution to journalArticlepeer-review

Abstract

We consider combinatorial semi-bandits over a set X g (0,1)d where rewards are uncorrelated across items. For this problem, the algorithm ESCB yields the smallest known regret bound R(T) = O ( d (łn m)2 (łn T) øver "min) after T rounds, where m = maxx g X 1Tx. However, ESCB has computational complexity O(|X|), which is typically exponential in d, and cannot be used in large dimensions. We propose the first algorithm that is both computationally and statistically efficient for this problem with regret R(T) = O (d (łn m)2 (łn T)øver "min) and computational asymptotic complexity O( T-1 poly (d)), where T is a function which vanishes arbitrarily slowly. Our approach involves carefully designing AESCB, an approximate version of ESCB with the same regret guarantees. We show that, whenever budgeted linear maximization over X can be solved up to a given approximation ratio, AESCB is implementable in polynomial time O (T-1 poly (d)) by repeatedly maximizing a linear function over X subject to a linear budget constraint, and showing how to solve these maximization problems efficiently. Additional algorithms, proofs and numerical experiments are given in the complete version of this work.

Original languageEnglish
Pages (from-to)5-6
Number of pages2
JournalPerformance Evaluation Review
Volume49
Issue number1
DOIs
Publication statusPublished - 1 Jun 2021
Externally publishedYes

Keywords

  • bandits
  • combinatorial bandits
  • combinatorial optimization

Fingerprint

Dive into the research topics of 'Statistically Efficient, Polynomial-Time Algorithms for Combinatorial Semi-Bandits'. Together they form a unique fingerprint.

Cite this