Combinatorial Bandits for Sequential Learning in Colonel Blotto Games

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

The Colonel Blotto game is a renowned resource allocation problem with a long-standing literature in game theory (almost 100 years). In this work, we propose and study a regret-minimization model where a learner repeatedly plays the Colonel Blotto game against several adversaries. At each stage, the learner distributes her budget of resources on a fixed number of battlefields to maximize the aggregate value of battlefields she wins; each battlefield being won if there is no adversary that has higher allocation. We focus on the bandit feedback setting. We first observe that it can be modeled as a path planning problem. It is then possible to use the classical ComBand algorithm to guarantee a sub-linear regret in terms of time horizon, but this entails two fundamental challenges: (i) the computation is inefficient due to the huge size of the action set, and (ii) the standard exploration distribution leads to a loose guarantee in practice. To address the first, we construct a modified algorithm that can be efficiently implemented by applying a dynamic programming technique called weight pushing; for the second, we propose methods optimizing the exploration distribution to improve the regret bound.

Original languageEnglish
Title of host publication2019 IEEE 58th Conference on Decision and Control, CDC 2019
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages867-872
Number of pages6
ISBN (Electronic)9781728113982
DOIs
Publication statusPublished - 1 Dec 2019
Externally publishedYes
Event58th IEEE Conference on Decision and Control, CDC 2019 - Nice, France
Duration: 11 Dec 201913 Dec 2019

Publication series

NameProceedings of the IEEE Conference on Decision and Control
Volume2019-December
ISSN (Print)0743-1546
ISSN (Electronic)2576-2370

Conference

Conference58th IEEE Conference on Decision and Control, CDC 2019
Country/TerritoryFrance
CityNice
Period11/12/1913/12/19

Fingerprint

Dive into the research topics of 'Combinatorial Bandits for Sequential Learning in Colonel Blotto Games'. Together they form a unique fingerprint.

Cite this