Solving Ergodic Markov Decision Processes and Perfect Information Zero-sum Stochastic Games by Variance Reduced Deflated Value Iteration

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Recently, Sidford, Wang, Wu and Ye (2018) developed an algorithm combining variance reduction techniques with value iteration to solve discounted Markov decision processes. This algorithm has a sublinear complexity when the discount factor is fixed. Here, we extend this approach to mean-payoff problems, including both Markov decision processes and perfect information zero-sum stochastic games. We obtain sublinear complexity bounds, assuming there is a distinguished state which is accessible from all initial states and for all policies. Our method is based on a reduction from the mean payoff problem to the discounted problem by a Doob h-transform, combined with a deflation technique. The complexity analysis of this algorithm uses at the same time the techniques developed by Sidford et al. in the discounted case and non-linear spectral theory techniques (Collatz-Wielandt characterization of the eigenvalue).

Original languageEnglish
Title of host publication2019 IEEE 58th Conference on Decision and Control, CDC 2019
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages5963-5970
Number of pages8
ISBN (Electronic)9781728113982
DOIs
Publication statusPublished - 1 Dec 2019
Event58th IEEE Conference on Decision and Control, CDC 2019 - Nice, France
Duration: 11 Dec 201913 Dec 2019

Publication series

NameProceedings of the IEEE Conference on Decision and Control
Volume2019-December
ISSN (Print)0743-1546
ISSN (Electronic)2576-2370

Conference

Conference58th IEEE Conference on Decision and Control, CDC 2019
Country/TerritoryFrance
CityNice
Period11/12/1913/12/19

Fingerprint

Dive into the research topics of 'Solving Ergodic Markov Decision Processes and Perfect Information Zero-sum Stochastic Games by Variance Reduced Deflated Value Iteration'. Together they form a unique fingerprint.

Cite this