TY - GEN
T1 - Solving Ergodic Markov Decision Processes and Perfect Information Zero-sum Stochastic Games by Variance Reduced Deflated Value Iteration
AU - Akian, Marianne
AU - Gaubert, Stephane
AU - Qu, Zheng
AU - Saadi, Omar
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/12/1
Y1 - 2019/12/1
N2 - Recently, Sidford, Wang, Wu and Ye (2018) developed an algorithm combining variance reduction techniques with value iteration to solve discounted Markov decision processes. This algorithm has a sublinear complexity when the discount factor is fixed. Here, we extend this approach to mean-payoff problems, including both Markov decision processes and perfect information zero-sum stochastic games. We obtain sublinear complexity bounds, assuming there is a distinguished state which is accessible from all initial states and for all policies. Our method is based on a reduction from the mean payoff problem to the discounted problem by a Doob h-transform, combined with a deflation technique. The complexity analysis of this algorithm uses at the same time the techniques developed by Sidford et al. in the discounted case and non-linear spectral theory techniques (Collatz-Wielandt characterization of the eigenvalue).
AB - Recently, Sidford, Wang, Wu and Ye (2018) developed an algorithm combining variance reduction techniques with value iteration to solve discounted Markov decision processes. This algorithm has a sublinear complexity when the discount factor is fixed. Here, we extend this approach to mean-payoff problems, including both Markov decision processes and perfect information zero-sum stochastic games. We obtain sublinear complexity bounds, assuming there is a distinguished state which is accessible from all initial states and for all policies. Our method is based on a reduction from the mean payoff problem to the discounted problem by a Doob h-transform, combined with a deflation technique. The complexity analysis of this algorithm uses at the same time the techniques developed by Sidford et al. in the discounted case and non-linear spectral theory techniques (Collatz-Wielandt characterization of the eigenvalue).
UR - https://www.scopus.com/pages/publications/85082500634
U2 - 10.1109/CDC40024.2019.9029885
DO - 10.1109/CDC40024.2019.9029885
M3 - Conference contribution
AN - SCOPUS:85082500634
T3 - Proceedings of the IEEE Conference on Decision and Control
SP - 5963
EP - 5970
BT - 2019 IEEE 58th Conference on Decision and Control, CDC 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 58th IEEE Conference on Decision and Control, CDC 2019
Y2 - 11 December 2019 through 13 December 2019
ER -