TY - JOUR
T1 - Complexity Bounds for Deterministic Partially Observed Markov Decision Processes
AU - Vessaire, Cyrille
AU - Carpentier, Pierre
AU - Chancelier, Jean Philippe
AU - De Lara, Michel
AU - Rodríguez-Martínez, Alejandro
N1 - Publisher Copyright:
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
PY - 2025/1/1
Y1 - 2025/1/1
N2 - Partially Observed Markov Decision Processes (Pomdp) share the structure of Markov Decision Processes (Mdp) — with stages, states, actions, probability transitions, rewards — but for the notion of solutions. In a Pomdp, observation mappings provide partial and/or imperfect knowledge of the state, and a policy maps observations (and not states, as in a Mdp) towards actions. Theoretically, a Pomdp can be solved by Dynamic Programming (DP), but with an information state made of probability distributions over the original state; hence DP suffers from the curse of dimensionality, even in the finite case. This is why authors like Littman (Littman, M. L. (1996). Algorithms for Sequential Decision Making. PhD thesis, Brown University) and Bonet (Bonet, B. (2009). Deterministic POMDPs revisited. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI '09 (pp. 59–66). Arlington, Virginia, USA. AUAI Press) have studied the subclass of so-called Deterministic Partially Observed Markov Decision Processes (Det-Pomdp), where transition and observation mappings are deterministic. In this paper, we improve on Littman's complexity bounds. We then introduce and study a more restricted class, Separated Det-Pomdps, and give some new complexity bounds for this class.
AB - Partially Observed Markov Decision Processes (Pomdp) share the structure of Markov Decision Processes (Mdp) — with stages, states, actions, probability transitions, rewards — but for the notion of solutions. In a Pomdp, observation mappings provide partial and/or imperfect knowledge of the state, and a policy maps observations (and not states, as in a Mdp) towards actions. Theoretically, a Pomdp can be solved by Dynamic Programming (DP), but with an information state made of probability distributions over the original state; hence DP suffers from the curse of dimensionality, even in the finite case. This is why authors like Littman (Littman, M. L. (1996). Algorithms for Sequential Decision Making. PhD thesis, Brown University) and Bonet (Bonet, B. (2009). Deterministic POMDPs revisited. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI '09 (pp. 59–66). Arlington, Virginia, USA. AUAI Press) have studied the subclass of so-called Deterministic Partially Observed Markov Decision Processes (Det-Pomdp), where transition and observation mappings are deterministic. In this paper, we improve on Littman's complexity bounds. We then introduce and study a more restricted class, Separated Det-Pomdps, and give some new complexity bounds for this class.
UR - https://www.scopus.com/pages/publications/85207862212
U2 - 10.1007/s10479-024-06282-0
DO - 10.1007/s10479-024-06282-0
M3 - Article
AN - SCOPUS:85207862212
SN - 0254-5330
VL - 344
SP - 345
EP - 382
JO - Annals of Operations Research
JF - Annals of Operations Research
IS - 1
ER -