
Reinforcement Learning with History-Dependent Dynamic Contexts

  • Guy Tennenholtz
  • Nadav Merlis
  • Lior Shani
  • Martin Mladenov
  • Craig Boutilier
  • Google Inc.
  • ENSAE

Research output: Contribution to journal › Conference article › Peer-reviewed

Abstract

We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel reinforcement learning framework for history-dependent environments that generalizes the contextual MDP framework to handle non-Markov environments, where contexts change over time. We consider special cases of the model, with a focus on logistic DCMDPs, which break the exponential dependence on history length by leveraging aggregation functions to determine context transitions. This special structure allows us to derive an upper-confidence-bound style algorithm for which we establish regret bounds. Motivated by our theoretical results, we introduce a practical model-based algorithm for logistic DCMDPs that plans in a latent space and uses optimism over history-dependent features. We demonstrate the efficacy of our approach on a recommendation task (using MovieLens data) where user behavior dynamics evolve in response to recommendations.
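To make the logistic-DCMDP idea concrete, here is a minimal, hypothetical sketch (not the paper's formal construction) of a context transition driven by an aggregation of history features: each step contributes a feature vector, the history is summarized by a fixed-size running sum, and the next context is sampled from a multinomial logistic (softmax) over linear scores of that aggregate. The feature map `phi`, the weight matrix `W`, and the choice of summation as the aggregation function are all placeholder assumptions for illustration.

```python
import numpy as np

# Hypothetical illustration of a logistic-style context transition.
# Assumptions (not from the paper's definitions): contexts are indexed
# 0..K-1, each (state, action, context) step contributes a feature vector
# phi, the history is summarized by a running SUM of these features, and
# the next context is drawn from a softmax over linear scores of that
# aggregate. Only the fixed-size aggregate is stored, not the full history,
# which is what avoids an exponential dependence on history length.

rng = np.random.default_rng(0)

num_contexts = 3   # K (assumed)
feature_dim = 4    # dimension of the per-step feature map phi (assumed)

# One weight vector per context; a UCB-style algorithm would estimate these
# with confidence bounds. Here they are random placeholders.
W = rng.normal(size=(num_contexts, feature_dim))

def phi(state, action, context):
    """Hypothetical per-step feature map; a real model would define this."""
    v = np.zeros(feature_dim)
    v[(state + action + context) % feature_dim] = 1.0
    return v

def next_context_probs(history_aggregate):
    """Softmax (multinomial logistic) over linear scores of the aggregate."""
    scores = W @ history_aggregate
    scores -= scores.max()              # numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()

# Roll out a few steps: the aggregate stands in for the raw history.
aggregate = np.zeros(feature_dim)
context = 0
for t in range(5):
    state, action = t % 2, t % 3                 # placeholder state/action stream
    aggregate += phi(state, action, context)     # aggregation = running sum (assumed)
    probs = next_context_probs(aggregate)
    context = rng.choice(num_contexts, p=probs)
    print(f"t={t} context probs={np.round(probs, 3)} next context={context}")
```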

Original language: English
Pages (from-to): 34011-34053
Number of pages: 43
Journal: Proceedings of Machine Learning Research
Volume: 202
State: Published - 1 Jan 2023
Externally published: Yes
Event: 40th International Conference on Machine Learning, ICML 2023 - Honolulu, United States
Duration: 23 Jul 2023 – 29 Jul 2023
