Résumé
We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel reinforcement learning framework for history-dependent environments that generalizes the contextual MDP framework to handle non-Markov environments, where contexts change over time. We consider special cases of the model, with a focus on logistic DCMDPs, which break the exponential dependence on history length by leveraging aggregation functions to determine context transitions. This special structure allows us to derive an upper-confidence-bound style algorithm for which we establish regret bounds. Motivated by our theoretical results, we introduce a practical model-based algorithm for logistic DCMDPs that plans in a latent space and uses optimism over history-dependent features. We demonstrate the efficacy of our approach on a recommendation task (using MovieLens data) where user behavior dynamics evolve in response to recommendations.
| langue originale | Anglais |
|---|---|
| Pages (de - à) | 34011-34053 |
| Nombre de pages | 43 |
| journal | Proceedings of Machine Learning Research |
| Volume | 202 |
| état | Publié - 1 janv. 2023 |
| Modification externe | Oui |
| Evénement | 40th International Conference on Machine Learning, ICML 2023 - Honolulu, États-Unis Durée: 23 juil. 2023 → 29 juil. 2023 |
Empreinte digitale
Examiner les sujets de recherche de « Reinforcement Learning with History-Dependent Dynamic Contexts ». Ensemble, ils forment une empreinte digitale unique.Contient cette citation
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver