
Bayesian Off-Policy Evaluation and Learning for Large Action Spaces

  • ENSA
  • ENS PARIS-SACLAY

Research output: Contribution to journal › Conference article › Peer-reviewed

Abstract

In interactive systems, actions are often correlated, presenting an opportunity for more sample-efficient off-policy evaluation (OPE) and learning (OPL) in large action spaces. We introduce a unified Bayesian framework to capture these correlations through structured and informative priors. In this framework, we propose sDM, a generic Bayesian approach for OPE and OPL, grounded in both algorithmic and theoretical foundations. Notably, sDM leverages action correlations without compromising computational efficiency. Moreover, inspired by online Bayesian bandits, we introduce Bayesian metrics that assess the average performance of algorithms across multiple problem instances, deviating from the conventional worst-case assessments. We analyze sDM in OPE and OPL, highlighting the benefits of leveraging action correlations. Empirical evidence showcases the strong performance of sDM.

Original language: English
Pages (from-to): 136-144
Number of pages: 9
Journal: Proceedings of Machine Learning Research
Volume: 258
Publication status: Published - 1 Jan 2025
Externally published: Yes
Event: 28th International Conference on Artificial Intelligence and Statistics, AISTATS 2025 - Mai Khao, Thailand
Duration: 3 May 2025 to 5 May 2025
