PAC-Bayesian Offline Contextual Bandits With Guarantees

  • ENSAE & Criteo AI Lab.
  • ENSAE
  • ESSEC Business School

Research output: Contribution to journal, Conference article, Peer-reviewed

Abstract

This paper introduces a new principled approach for off-policy learning in contextual bandits. Unlike previous work, our approach does not derive learning principles from intractable or loose bounds. We analyse the problem through the PAC-Bayesian lens, interpreting policies as mixtures of decision rules. This allows us to propose novel generalization bounds and provide tractable algorithms to optimize them. We prove that the derived bounds are tighter than their competitors, and can be optimized directly to confidently improve upon the logging policy offline. Our approach learns policies with guarantees, uses all available data and does not require tuning additional hyperparameters on held-out sets. We demonstrate through extensive experiments the effectiveness of our approach in providing performance guarantees in practical scenarios.

Original language: English
Pages (from-to): 29777-29799
Number of pages: 23
Journal: Proceedings of Machine Learning Research
Volume: 202
Status: Published - 1 Jan 2023
Externally published: Yes
Event: 40th International Conference on Machine Learning, ICML 2023 - Honolulu, United States
Duration: 23 Jul 2023 to 29 Jul 2023
