Passer à la navigation principale Passer à la recherche Passer au contenu principal

Strategic Multi-Armed Bandit Problems Under Debt-Free Reporting

Résultats de recherche: Contribution à un journalArticle de conférenceRevue par des pairs

Résumé

We consider the classical multi-armed bandit problem, but with strategic arms. In this context, each arm is characterized by a bounded support reward distribution and strategically aims to maximize its own utility by potentially retaining a portion of its reward, and disclosing only a fraction of it to the learning agent. This scenario unfolds as a game over T rounds, leading to a competition of objectives between the learning agent, aiming to minimize their regret, and the arms, motivated by the desire to maximize their individual utilities. To address these dynamics, we introduce a new mechanism that establishes an equilibrium wherein each arm behaves truthfully and discloses as much of its rewards as possible. With this mechanism, the agent can attain the second-highest average (true) reward among arms, with a cumulative regret bounded by O(log(T)/∆) (problem-dependent) or O(pT log(T)) (worst-case).

langue originaleAnglais
journalAdvances in Neural Information Processing Systems
Volume37
étatPublié - 1 janv. 2024
Evénement38th Conference on Neural Information Processing Systems, NeurIPS 2024 - Vancouver, Canada
Durée: 9 déc. 202415 déc. 2024

Empreinte digitale

Examiner les sujets de recherche de « Strategic Multi-Armed Bandit Problems Under Debt-Free Reporting ». Ensemble, ils forment une empreinte digitale unique.

Contient cette citation