Passer à la navigation principale Passer à la recherche Passer au contenu principal

Two-target algorithms for infinite-armed bandits with Bernoulli rewards

  • KTH Royal Institute of Technology
  • INRIA Rocquencourt

Résultats de recherche: Contribution à un journalArticle de conférenceRevue par des pairs

Résumé

We consider an infinite-armed bandit problem with Bernoulli rewards. The mean rewards are independent, uniformly distributed over [0; 1]. Rewards 0 and 1 are referred to as a success and a failure, respectively. We propose a novel algorithm where the decision to exploit any arm is based on two successive targets, namely, the total number of successes until the first failure and until the first m failures, respectively, where m is a fixed parameter. This two-target algorithm achieves a long-term average regret in 2√n for a large parameter m and a known time horizon n. This regret is optimal and strictly less than the regret achieved by the best known algorithms, which is in 2√n The results are extended to any mean-reward distribution whose support contains 1 and to unknown time horizons. Numerical experiments show the performance of the algorithm for finite time horizon.

langue originaleAnglais
journalAdvances in Neural Information Processing Systems
étatPublié - 1 janv. 2013
Evénement27th Annual Conference on Neural Information Processing Systems, NIPS 2013 - Lake Tahoe, NV, États-Unis
Durée: 5 déc. 201310 déc. 2013

Empreinte digitale

Examiner les sujets de recherche de « Two-target algorithms for infinite-armed bandits with Bernoulli rewards ». Ensemble, ils forment une empreinte digitale unique.

Contient cette citation