Résumé
In this paper we propose the first multi-armed bandit algorithm based on re-sampling that achieves asymptotically optimal regret simultaneously for different families of arms (namely Bernoulli, Gaussian and Poisson distributions). Unlike Thompson Sampling which requires to specify a different prior to be optimal in each case, our proposal RB-SDA does not need any distribution-dependent tuning. RB-SDA belongs to the family of Sub-sampling Duelling Algorithms (SDA) which combines the sub-sampling idea first used by the BESA [1] and SSMC [2] algorithms with different sub-sampling schemes. In particular, RB-SDA uses Random Block sampling. We perform an experimental study assessing the flexibility and robustness of this promising novel approach for exploration in bandit models.
| langue originale | Anglais |
|---|---|
| journal | Advances in Neural Information Processing Systems |
| Volume | 2020-December |
| état | Publié - 1 janv. 2020 |
| Modification externe | Oui |
| Evénement | 34th Conference on Neural Information Processing Systems, NeurIPS 2020 - Virtual, Online Durée: 6 déc. 2020 → 12 déc. 2020 |
Empreinte digitale
Examiner les sujets de recherche de « Sub-sampling for efficient non-parametric bandit exploration ». Ensemble, ils forment une empreinte digitale unique.Contient cette citation
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver