Abstract
Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. We propose a simple policy, and show that a very small number of batches gives close to minimax optimal regret bounds. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.
| Original language | English |
|---|---|
| Pages (from - to) | 660-681 |
| Number of pages | 22 |
| Journal | Annals of Statistics |
| Volume | 44 |
| Issue number | 2 |
| DOIs | |
| Status | Published - 1 Apr 2016 |
| Externally published | Yes |