Résumé
We consider the problem where an agent wants to find a hidden object that is randomly located in some vertex of a directed acyclic graph (DAG) according to a fixed but possibly unknown distribution. The agent can only examine vertices whose in-neighbors have already been examined. In this paper, we address a learning setting where we allow the agent to stop before having found the object and restart searching on a new independent instance of the same problem. Our goal is to maximize the total number of hidden objects found given a time budget. The agent can thus skip an instance after realizing that it would spend too much time on it. Our contributions are both to the search theory and multi-armed bandits. If the distribution is known, we provide a quasi-optimal and efficient stationary strategy. If the distribution is unknown, we additionally show how to sequentially approximate it and, at the same time, act near-optimally in order to collect as many hidden objects as possible.
| langue originale | Anglais |
|---|---|
| état | Publié - 1 janv. 2019 |
| Modification externe | Oui |
| Evénement | 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019 - Naha, Japon Durée: 16 avr. 2019 → 18 avr. 2019 |
Une conférence
| Une conférence | 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019 |
|---|---|
| Pays/Territoire | Japon |
| La ville | Naha |
| période | 16/04/19 → 18/04/19 |
Empreinte digitale
Examiner les sujets de recherche de « Finding the bandit in a graph: Sequential search-and-stop ». Ensemble, ils forment une empreinte digitale unique.Contient cette citation
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver