Abstract
Actor-critic methods integrating target networks have exhibited remarkable empirical success in deep reinforcement learning. However, a theoretical understanding of the use of target networks in actor-critic methods is largely missing in the literature. In this paper, we narrow this gap between theory and practice by proposing the first theoretical analysis of an online target-based actor-critic algorithm with linear function approximation in the discounted reward setting. Our algorithm uses three different timescales: one for the actor and two for the critic. Instead of using the standard single-timescale temporal difference (TD) learning algorithm as a critic, we use a two-timescale target-based version of TD learning closely inspired by practical actor-critic algorithms implementing target networks. First, we establish asymptotic convergence results for both the critic and the actor under Markovian sampling. Then, we provide a finite-time analysis showing the impact of incorporating a target network into actor-critic methods.
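To make the three-timescale structure concrete, here is a minimal Python sketch of a target-based actor-critic loop with linear value features. It is an illustration under stated assumptions, not the paper's algorithm: the random MDP, the one-hot feature matrix `Phi`, the softmax actor, the step-size exponents, the ordering of the two critic timescales, and the function name `target_based_actor_critic` are all hypothetical choices made for this example.

```python
import numpy as np

def target_based_actor_critic(n_states=5, n_actions=2, gamma=0.9,
                              steps=2000, seed=0):
    """Illustrative three-timescale actor-critic on a random MDP (assumed setup)."""
    rng = np.random.default_rng(seed)
    # Hypothetical MDP: random transition kernel P[s, a] and rewards in [0, 1].
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
    # Linear value features; one-hot rows are the simplest special case.
    Phi = np.eye(n_states)

    theta = np.zeros((n_states, n_actions))  # actor (softmax) parameters
    w = np.zeros(Phi.shape[1])               # critic weights
    w_bar = np.zeros(Phi.shape[1])           # target weights

    def policy(s):
        z = theta[s] - theta[s].max()        # shift logits for numerical stability
        p = np.exp(z)
        return p / p.sum()

    s = 0
    for t in range(1, steps + 1):
        # Assumed step sizes: critic fastest, target in between, actor slowest.
        beta = 1.0 / t ** 0.6
        xi = 1.0 / t ** 0.75
        alpha = 1.0 / t ** 0.9

        probs = policy(s)
        a = rng.choice(n_actions, p=probs)
        s_next = rng.choice(n_states, p=P[s, a])  # Markovian sampling along one trajectory

        # TD error that bootstraps from the target weights, not the critic itself.
        delta = R[s, a] + gamma * Phi[s_next] @ w_bar - Phi[s] @ w

        w += beta * delta * Phi[s]           # critic update
        w_bar += xi * (w - w_bar)            # target slowly tracks the critic

        grad_log = -probs                    # gradient of log softmax w.r.t. theta[s]
        grad_log[a] += 1.0
        theta[s] += alpha * delta * grad_log # actor update, TD error as advantage proxy

        s = s_next
    return theta, w, w_bar
```

Calling `target_based_actor_critic(steps=500)` returns the actor parameters together with the critic and target weights. In this sketch the target weights are a running average of the critic weights, which is one common way practical implementations realize a target network.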
| Original language | English |
|---|---|
| Pages (from-to) | 991-1040 |
| Number of pages | 50 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 151 |
| Status | Published - 1 Jan 2022 |
| Event | 25th International Conference on Artificial Intelligence and Statistics, AISTATS 2022 - Virtual, Online, Spain. Duration: 28 Mar 2022 → 30 Mar 2022 |
Title: Analysis of a Target-Based Actor-Critic Algorithm with Linear Function Approximation