Abstract
We provide a rigorous analysis of variational inference (VI) training of Bayesian neural networks in the two-layer, infinite-width regime. We consider a regression problem with a regularized evidence lower bound (ELBO), which decomposes into the expected log-likelihood of the data and the Kullback-Leibler (KL) divergence between the prior distribution and the variational posterior. With an appropriate weighting of the KL term, we prove a law of large numbers for three training schemes: (i) the idealized case, in which the multiple Gaussian integral arising from the reparametrization trick is evaluated exactly; (ii) a minibatch scheme using Monte Carlo sampling, commonly known as Bayes by Backprop; and (iii) a new, computationally cheaper algorithm that we introduce as Minimal VI. An important result is that all three methods converge to the same mean-field limit. Finally, we illustrate our results numerically and discuss the need for a central limit theorem.
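The abstract does not give the precise objective, but the sketch below illustrates the kind of KL-weighted ELBO and Bayes-by-Backprop-style Monte Carlo estimator it describes. The two-layer ReLU architecture, the N(0, 1) prior, the Gaussian likelihood (squared error up to constants), and the 1/N KL weighting are illustrative assumptions, not the paper's exact choices.

```python
# Minimal sketch (not the paper's exact setup): a two-layer network with a
# mean-field Gaussian variational posterior over the hidden-layer weights,
# and a Bayes-by-Backprop-style Monte Carlo estimate of the weighted ELBO.
import numpy as np

rng = np.random.default_rng(0)

N = 50                # number of hidden units (the "width" taken to infinity in the analysis)
d = 3                 # input dimension
kl_weight = 1.0 / N   # assumed width-dependent KL weighting, for illustration only

# Variational parameters: per-weight means and log-standard-deviations (mean-field Gaussian).
mu = rng.normal(size=(N, d)) / np.sqrt(d)
log_sigma = np.full((N, d), -2.0)

def forward(W, x):
    """Two-layer network: average of ReLU features for sampled weights W."""
    return np.maximum(W @ x, 0.0).mean()

def neg_weighted_elbo(x_batch, y_batch, n_mc=5):
    """Monte Carlo estimate of the negative weighted ELBO on a minibatch:
    mean squared error (Gaussian likelihood up to constants)
    plus kl_weight * KL(q || prior) for an assumed N(0, 1) prior."""
    sigma = np.exp(log_sigma)
    data_term = 0.0
    for _ in range(n_mc):
        eps = rng.normal(size=mu.shape)   # reparametrization trick
        W = mu + sigma * eps              # sample weights from the variational posterior q
        preds = np.array([forward(W, x) for x in x_batch])
        data_term += np.mean((preds - y_batch) ** 2)
    data_term /= n_mc
    # Closed-form KL between diagonal Gaussians q = N(mu, sigma^2) and the prior N(0, 1).
    kl = 0.5 * np.sum(mu**2 + sigma**2 - 1.0 - 2.0 * log_sigma)
    return data_term + kl_weight * kl

# Example: evaluate the stochastic objective on a random minibatch.
x_batch = rng.normal(size=(8, d))
y_batch = rng.normal(size=8)
print(neg_weighted_elbo(x_batch, y_batch))
```

Bayes by Backprop would differentiate this stochastic objective with respect to the variational parameters (mu, log_sigma); the idealized scheme described in the abstract instead evaluates the Gaussian expectation exactly rather than by Monte Carlo.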
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 4657-4695 |
| Number of pages | 39 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 195 |
| Publication status | Published - 1 Jan 2023 |
| Event | 36th Annual Conference on Learning Theory, COLT 2023, Bangalore, India |
| Duration | 12 Jul 2023 → 15 Jul 2023 |
Keywords
- Bayesian neural networks
- infinite-width neural networks
- law of large numbers
- mean-field
- variational inference