TY - CPAPER
T1 - Mixture weights optimisation for Alpha-Divergence Variational Inference
AU - Daudel, Kamélia
AU - Douc, Randal
N1 - Publisher Copyright:
© 2021 Neural information processing systems foundation. All rights reserved.
PY - 2021/1/1
Y1 - 2021/1/1
AB - This paper focuses on α-divergence minimisation methods for Variational Inference. We consider the case where the posterior density is approximated by a mixture model and we investigate algorithms optimising the mixture weights of this mixture model by α-divergence minimisation, without any information on the underlying distribution of the mixture components' parameters. The Power Descent, defined for all α ≠ 1, is one such algorithm and we establish the full proof of its convergence towards the optimal mixture weights when α < 1. Since the α-divergence recovers the widely-used exclusive Kullback-Leibler divergence when α → 1, we then extend the Power Descent to the case α = 1 and show that we obtain an Entropic Mirror Descent. This leads us to investigate the link between the Power Descent and the Entropic Mirror Descent: first-order approximations allow us to introduce the Rényi Descent, a novel algorithm for which we prove an O(1/N) convergence rate. Lastly, we compare numerically the behaviour of the unbiased Power Descent and of the biased Rényi Descent, and we discuss the potential advantages of one algorithm over the other.
UR - https://www.scopus.com/pages/publications/85131741700
M3 - Conference contribution
AN - SCOPUS:85131741700
T3 - Advances in Neural Information Processing Systems
SP - 4397
EP - 4408
BT - Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
A2 - Ranzato, Marc'Aurelio
A2 - Beygelzimer, Alina
A2 - Dauphin, Yann
A2 - Liang, Percy S.
A2 - Wortman Vaughan, Jenn
PB - Neural information processing systems foundation
T2 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
Y2 - 6 December 2021 through 14 December 2021
ER -