TY - GEN
T1 - CAMEO
T2 - 30th European Signal Processing Conference, EUSIPCO 2022
AU - Chehboune, Mohamed Alami
AU - Kaddah, Rim
AU - Martino, Luca
AU - Llorente, Fernando
AU - Read, Jesse
N1 - Publisher Copyright:
© 2022 European Signal Processing Conference, EUSIPCO. All rights reserved.
PY - 2022/1/1
Y1 - 2022/1/1
N2 - Reinforcement Learning has drawn huge interest as a tool for solving optimal control problems. Solving a given problem (task or environment) involves converging towards an optimal policy. However, there might exist multiple optimal policies that can dramatically differ in their behaviour; for example, some may be faster than others but at the expense of greater risk. We consider and study a distribution of optimal policies. We design a curiosity-augmented Metropolis algorithm (CAMEO), such that we can sample optimal policies, and such that these policies effectively adopt diverse behaviours, since this implies greater coverage of the different possible optimal policies. In experimental simulations we show that CAMEO indeed obtains policies that all solve classic control problems, even in the challenging case of environments that provide sparse rewards. We further show that the different policies we sample present different risk profiles, corresponding to interesting practical applications in interpretability; this represents a first step towards learning the distribution of optimal policies itself.
AB - Reinforcement Learning has drawn huge interest as a tool for solving optimal control problems. Solving a given problem (task or environment) involves converging towards an optimal policy. However, there might exist multiple optimal policies that can dramatically differ in their behaviour; for example, some may be faster than others but at the expense of greater risk. We consider and study a distribution of optimal policies. We design a curiosity-augmented Metropolis algorithm (CAMEO), such that we can sample optimal policies, and such that these policies effectively adopt diverse behaviours, since this implies greater coverage of the different possible optimal policies. In experimental simulations we show that CAMEO indeed obtains policies that all solve classic control problems, even in the challenging case of environments that provide sparse rewards. We further show that the different policies we sample present different risk profiles, corresponding to interesting practical applications in interpretability; this represents a first step towards learning the distribution of optimal policies itself.
KW - Curiosity model
KW - MCMC
KW - Metropolis
KW - Reinforcement Learning
M3 - Conference contribution
AN - SCOPUS:85141010558
T3 - European Signal Processing Conference
SP - 1482
EP - 1486
BT - 30th European Signal Processing Conference, EUSIPCO 2022 - Proceedings
PB - European Signal Processing Conference, EUSIPCO
Y2 - 29 August 2022 through 2 September 2022
ER -