TY - JOUR
T1 - Attention ensemble mixture
T2 - a novel offline reinforcement learning algorithm for autonomous vehicles
AU - Han, Xinchen
AU - Afifi, Hossam
AU - Moungla, Hassine
AU - Marot, Michel
N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025.
PY - 2025/4/1
Y1 - 2025/4/1
N2 - Offline Reinforcement Learning (RL), which optimizes policies from previously collected datasets, is a promising approach for tackling tasks where direct interaction with the environment is infeasible due to high risk or cost of errors, such as autonomous vehicle (AV) applications. However, offline RL faces a critical challenge: extrapolation errors arising from out-of-distribution (OOD) data. In this paper, we propose Attention Ensemble Mixture (AEM), a novel offline RL algorithm that leverages ensemble learning and an attention mechanism. Ensemble learning enhances the confidence of Q-function predictions, while the attention mechanism evaluates the uncertainty of selected actions. By assigning appropriate attention weights to each Q-head, AEM effectively down-weights OOD actions and up-weights in-distribution actions. We further introduce three key improvements to enhance the robustness and generality of AEM: attention-weighted Bellman backups, KL divergence regularization, and delayed attention updates. Extensive comparative experiments demonstrate that AEM outperforms several state-of-the-art ensemble offline RL algorithms, while ablation studies underscore the significance of the proposed enhancements. In AV tasks, AEM exhibits superior performance compared to other methods, excelling in both offline and online evaluations.
KW - Attention
KW - Autonomous vehicle
KW - Deep Q-learning
KW - Ensemble learning
KW - Offline RL
UR - https://www.scopus.com/pages/publications/86000313615
U2 - 10.1007/s10489-025-06403-7
DO - 10.1007/s10489-025-06403-7
M3 - Article
AN - SCOPUS:86000313615
SN - 0924-669X
VL - 55
JO - Applied Intelligence
JF - Applied Intelligence
IS - 6
M1 - 508
ER -