
Cosine Similarity Based Adaptive Implicit Q-Learning for Offline Reinforcement Learning

Research output: Chapter in a book, report, anthology or collection; Conference contribution; Peer-reviewed

Abstract

Offline Reinforcement Learning (RL) methods constrain the policy to align with the behaviour policy, mitigating extrapolation errors caused by out-of-distribution (OOD) actions. Implicit Q-Learning (IQL), a popular offline RL algorithm, leverages expectile regression and introduces an in-sample learning paradigm that enhances the policy evaluation stage without querying OOD actions. However, the crucial parameter τ for expectile regression in IQL is fixed, limiting both its performance and flexibility across diverse datasets. In this paper, we propose Cos-IQL, an improved IQL approach based on cosine similarity, which optimizes the policy evaluation function by measuring the cosine similarity between the policy and the behaviour policy. Cos-IQL is essentially a multi-step offline RL algorithm but retains the advantages of in-sample learning, thus avoiding the risks of OOD actions. In addition, Cos-IQL can adaptively adjust the parameter τ without elaborate fine-tuning. We evaluate Cos-IQL on D4RL benchmark datasets and compare its performance against recent competitive offline RL algorithms. Experimental results show that Cos-IQL achieves state-of-the-art performance.
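For readers unfamiliar with the two ingredients named in the abstract, the following is a minimal sketch of the standard expectile loss used in IQL and of plain cosine similarity between two action vectors. The function names `expectile_loss` and `cosine_similarity` and the argument `tau` (the expectile parameter) are illustrative assumptions. The abstract does not say how Cos-IQL turns the similarity into an adaptive expectile parameter, so that step is deliberately left out, and this should not be read as the authors' implementation.

```python
import math


def expectile_loss(diff, tau):
    """Asymmetric squared loss |tau - 1(diff < 0)| * diff**2.

    In IQL, diff is typically Q(s, a) - V(s). With tau > 0.5, positive
    differences are weighted more heavily than negative ones, so V is
    pushed toward an upper expectile of Q over in-sample actions.
    """
    weight = (1.0 - tau) if diff < 0 else tau
    return weight * diff * diff


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 when either vector has zero norm (an assumption made
    here to avoid division by zero).
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With `tau = 0.5` the loss reduces to an ordinary (halved) squared error. A fixed `tau` above 0.5 is the setting the abstract describes as limiting, since the same asymmetry is applied regardless of how close the learned policy's actions are to the behaviour policy's.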

Original language: English
Title of host publication: 2025 IEEE Wireless Communications and Networking Conference, WCNC 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (electronic): 9798350368369
Status: Published - 1 Jan 2025
Event: 2025 IEEE Wireless Communications and Networking Conference, WCNC 2025 - Milan, Italy
Duration: 24 Mar 2025 to 27 Mar 2025

Publication series

Name: IEEE Wireless Communications and Networking Conference, WCNC
ISSN (print): 1525-3511

Conference

Conference: 2025 IEEE Wireless Communications and Networking Conference, WCNC 2025
Country/Territory: Italy
City: Milan
Period: 24/03/25 to 27/03/25
