
Cosine Similarity Based Adaptive Implicit Q-Learning for Offline Reinforcement Learning

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Offline Reinforcement Learning (RL) methods constrain the learned policy to stay close to the behaviour policy, mitigating extrapolation errors caused by out-of-distribution (OOD) actions. Implicit Q-Learning (IQL), a popular offline RL algorithm, uses expectile regression and introduces an in-sample learning paradigm that improves the policy evaluation stage without querying OOD actions. However, the crucial expectile parameter τ in IQL is fixed, which limits both its performance and its flexibility across diverse datasets. In this paper, we propose Cos-IQL, an improved IQL approach based on cosine similarity, which optimizes the policy evaluation function by measuring the cosine similarity between the learned policy and the behaviour policy. Cos-IQL is essentially a multi-step offline RL algorithm, yet it retains the advantages of in-sample learning and thus avoids the risks of OOD actions. In addition, Cos-IQL adaptively adjusts τ without elaborate fine-tuning. We evaluate Cos-IQL on the D4RL benchmark datasets and compare its performance against recent competitive offline RL algorithms. Experimental results show that Cos-IQL achieves state-of-the-art performance.
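To make the two ingredients of the abstract concrete, the sketch below shows (1) the asymmetric expectile loss that IQL applies to u = Q(s, a) − V(s), namely |τ − 1(u < 0)|·u², and (2) a cosine similarity between a policy action and a dataset (behaviour) action. The function names `expectile_loss`, `cosine_similarity` and `adaptive_tau`, and the parameters `tau_min` and `tau_max`, are hypothetical. Comparing the two policies through single action vectors is an assumption made here for illustration. The linear mapping in `adaptive_tau` is purely hypothetical and is **not** the Cos-IQL formula, which the abstract does not specify.

```python
import math
from typing import Sequence


def expectile_loss(diffs: Sequence[float], tau: float) -> float:
    """Mean expectile loss |tau - 1(u < 0)| * u**2 over a batch.

    Each u is assumed to be Q(s, a) - V(s) for one (s, a) pair, as in IQL.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie in (0, 1)")
    if len(diffs) == 0:
        raise ValueError("diffs must be non-empty")
    total = 0.0
    for u in diffs:
        # Negative residuals are weighted by (1 - tau), others by tau.
        weight = (1.0 - tau) if u < 0 else tau
        total += weight * u * u
    return total / len(diffs)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def adaptive_tau(cos_sim: float, tau_min: float = 0.5, tau_max: float = 0.9) -> float:
    """HYPOTHETICAL mapping from cosine similarity in [-1, 1] to tau.

    This is not the Cos-IQL rule; it linearly interpolates so that closer
    agreement with the behaviour action gives a larger expectile.
    """
    alpha = (cos_sim + 1.0) / 2.0  # rescale [-1, 1] to [0, 1]
    return tau_min + alpha * (tau_max - tau_min)
```

As a usage example, `adaptive_tau(cosine_similarity(policy_action, dataset_action))` yields a per-sample τ that can then be passed to `expectile_loss`. A larger τ pushes V(s) toward the upper expectile of Q(s, a), so any adaptive scheme trades optimism against staying close to the data.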

Original language: English
Title of host publication: 2025 IEEE Wireless Communications and Networking Conference, WCNC 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350368369
Publication status: Published - 1 Jan 2025
Event: 2025 IEEE Wireless Communications and Networking Conference, WCNC 2025 - Milan, Italy
Duration: 24 Mar 2025 to 27 Mar 2025

Publication series

Name: IEEE Wireless Communications and Networking Conference, WCNC
ISSN (Print): 1525-3511

Conference

Conference: 2025 IEEE Wireless Communications and Networking Conference, WCNC 2025
Country/Territory: Italy
City: Milan
Period: 24/03/25 to 27/03/25

Keywords

  • Cosine Similarity
  • Implicit Q-Learning
  • In-sample Learning
  • Offline RL
