TY - JOUR
T1 - Demonstration Guided Actor-Critic Deep Reinforcement Learning for Fast Teaching of Robots in Dynamic Environments
AU - Gong, Liang
AU - Sun, Te
AU - Li, Xudong
AU - Lin, Ke
AU - Díaz-Rodríguez, Natalia
AU - Filliat, David
AU - Zhang, Zhengfeng
AU - Zhang, Junping
N1 - Publisher Copyright:
© 2020 The Authors. This is an open access article under the CC BY-NC-ND license.
PY - 2020/1/1
Y1 - 2020/1/1
N2 - Using direct reinforcement learning (RL) to accomplish a task can be very inefficient, especially in robotic configurations where interactions with the environment are lengthy and costly. Learning from expert demonstration (LfD) is an alternative approach that achieves better performance in an RL setting and greatly improves sample efficiency. We propose a novel demonstration learning framework for actor-critic-based algorithms. First, we put forward an environment pre-training paradigm that initializes the model parameters without interacting with the target environment, effectively avoiding the cold-start problem in deep RL scenarios. Second, we design a general-purpose LfD framework for mainstream actor-critic RL algorithms that include a policy network and a value function, such as PPO, SAC, TRPO, and A3C. Third, we build a dedicated model training platform for human-robot interaction and numerical experimentation. We evaluate the method in six MuJoCo simulated locomotion environments and on our robot control simulation platform. Results show that several epochs of pre-training improve the agent's performance in the early stage of training, and the final converged performance of the RL algorithm is also boosted by external demonstration. Overall, sample efficiency is improved by 30% with the proposed method. Our demonstration pipeline makes full use of the exploration property of the RL algorithm and is feasible for fast teaching of robots in dynamic environments.
AB - Using direct reinforcement learning (RL) to accomplish a task can be very inefficient, especially in robotic configurations where interactions with the environment are lengthy and costly. Learning from expert demonstration (LfD) is an alternative approach that achieves better performance in an RL setting and greatly improves sample efficiency. We propose a novel demonstration learning framework for actor-critic-based algorithms. First, we put forward an environment pre-training paradigm that initializes the model parameters without interacting with the target environment, effectively avoiding the cold-start problem in deep RL scenarios. Second, we design a general-purpose LfD framework for mainstream actor-critic RL algorithms that include a policy network and a value function, such as PPO, SAC, TRPO, and A3C. Third, we build a dedicated model training platform for human-robot interaction and numerical experimentation. We evaluate the method in six MuJoCo simulated locomotion environments and on our robot control simulation platform. Results show that several epochs of pre-training improve the agent's performance in the early stage of training, and the final converged performance of the RL algorithm is also boosted by external demonstration. Overall, sample efficiency is improved by 30% with the proposed method. Our demonstration pipeline makes full use of the exploration property of the RL algorithm and is feasible for fast teaching of robots in dynamic environments.
KW - actor-critic framework
KW - deep learning
KW - deep reinforcement learning
KW - learning from demonstration (LfD)
KW - robotics
U2 - 10.1016/j.ifacol.2021.04.227
DO - 10.1016/j.ifacol.2021.04.227
M3 - Conference article
AN - SCOPUS:85107866542
SN - 2405-8963
VL - 53
SP - 271
EP - 278
JO - IFAC-PapersOnLine
JF - IFAC-PapersOnLine
IS - 5
T2 - 3rd IFAC Workshop on Cyber-Physical and Human Systems, CPHS 2020
Y2 - 3 December 2020 through 5 December 2020
ER -