Skill-Critic: Refining Learned Skills for Reinforcement Learning
Abstract
Hierarchical reinforcement learning (RL) can accelerate long-horizon decision-making by temporally abstracting a policy into multiple levels. Skills, i.e., sequences of primitive actions, have shown promising results in sparse-reward environments. Typically, a skill latent space and policy are discovered from offline data, but the resulting low-level policy can be unreliable due to low-coverage demonstrations or distribution shifts. As a solution, we propose fine-tuning the low-level policy in conjunction with high-level skill selection. Our Skill-Critic algorithm optimizes both the low-level and high-level policies; these policies are also initialized and regularized by the latent space learned from offline demonstrations to guide the joint policy optimization. We validate our approach in multiple sparse-reward RL environments, including a new sparse-reward autonomous racing task in Gran Turismo Sport. The experiments show that Skill-Critic's low-level policy fine-tuning and demonstration-guided regularization are essential for optimal performance. Images and videos are available at https://sites.google.com/view/skill-critic. We plan to open-source the code with the final version.
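To make the two-level structure concrete, below is a minimal sketch, assuming a PyTorch setup, of a high-level policy that samples a skill latent z, a low-level policy that decodes (state, z) into primitive actions and is fine-tuned online, and frozen priors learned from offline demonstrations that regularize both levels via KL terms. This is not the authors' released implementation; all names, network sizes, dimensions, and the weights alpha_hi / alpha_lo are illustrative assumptions.

```python
# Illustrative sketch of a two-level, KL-regularized actor update.
import torch
import torch.nn as nn
from torch.distributions import Normal, kl_divergence

STATE_DIM, ACTION_DIM, SKILL_DIM = 10, 3, 8  # assumed dimensions


class GaussianPolicy(nn.Module):
    """Small MLP that maps its input to a diagonal Gaussian."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU())
        self.mu = nn.Linear(64, out_dim)
        self.log_std = nn.Linear(64, out_dim)

    def forward(self, x):
        h = self.body(x)
        return Normal(self.mu(h), self.log_std(h).clamp(-5.0, 2.0).exp())


# High-level policy pi_hi(z | s) and low-level policy pi_lo(a | s, z); the low
# level is trained online rather than kept as a frozen pretrained decoder.
pi_hi = GaussianPolicy(STATE_DIM, SKILL_DIM)
pi_lo = GaussianPolicy(STATE_DIM + SKILL_DIM, ACTION_DIM)

# Frozen priors obtained from offline skill learning; they initialize and
# regularize the corresponding online policies.
prior_hi = GaussianPolicy(STATE_DIM, SKILL_DIM)
prior_lo = GaussianPolicy(STATE_DIM + SKILL_DIM, ACTION_DIM)
for p in list(prior_hi.parameters()) + list(prior_lo.parameters()):
    p.requires_grad_(False)

# Critics for each level (state-skill and state-skill-action values).
q_hi = nn.Sequential(nn.Linear(STATE_DIM + SKILL_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
q_lo = nn.Sequential(nn.Linear(STATE_DIM + SKILL_DIM + ACTION_DIM, 64), nn.ReLU(), nn.Linear(64, 1))


def actor_losses(s, alpha_hi=0.1, alpha_lo=0.1):
    """KL-regularized actor losses for both levels: maximize the critic value
    while staying close to the demonstration-derived priors."""
    # High level: sample a skill and penalize divergence from the skill prior.
    dist_hi = pi_hi(s)
    z = dist_hi.rsample()
    kl_hi = kl_divergence(dist_hi, prior_hi(s)).sum(-1)
    loss_hi = (-q_hi(torch.cat([s, z], -1)).squeeze(-1) + alpha_hi * kl_hi).mean()

    # Low level: decode the (detached) skill into an action and penalize
    # divergence from the pretrained low-level prior.
    sz = torch.cat([s, z.detach()], -1)
    dist_lo = pi_lo(sz)
    a = dist_lo.rsample()
    kl_lo = kl_divergence(dist_lo, prior_lo(sz)).sum(-1)
    loss_lo = (-q_lo(torch.cat([sz, a], -1)).squeeze(-1) + alpha_lo * kl_lo).mean()
    return loss_hi, loss_lo


if __name__ == "__main__":
    s = torch.randn(32, STATE_DIM)   # batch of states
    loss_hi, loss_lo = actor_losses(s)
    # One joint gradient step over both policy levels; in a full agent the
    # critics would be trained separately with their own TD losses.
    (loss_hi + loss_lo).backward()
```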