Skill-Critic: Refining Learned Skills for Reinforcement Learning
Abstract
Hierarchical reinforcement learning (RL) can accelerate long-horizon decision-making by temporally abstracting a policy into multiple levels. Skills, i.e., sequences of primitive actions, have shown promising results in sparse-reward environments. Typically, a skill latent space and policy are discovered from offline data, but the resulting low-level policy can be unreliable due to low-coverage demonstrations or distribution shifts. As a solution, we propose fine-tuning the low-level policy in conjunction with high-level skill selection. Our Skill-Critic algorithm optimizes both the low-level and high-level policies; these policies are also initialized and regularized by the latent space learned from offline demonstrations to guide the joint policy optimization. We validate our approach in multiple sparse-reward RL environments, including a new sparse-reward autonomous racing task in Gran Turismo Sport. The experiments show that Skill-Critic's low-level policy fine-tuning and demonstration-guided regularization are essential for optimal performance. Images and videos are available at https://sites.google.com/view/skill-critic. We plan to open-source the code with the final version.
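To make the two-level structure concrete, below is a minimal sketch, assuming a PyTorch setup, of a high-level policy that samples a skill latent z, a low-level policy that decodes (state, z) into primitive actions and is fine-tuned online, and frozen priors learned from offline demonstrations that regularize both levels via KL terms. This is not the authors' released implementation; all names, network sizes, dimensions, and the weights alpha_hi / alpha_lo are illustrative assumptions.

```python
# Illustrative sketch of a two-level, KL-regularized actor update.
import torch
import torch.nn as nn
from torch.distributions import Normal, kl_divergence

STATE_DIM, ACTION_DIM, SKILL_DIM = 10, 3, 8  # assumed dimensions


class GaussianPolicy(nn.Module):
    """Small MLP that maps its input to a diagonal Gaussian."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU())
        self.mu = nn.Linear(64, out_dim)
        self.log_std = nn.Linear(64, out_dim)

    def forward(self, x):
        h = self.body(x)
        return Normal(self.mu(h), self.log_std(h).clamp(-5.0, 2.0).exp())


# High-level policy pi_hi(z | s) and low-level policy pi_lo(a | s, z); the low
# level is trained online rather than kept as a frozen pretrained decoder.
pi_hi = GaussianPolicy(STATE_DIM, SKILL_DIM)
pi_lo = GaussianPolicy(STATE_DIM + SKILL_DIM, ACTION_DIM)

# Frozen priors obtained from offline skill learning; they initialize and
# regularize the corresponding online policies.
prior_hi = GaussianPolicy(STATE_DIM, SKILL_DIM)
prior_lo = GaussianPolicy(STATE_DIM + SKILL_DIM, ACTION_DIM)
for p in list(prior_hi.parameters()) + list(prior_lo.parameters()):
    p.requires_grad_(False)

# Critics for each level (state-skill and state-skill-action values).
q_hi = nn.Sequential(nn.Linear(STATE_DIM + SKILL_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
q_lo = nn.Sequential(nn.Linear(STATE_DIM + SKILL_DIM + ACTION_DIM, 64), nn.ReLU(), nn.Linear(64, 1))


def actor_losses(s, alpha_hi=0.1, alpha_lo=0.1):
    """KL-regularized actor losses for both levels: maximize the critic value
    while staying close to the demonstration-derived priors."""
    # High level: sample a skill and penalize divergence from the skill prior.
    dist_hi = pi_hi(s)
    z = dist_hi.rsample()
    kl_hi = kl_divergence(dist_hi, prior_hi(s)).sum(-1)
    loss_hi = (-q_hi(torch.cat([s, z], -1)).squeeze(-1) + alpha_hi * kl_hi).mean()

    # Low level: decode the (detached) skill into an action and penalize
    # divergence from the pretrained low-level prior.
    sz = torch.cat([s, z.detach()], -1)
    dist_lo = pi_lo(sz)
    a = dist_lo.rsample()
    kl_lo = kl_divergence(dist_lo, prior_lo(sz)).sum(-1)
    loss_lo = (-q_lo(torch.cat([sz, a], -1)).squeeze(-1) + alpha_lo * kl_lo).mean()
    return loss_hi, loss_lo


if __name__ == "__main__":
    s = torch.randn(32, STATE_DIM)   # batch of states
    loss_hi, loss_lo = actor_losses(s)
    # One joint gradient step over both policy levels; in a full agent the
    # critics would be trained separately with their own TD losses.
    (loss_hi + loss_lo).backward()
```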