Latent Skill Planning for Exploration and Transfer
Authors: Kevin Xie, Homanga Bharadhwaj, Danijar Hafner, Animesh Garg, Florian Shkurti
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform experimental evaluation over locomotion tasks based on the DeepMind Control Suite framework (Tassa et al., 2018) to understand the following questions: Does LSP learn useful skills and compose them appropriately to succeed in individual tasks? Does LSP adapt to a target task with different environment reward functions quickly, after being pre-trained on another task? |
| Researcher Affiliation | Collaboration | Kevin Xie¹, Homanga Bharadhwaj¹, Danijar Hafner¹,², Animesh Garg¹,³, Florian Shkurti¹ (¹University of Toronto and Vector Institute, ²Google Brain, ³Nvidia) |
| Pseudocode | Yes | Algorithm 1: Learning Skills for Planning |
| Open Source Code | No | Videos are available at: https://sites.google.com/view/latent-skill-planning/ and Video visualizations are in the website https://sites.google.com/view/partial-amortization-hierarchy/home. (These links are for videos/visualizations, not source code. No explicit statement of source code release for their method.) |
| Open Datasets | Yes | We perform experimental evaluation over locomotion tasks based on the DeepMind Control Suite framework (Tassa et al., 2018) |
| Dataset Splits | No | The paper mentions 'training' and 'test' scenarios but does not explicitly provide specific train/validation/test dataset splits or percentages. |
| Hardware Specification | No | We thank Vector Institute Toronto for compute support. (No specific hardware models or detailed specifications are provided.) |
| Software Dependencies | No | Our method is based on the tensorflow2 implementation of Dreamer (Hafner et al., 2019) (No specific version number for TensorFlow 2 or other software.) |
| Experiment Setup | Yes | For LSP, skill vectors are 3-dimensional and are held for K = 10 steps before being updated. The CEM method has a planning horizon of H = 10, runs for a maximum of CEM_iter = 4 iterations, proposes G = 16 skills, and uses the top M = 4 proposals to recompute statistics in each iteration. The additional noise ϵ added to the CEM-optimized distribution is Normal(0, 0.1). |
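The reported CEM hyperparameters can be sketched as a small planning loop. This is a hypothetical illustration only: the reward function below is a toy stand-in for the paper's learned world-model rollout, and treating the ϵ ~ Normal(0, 0.1) noise as an additive 0.1 on the refit standard deviation is an assumption, not the authors' exact implementation.

```python
import numpy as np

# Hyperparameters quoted from the paper's experiment setup.
SKILL_DIM = 3        # skill vectors are 3-dimensional
HORIZON = 10         # planning horizon H
CEM_ITERS = 4        # maximum number of CEM iterations
NUM_PROPOSALS = 16   # G skill-sequence proposals per iteration
TOP_M = 4            # elite proposals used to recompute statistics
EXTRA_NOISE = 0.1    # assumed interpretation of the Normal(0, 0.1) noise

def cem_plan(reward_fn, rng):
    """Optimize a sequence of skill vectors with the cross-entropy method."""
    mean = np.zeros((HORIZON, SKILL_DIM))
    std = np.ones((HORIZON, SKILL_DIM))
    for _ in range(CEM_ITERS):
        # Sample G candidate skill sequences from the current distribution.
        samples = mean + std * rng.standard_normal(
            (NUM_PROPOSALS, HORIZON, SKILL_DIM))
        returns = np.array([reward_fn(s) for s in samples])
        # Refit the distribution to the top M proposals (the elites).
        elites = samples[np.argsort(returns)[-TOP_M:]]
        mean = elites.mean(axis=0)
        std = elites.std(axis=0) + EXTRA_NOISE  # keep exploring (assumption)
    return mean  # the first skill would be held for K steps before replanning

rng = np.random.default_rng(0)
# Toy reward: prefer skill sequences close to a made-up target vector;
# the real method would score rollouts in the learned latent dynamics model.
target = np.ones(SKILL_DIM)
plan = cem_plan(lambda s: -np.sum((s - target) ** 2), rng)
```

On the toy quadratic reward, the refit mean moves toward the target over the four iterations, which is the behavior the quoted hyperparameters describe.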