Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Latent Skill Planning for Exploration and Transfer
Authors: Kevin Xie, Homanga Bharadhwaj, Danijar Hafner, Animesh Garg, Florian Shkurti
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform experimental evaluation over locomotion tasks based on the DeepMind Control Suite framework (Tassa et al., 2018) to understand the following questions: Does LSP learn useful skills and compose them appropriately to succeed in individual tasks? Does LSP adapt to a target task with different environment reward functions quickly, after being pre-trained on another task? |
| Researcher Affiliation | Collaboration | Kevin Xie¹, Homanga Bharadhwaj¹, Danijar Hafner¹,², Animesh Garg¹,³, Florian Shkurti¹; ¹University of Toronto and Vector Institute, ²Google Brain, ³Nvidia |
| Pseudocode | Yes | Algorithm 1: Learning Skills for Planning |
| Open Source Code | No | Videos are available at: https://sites.google.com/view/latent-skill-planning/ and Video visualizations are in the website https://sites.google.com/view/partial-amortization-hierarchy/home. (These links are for videos/visualizations, not source code. No explicit statement of source code release for their method.) |
| Open Datasets | Yes | We perform experimental evaluation over locomotion tasks based on the DeepMind Control Suite framework (Tassa et al., 2018) |
| Dataset Splits | No | The paper mentions 'training' and 'test' scenarios but does not explicitly provide specific train/validation/test dataset splits or percentages. |
| Hardware Specification | No | We thank Vector Institute Toronto for compute support. (No specific hardware models or detailed specifications are provided.) |
| Software Dependencies | No | Our method is based on the tensorflow2 implementation of Dreamer (Hafner et al., 2019) (No specific version number for TensorFlow 2 or other software.) |
| Experiment Setup | Yes | For LSP, skill vectors are 3-dimensional and are held for K = 10 steps before being updated. The CEM method has a planning horizon of H = 10, goes through MaxCEMiter = 4 iterations, proposes G = 16 skills and uses the top M = 4 proposals to recompute statistics in each iteration. The additional noise ϵ added to the CEM optimized distribution is Normal(0, 0.1). |
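
To make the reported Experiment Setup values concrete, here is a minimal sketch of a generic cross-entropy method (CEM) loop using the hyperparameters quoted in the table (3-dimensional skills, 4 iterations, G = 16 proposals, M = 4 elites). It is not the authors' implementation, for which no source code is linked. The `score_fn` argument is a hypothetical stand-in for whatever return estimate the planner uses over the H = 10 horizon, the unit-Gaussian initialization is an assumption, and 0.1 in Normal(0, 0.1) is assumed to be a standard deviation applied to the final mean.

```python
import numpy as np

SKILL_DIM = 3       # skill vectors are 3-dimensional (from the table)
CEM_ITERS = 4       # MaxCEMiter = 4 (from the table)
NUM_PROPOSALS = 16  # G = 16 proposed skills per iteration (from the table)
NUM_ELITES = 4      # top M = 4 proposals refit the distribution (from the table)
NOISE_SCALE = 0.1   # assumption: the 0.1 in Normal(0, 0.1) is a standard deviation


def cem_plan(score_fn, rng, dim=SKILL_DIM, iters=CEM_ITERS,
             num_proposals=NUM_PROPOSALS, num_elites=NUM_ELITES,
             noise_scale=NOISE_SCALE):
    """Generic CEM over a skill vector; score_fn is a hypothetical objective."""
    # Assumption: start from a unit Gaussian over skills.
    mean = np.zeros(dim)
    std = np.ones(dim)
    for _ in range(iters):
        # Sample G candidate skills from the current diagonal Gaussian.
        proposals = mean + std * rng.standard_normal((num_proposals, dim))
        scores = np.array([score_fn(z) for z in proposals])
        # Keep the M highest-scoring proposals and refit mean and std to them.
        elites = proposals[np.argsort(scores)[-num_elites:]]
        mean = elites.mean(axis=0)
        std = elites.std(axis=0) + 1e-6  # small floor to avoid collapse to zero
    # Assumption: the extra noise is added once to the optimized mean.
    return mean + noise_scale * rng.standard_normal(dim)


if __name__ == "__main__":
    # Toy objective (hypothetical): prefer skills close to a fixed target.
    target = np.array([1.0, -1.0, 0.5])
    skill = cem_plan(lambda z: -np.sum((z - target) ** 2),
                     np.random.default_rng(0))
    print(skill)
```

In the reported setup the selected skill would then be held for K = 10 environment steps before the next update; that outer loop is omitted here because it depends on the learned world model.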