Latent Skill Planning for Exploration and Transfer

Authors: Kevin Xie, Homanga Bharadhwaj, Danijar Hafner, Animesh Garg, Florian Shkurti

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We perform experimental evaluation over locomotion tasks based on the DeepMind Control Suite framework (Tassa et al., 2018)" to understand the following questions: Does LSP learn useful skills and compose them appropriately to succeed in individual tasks? Does LSP adapt quickly to a target task with a different environment reward function after being pre-trained on another task?
Researcher Affiliation | Collaboration | Kevin Xie¹, Homanga Bharadhwaj¹, Danijar Hafner¹˒², Animesh Garg¹˒³, Florian Shkurti¹ (¹University of Toronto and Vector Institute, ²Google Brain, ³Nvidia)
Pseudocode | Yes | Algorithm 1: Learning Skills for Planning
Open Source Code | No | Videos are available at https://sites.google.com/view/latent-skill-planning/ and video visualizations at https://sites.google.com/view/partial-amortization-hierarchy/home. (These links host videos and visualizations, not source code; the paper makes no explicit statement that source code for the method is released.)
Open Datasets | Yes | "We perform experimental evaluation over locomotion tasks based on the DeepMind Control Suite framework (Tassa et al., 2018)"
Dataset Splits | No | The paper mentions training and test scenarios but does not specify train/validation/test splits or percentages.
Hardware Specification | No | "We thank Vector Institute Toronto for compute support." (No specific hardware models or detailed specifications are provided.)
Software Dependencies | No | "Our method is based on the tensorflow2 implementation of Dreamer (Hafner et al., 2019)" (No version numbers are given for TensorFlow 2 or other software.)
Experiment Setup | Yes | For LSP, skill vectors are 3-dimensional and are held for K = 10 steps before being updated. CEM planning uses a horizon of H = 10, runs 4 iterations, proposes G = 16 skills per iteration, and refits the sampling statistics from the top M = 4 proposals. The additional noise ε added to the CEM-optimized distribution is drawn from Normal(0, 0.1).