Latent Skill Planning for Exploration and Transfer

Authors: Kevin Xie, Homanga Bharadhwaj, Danijar Hafner, Animesh Garg, Florian Shkurti

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We perform experimental evaluation over locomotion tasks based on the DeepMind Control Suite framework (Tassa et al., 2018)" to understand the following questions: Does LSP learn useful skills and compose them appropriately to succeed in individual tasks? Does LSP adapt quickly to a target task with a different environment reward function after being pre-trained on another task?
Researcher Affiliation | Collaboration | Kevin Xie¹, Homanga Bharadhwaj¹, Danijar Hafner¹˒², Animesh Garg¹˒³, Florian Shkurti¹ (¹University of Toronto and Vector Institute, ²Google Brain, ³Nvidia)
Pseudocode | Yes | Algorithm 1: Learning Skills for Planning
Open Source Code | No | Videos are available at https://sites.google.com/view/latent-skill-planning/ and video visualizations at https://sites.google.com/view/partial-amortization-hierarchy/home. (These links host videos and visualizations, not source code; the paper makes no explicit statement that source code for the method is released.)
Open Datasets | Yes | "We perform experimental evaluation over locomotion tasks based on the DeepMind Control Suite framework (Tassa et al., 2018)"
Dataset Splits | No | The paper mentions training and test scenarios but does not specify train/validation/test splits or percentages.
Hardware Specification | No | "We thank Vector Institute Toronto for compute support." (No specific hardware models or detailed specifications are provided.)
Software Dependencies | No | "Our method is based on the tensorflow2 implementation of Dreamer (Hafner et al., 2019)" (No version numbers are given for TensorFlow 2 or other software.)
Experiment Setup | Yes | For LSP, skill vectors are 3-dimensional and are held for K = 10 steps before being updated. CEM planning uses a horizon of H = 10, runs 4 iterations, proposes G = 16 skills per iteration, and refits the sampling statistics from the top M = 4 proposals. The additional noise ε added to the CEM-optimized distribution is drawn from Normal(0, 0.1).