Dynamical Distance Learning for Semi-Supervised and Unsupervised Skill Discovery

Authors: Kristian Hartikainen, Xinyang Geng, Tuomas Haarnoja, Sergey Levine

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method both on a real-world robot and in simulation.
Researcher Affiliation | Collaboration | Kristian Hartikainen (University of California, Berkeley; University of Oxford); Xinyang Geng (University of California, Berkeley); Tuomas Haarnoja (University of California, Berkeley; Google DeepMind); Sergey Levine (University of California, Berkeley)
Pseudocode | Yes | Algorithm 1: Dynamical Distance Learning
Open Source Code | No | The project website https://sites.google.com/view/dynamical-distance-learning is mentioned for videos, but there is no explicit statement that the source code for the methodology is available there or elsewhere.
Open Datasets | Yes | We consider a 9-DoF real-world dexterous manipulation task and 4 standard OpenAI Gym tasks (Hopper-v3, HalfCheetah-v3, Ant-v3, and InvertedDoublePendulum-v2).
Dataset Splits | No | The paper mentions using standard OpenAI Gym tasks but does not specify the train/validation/test splits, their percentages, or how they were derived for these datasets or the real-world task.
Hardware Specification | No | The paper mentions a 'real-world 9-DoF hand' (DClaw) for the manipulation task, but does not provide specific details about the computing hardware (e.g., GPU/CPU models, memory) used for training or running the experiments.
Software Dependencies | No | The paper states 'All our experiments use Soft Actor-Critic as the policy optimizer, trained with the default parameters provided by the authors in (Haarnoja et al., 2018c).' However, it does not provide specific version numbers for Soft Actor-Critic or any other software libraries or dependencies.
Experiment Setup | Yes | The most important hyperparameters swept over in the final experiments, namely the size of the on-policy pool for training the distance function and the number of gradient steps per environment step, are presented in Table 1 below:
  Environment | gradient steps per environment step | on-policy pool size
  InvertedDoublePendulum-v2 | 1/64 | 100k
  Hopper-v3 | 1/64 | 16k
  HalfCheetah-v3 | 1/16 | 16k
  Ant-v3 | 1/64 | 10k
  DClaw (both state and vision) | 1/16 | 100k
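
For concreteness, the swept settings above can be written as a small configuration mapping. This is a minimal illustrative sketch only: the paper does not specify a configuration format, so the key names and structure below are assumptions, while the numeric values are taken from Table 1.

```python
# Hypothetical configuration mapping for the hyperparameters swept in Table 1.
# Key names and structure are assumed for illustration; the values
# (gradient steps per environment step, on-policy pool size) come from the table.
DDL_SWEEP_SETTINGS = {
    "InvertedDoublePendulum-v2": {"grad_steps_per_env_step": 1 / 64, "on_policy_pool_size": 100_000},
    "Hopper-v3": {"grad_steps_per_env_step": 1 / 64, "on_policy_pool_size": 16_000},
    "HalfCheetah-v3": {"grad_steps_per_env_step": 1 / 16, "on_policy_pool_size": 16_000},
    "Ant-v3": {"grad_steps_per_env_step": 1 / 64, "on_policy_pool_size": 10_000},
    # The DClaw settings apply to both the state-based and vision-based variants.
    "DClaw": {"grad_steps_per_env_step": 1 / 16, "on_policy_pool_size": 100_000},
}
```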
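The table also refers to the on-policy pool used for training the distance function of Algorithm 1 (Dynamical Distance Learning). The snippet below is a rough, heavily simplified sketch of that kind of supervised distance regression: a network d(s_i, s_j) is fit to the number of environment steps elapsed between two states on the same on-policy rollout. The network architecture, pair sampling, and optimizer interface are assumptions, not the authors' implementation.

```python
# Rough sketch (assumed details) of dynamical distance regression: fit a network
# d(s_i, s_j) to the number of environment steps j - i between states visited
# on the same on-policy rollout. Architecture, sampling, and optimizer choices
# are illustrative assumptions, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistanceNet(nn.Module):
    def __init__(self, obs_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s_i: torch.Tensor, s_j: torch.Tensor) -> torch.Tensor:
        # Predicted number of steps needed to reach s_j from s_i under the current policy.
        return self.net(torch.cat([s_i, s_j], dim=-1)).squeeze(-1)

def distance_regression_step(dist_net, optimizer, rollout_obs, batch_size=256):
    """One gradient step on state pairs sampled from a single on-policy rollout.

    rollout_obs: tensor of shape (T, obs_dim) holding one trajectory's observations.
    """
    T = rollout_obs.shape[0]
    i = torch.randint(0, T - 1, (batch_size,))
    # Sample j uniformly from (i, T) so that every pair satisfies i < j.
    offsets = (torch.rand(batch_size) * (T - 1 - i).float()).long() + 1
    j = i + offsets
    target = (j - i).float()
    pred = dist_net(rollout_obs[i], rollout_obs[j])
    loss = F.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the full method, the learned distance would then serve, roughly, as a goal-reaching signal for the Soft Actor-Critic policy optimizer mentioned under Software Dependencies; that part is omitted here.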