reproducibilityindex.ai

Dynamical Distance Learning for Semi-Supervised and Unsupervised Skill Discovery

Authors: Kristian Hartikainen, Xinyang Geng, Tuomas Haarnoja, Sergey Levine

ICLR 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate our method both on a real-world robot and in simulation.
Researcher Affiliation	Collaboration	Kristian Hartikainen (University of California, Berkeley; University of Oxford); Xinyang Geng (University of California, Berkeley); Tuomas Haarnoja (University of California, Berkeley; Google DeepMind); Sergey Levine (University of California, Berkeley)
Pseudocode	Yes	Algorithm 1 Dynamical Distance Learning
Open Source Code	No	The project website https://sites.google.com/view/dynamical-distance-learning is mentioned for videos, but there is no explicit statement that the source code for the methodology is available there or elsewhere.
Open Datasets	Yes	We consider a 9-DoF real-world dexterous manipulation task and 4 standard OpenAI Gym tasks (Hopper-v3, HalfCheetah-v3, Ant-v3, and InvertedDoublePendulum-v2).
Dataset Splits	No	The paper mentions using standard OpenAI Gym tasks but does not specify the train/validation/test splits, their percentages, or how they were derived for these datasets or the real-world task.
Hardware Specification	No	The paper mentions a 'real-world 9-DoF hand' (DClaw) for the manipulation task, but does not provide specific details about the computing hardware (e.g., GPU/CPU models, memory) used for training or running experiments.
Software Dependencies	No	The paper states 'All our experiments use Soft Actor-Critic as the policy optimizer, trained with the default parameters provided by the authors in (Haarnoja et al., 2018c).' However, it does not provide specific version numbers for Soft Actor-Critic or any other software libraries or dependencies.
Experiment Setup	Yes	Most important hyperparameters that we swept over in the final experiments, namely the size of the on-policy pool for training the distance function and the number of gradient steps per environment samples, are presented in Table 1 below. Table 1 (environment: gradient steps per environment step, on-policy pool size): InvertedDoublePendulum-v2: 1/64, 100k; Hopper-v3: 1/64, 16k; HalfCheetah-v3: 1/16, 16k; Ant-v3: 1/64, 10k; DClaw (both state and vision): 1/16, 100k.
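
For anyone attempting a reproduction, the Table 1 values quoted above can be written as a small configuration mapping. This is a minimal sketch, not the authors' code: the dictionary name, key names, and helper function are hypothetical, "100k" is read as 100,000, and a ratio such as 1/64 is assumed to mean one gradient step per 64 environment steps.

```python
from fractions import Fraction

# Values transcribed from Table 1 as quoted above.
# The variable and key names are illustrative, not taken from the paper.
DDL_SWEPT_HYPERPARAMS = {
    "InvertedDoublePendulum-v2": {
        "grad_steps_per_env_step": Fraction(1, 64),
        "on_policy_pool_size": 100_000,
    },
    "Hopper-v3": {
        "grad_steps_per_env_step": Fraction(1, 64),
        "on_policy_pool_size": 16_000,
    },
    "HalfCheetah-v3": {
        "grad_steps_per_env_step": Fraction(1, 16),
        "on_policy_pool_size": 16_000,
    },
    "Ant-v3": {
        "grad_steps_per_env_step": Fraction(1, 64),
        "on_policy_pool_size": 10_000,
    },
    # The quoted table lists one DClaw row covering both state and vision inputs.
    "DClaw": {
        "grad_steps_per_env_step": Fraction(1, 16),
        "on_policy_pool_size": 100_000,
    },
}


def env_steps_per_gradient_step(env_name: str) -> int:
    """Return how many environment steps elapse between gradient steps.

    Assumes the ratio in Table 1 means 'gradient steps / environment steps'.
    """
    ratio = DDL_SWEPT_HYPERPARAMS[env_name]["grad_steps_per_env_step"]
    return int(1 / ratio)
```

Under that reading, `env_steps_per_gradient_step("Hopper-v3")` evaluates to 64 and `env_steps_per_gradient_step("HalfCheetah-v3")` to 16. All other Soft Actor-Critic settings would still have to come from the defaults the paper cites, since the summary lists no versions.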