Dynamical Distance Learning for Semi-Supervised and Unsupervised Skill Discovery
Authors: Kristian Hartikainen, Xinyang Geng, Tuomas Haarnoja, Sergey Levine
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method both on a real-world robot and in simulation. |
| Researcher Affiliation | Collaboration | Kristian Hartikainen (University of California, Berkeley; University of Oxford); Xinyang Geng (University of California, Berkeley); Tuomas Haarnoja (University of California, Berkeley; Google DeepMind); Sergey Levine (University of California, Berkeley) |
| Pseudocode | Yes | Algorithm 1 Dynamical Distance Learning (see the sketch after this table) |
| Open Source Code | No | The project website https://sites.google.com/view/dynamical-distance-learning is mentioned for videos, but there is no explicit statement that the source code for the methodology is available there or elsewhere. |
| Open Datasets | Yes | We consider a 9-DoF real-world dexterous manipulation task and 4 standard OpenAI Gym tasks (Hopper-v3, HalfCheetah-v3, Ant-v3, and InvertedDoublePendulum-v2). |
| Dataset Splits | No | The paper mentions using standard OpenAI Gym tasks but does not specify the train/validation/test splits, their percentages, or how they were derived for these datasets or the real-world task. |
| Hardware Specification | No | The paper mentions a 'real-world 9-DoF hand' (DClaw) for the manipulation task, but does not provide specific details about the computing hardware (e.g., GPU/CPU models, memory) used for training or running experiments. |
| Software Dependencies | No | The paper states 'All our experiments use Soft Actor-Critic as the policy optimizer, trained with the default parameters provided by the authors in (Haarnoja et al., 2018c).' However, it does not provide specific version numbers for Soft Actor-Critic or any other software libraries or dependencies. |
| Experiment Setup | Yes | The most important hyperparameters swept in the final experiments, namely the size of the on-policy pool used to train the distance function and the number of gradient steps per environment step, are given in Table 1: InvertedDoublePendulum-v2 (1/64 gradient steps per environment step, 100k pool); Hopper-v3 (1/64, 16k); HalfCheetah-v3 (1/16, 16k); Ant-v3 (1/64, 10k); DClaw, both state and vision (1/16, 100k). |
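
The Pseudocode row above cites Algorithm 1 (Dynamical Distance Learning). Since no source code is released, the following is only a minimal, hedged sketch of the core step that algorithm describes: regressing a distance network onto the number of time steps separating pairs of states drawn from an on-policy pool. The network size, synthetic rollouts, and all names here are illustrative assumptions, not the authors' implementation; the SAC policy update that uses reward r(s) = -d(s, goal) is deliberately omitted.

```python
# Minimal sketch of the dynamical-distance regression described in Algorithm 1.
# Network sizes, rollout generation, and hyperparameters are illustrative
# assumptions, not the authors' exact setup.
import torch
import torch.nn as nn

STATE_DIM, HORIZON, POOL_SIZE = 8, 50, 16_000  # pool size per Table 1 (e.g. Hopper-v3)

class DistanceNet(nn.Module):
    """Predicts the expected number of time steps between two states."""
    def __init__(self, state_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, s_i, s_j):
        return self.net(torch.cat([s_i, s_j], dim=-1)).squeeze(-1)

def sample_state_pairs(rollouts, batch_size):
    """Sample (s_i, s_j, j - i) with i <= j from on-policy rollouts."""
    traj_idx = torch.randint(len(rollouts), (batch_size,))
    s_i, s_j, gaps = [], [], []
    for t in traj_idx:
        traj = rollouts[int(t)]
        i, j = sorted(torch.randint(traj.shape[0], (2,)).tolist())
        s_i.append(traj[i]); s_j.append(traj[j]); gaps.append(float(j - i))
    return torch.stack(s_i), torch.stack(s_j), torch.tensor(gaps)

# Placeholder rollouts standing in for trajectories collected by the SAC policy.
rollouts = [torch.randn(HORIZON, STATE_DIM) for _ in range(POOL_SIZE // HORIZON)]

dist_fn = DistanceNet(STATE_DIM)
optim = torch.optim.Adam(dist_fn.parameters(), lr=3e-4)

for step in range(500):
    s_i, s_j, gap = sample_state_pairs(rollouts, batch_size=256)
    loss = ((dist_fn(s_i, s_j) - gap) ** 2).mean()  # supervised regression on time gaps
    optim.zero_grad(); loss.backward(); optim.step()

# In the full method, the SAC policy is then trained with reward r(s) = -d(s, goal),
# and fresh on-policy rollouts periodically refresh the pool; that loop is omitted here.
```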