Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Dynamical Distance Learning for Semi-Supervised and Unsupervised Skill Discovery
Authors: Kristian Hartikainen, Xinyang Geng, Tuomas Haarnoja, Sergey Levine
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method both on a real-world robot and in simulation. |
| Researcher Affiliation | Collaboration | Kristian Hartikainen (University of California, Berkeley; University of Oxford); Xinyang Geng (University of California, Berkeley); Tuomas Haarnoja (University of California, Berkeley; Google DeepMind); Sergey Levine (University of California, Berkeley) |
| Pseudocode | Yes | Algorithm 1 Dynamical Distance Learning |
| Open Source Code | No | The project website https://sites.google.com/view/dynamical-distance-learning is mentioned for videos, but there is no explicit statement that the source code for the methodology is available there or elsewhere. |
| Open Datasets | Yes | We consider a 9-DoF real-world dexterous manipulation task and 4 standard OpenAI Gym tasks (Hopper-v3, HalfCheetah-v3, Ant-v3, and InvertedDoublePendulum-v2). |
| Dataset Splits | No | The paper mentions using standard OpenAI Gym tasks but does not specify the train/validation/test splits, their percentages, or how they were derived for these datasets or the real-world task. |
| Hardware Specification | No | The paper mentions a 'real-world 9-DoF hand' (DClaw) for the manipulation task, but does not provide specific details about the computing hardware (e.g., GPU/CPU models, memory) used for training or running experiments. |
| Software Dependencies | No | The paper states 'All our experiments use Soft Actor-Critic as the policy optimizer, trained with the default parameters provided by the authors in (Haarnoja et al., 2018c).' However, it does not provide specific version numbers for Soft Actor-Critic or any other software libraries or dependencies. |
| Experiment Setup | Yes | Most important hyperparameters that we swept over in the final experiments, namely the size of the on-policy pool for training the distance function and the number of gradient steps per environment samples, are presented in Table 1 below. Table 1 (environment: gradient steps per environment step, on-policy pool size): InvertedDoublePendulum-v2: 1/64, 100k; Hopper-v3: 1/64, 16k; HalfCheetah-v3: 1/16, 16k; Ant-v3: 1/64, 10k; DClaw (both state and vision): 1/16, 100k. |
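
For anyone attempting a reproduction, the swept hyperparameters quoted in the Experiment Setup row could be captured in a small configuration table. The sketch below is illustrative only: the values are transcribed from the quoted Table 1, but the dictionary layout, the key names, the `"DClaw"` label, and the helper `env_steps_per_gradient_step` are assumptions made here and do not come from the authors' code.

```python
from fractions import Fraction

# Values transcribed from the quoted Table 1. The structure and key names
# are hypothetical; the paper does not prescribe a configuration format.
DDL_SWEPT_HYPERPARAMETERS = {
    "InvertedDoublePendulum-v2": {
        "gradient_steps_per_env_step": Fraction(1, 64),
        "on_policy_pool_size": 100_000,
    },
    "Hopper-v3": {
        "gradient_steps_per_env_step": Fraction(1, 64),
        "on_policy_pool_size": 16_000,
    },
    "HalfCheetah-v3": {
        "gradient_steps_per_env_step": Fraction(1, 16),
        "on_policy_pool_size": 16_000,
    },
    "Ant-v3": {
        "gradient_steps_per_env_step": Fraction(1, 64),
        "on_policy_pool_size": 10_000,
    },
    # Hypothetical label for the real-world task (both state and vision).
    "DClaw": {
        "gradient_steps_per_env_step": Fraction(1, 16),
        "on_policy_pool_size": 100_000,
    },
}


def env_steps_per_gradient_step(env_name: str) -> int:
    """Return how many environment steps are collected per gradient step."""
    ratio = DDL_SWEPT_HYPERPARAMETERS[env_name]["gradient_steps_per_env_step"]
    return int(1 / ratio)


if __name__ == "__main__":
    for name, params in DDL_SWEPT_HYPERPARAMETERS.items():
        print(name, env_steps_per_gradient_step(name), params["on_policy_pool_size"])
```

Storing the ratio as a `Fraction` keeps the table's "1/64" notation exact and avoids floating-point rounding when converting it to an update interval.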