Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Learning Actionable Representations with Goal Conditioned Policies
Authors: Dibya Ghosh, Abhishek Gupta, Sergey Levine
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on a number of simulated environments, and compare it to prior methods for representation learning, exploration, and hierarchical reinforcement learning. |
| Researcher Affiliation | Academia | Dibya Ghosh, Abhishek Gupta, & Sergey Levine Department of Electrical Engineering and Computer Science University of California, Berkeley Berkeley, CA 94703, USA |
| Pseudocode | No | No pseudocode or algorithm blocks are present. |
| Open Source Code | No | No information about open-source code availability is provided. |
| Open Datasets | No | We study six simulated environments as illustrated in Figure 4: 2D navigation tasks in two settings, wheeled locomotion tasks in two settings, legged locomotion, and object pushing with a robotic gripper. |
| Dataset Splits | Yes | holding out 20% of the trajectories as a validation set. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models) are provided. "computational resources from Amazon" is too vague. |
| Software Dependencies | No | The paper mentions algorithms and optimizers (TRPO, Adam) but does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | The mean, μ_θ(·, ·), is a fully-connected neural network which takes in the state and the desired goal state as a concatenated vector, and has three hidden layers containing 150, 100, and 50 units respectively. Σ is a learned diagonal covariance matrix, and is initially set to Σ = I. |
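The Experiment Setup row is the most concrete architectural detail quoted from the paper. The sketch below shows one way to instantiate a goal-conditioned Gaussian policy matching that description: state and goal concatenated, hidden layers of 150, 100, and 50 units, and a learned diagonal covariance initialised to the identity. It is not the authors' code, which the table reports as unavailable. The tanh activations, the weight initialisation, parameterising Σ through a per-dimension log standard deviation, and the class and function names are all assumptions made for illustration. The sketch uses only the standard library so that it runs without extra dependencies.

```python
import math
import random

# Hidden-layer widths quoted in the Experiment Setup row.
HIDDEN_SIZES = (150, 100, 50)


def init_linear(n_in, n_out, rng):
    """Assumed init: weights uniform in +/- 1/sqrt(n_in), zero biases (not from the paper)."""
    bound = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def linear(x, layer):
    """Affine map W x + b for one fully-connected layer."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


class GoalConditionedGaussianPolicy:
    """Hypothetical pi(a | s, g) = N(mu_theta(s, g), Sigma) with diagonal Sigma."""

    def __init__(self, state_dim, action_dim, seed=0):
        self.rng = random.Random(seed)
        # The goal is a desired state, so the input is [state, goal] of size 2 * state_dim.
        sizes = [2 * state_dim, *HIDDEN_SIZES, action_dim]
        self.layers = [init_linear(a, b, self.rng) for a, b in zip(sizes[:-1], sizes[1:])]
        # Learned log standard deviations; all zeros means Sigma = I at initialisation.
        self.log_std = [0.0] * action_dim

    def mean(self, state, goal):
        """mu_theta(s, g): MLP over the concatenated state and goal vector."""
        h = list(state) + list(goal)
        for layer in self.layers[:-1]:
            h = [math.tanh(v) for v in linear(h, layer)]  # tanh is an assumption
        return linear(h, self.layers[-1])  # linear output layer for the mean

    def covariance_diagonal(self):
        """Diagonal entries of Sigma (variances)."""
        return [math.exp(2.0 * s) for s in self.log_std]

    def sample(self, state, goal):
        """Draw a ~ N(mu_theta(s, g), Sigma) using independent Gaussian noise per dimension."""
        mu = self.mean(state, goal)
        return [m + math.exp(s) * self.rng.gauss(0.0, 1.0) for m, s in zip(mu, self.log_std)]


if __name__ == "__main__":
    # Hypothetical dimensions: 4-D state (and goal), 2-D action.
    policy = GoalConditionedGaussianPolicy(state_dim=4, action_dim=2)
    print(policy.covariance_diagonal())
    print(policy.sample([0.0] * 4, [1.0] * 4))
```

Training the mean network and log standard deviations (the table mentions TRPO and Adam) is outside the scope of this sketch; it only fixes the policy's functional form.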