Automatic Goal Generation for Reinforcement Learning Agents
Authors: Carlos Florensa, David Held, Xinyang Geng, Pieter Abbeel
ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we provide the experimental results to answer the following questions: Does our automatic curriculum yield faster maximization of the coverage objective? Does our Goal GAN dynamically shift to sample goals of the appropriate difficulty (i.e., in GOID_i)? Can our Goal GAN track complex multimodal goal distributions GOID_i? Does it scale to higher-dimensional goal-spaces with a low-dimensional space of feasible goals? To answer the first two questions, we demonstrate our method in two challenging robotic locomotion tasks, where the goals are the (x, y) position of the Center of Mass (CoM) of a dynamically complex quadruped agent. In the first experiment the agent has no constraints (see Fig. 1a) and in the second one the agent is inside a U-maze (see Fig. 1b). To answer the third question, we train a point-mass agent to reach any point within a multi-path maze (see Fig. 1d). To answer the final question, we study how our method scales with the dimension of the goal-space in an environment where the feasible region is kept of approximately constant volume in an embedding space that grows in dimension (see Fig. 1c for the 3D case). We compare our Goal GAN method against four baselines. |
| Researcher Affiliation | Academia | Carlos Florensa * 1 David Held * 2 Xinyang Geng * 1 Pieter Abbeel 1 3 1Department of Computer Science, UC Berkeley 2Department of Computer Science, CMU 3International Computer Science Institute (ICSI). Correspondence to: Carlos Florensa <florensa@berkeley.edu>, David Held <dheld@andrew.cmu.edu>. |
| Pseudocode | Yes | Algorithm 1 Generative Goal Learning; Algorithm 2 Generative Goal with Sagg-RIAC (a schematic sketch of Algorithm 1's loop follows the table) |
| Open Source Code | Yes | Videos and code available at: https://sites.google.com/view/goalgeneration4rl |
| Open Datasets | No | The paper describes custom environments (Ant Locomotion, Point-mass, N-dimensional Point Mass) and how data is generated through simulation, but does not provide access information (link, citation, or repository) for the specific datasets used in their experiments. It references Mujoco for the environment, but not for data availability. |
| Dataset Splits | No | The paper mentions using a "test distribution of goals" but does not specify explicit train/validation/test dataset splits with percentages, sample counts, or citations to predefined splits for a static dataset. It describes a dynamic goal sampling process rather than fixed data partitioning. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., CPU, GPU models, memory, cloud instances) used for running the experiments. It mentions the simulation environment Mujoco, but not the computational resources. |
| Software Dependencies | No | The paper mentions software like Mujoco and rllab, and algorithms like TRPO with GAE, but does not provide specific version numbers for these components, which are necessary for full reproducibility. |
| Experiment Setup | Yes | At each step of the algorithm, we train the policy for 5 iterations, each of which consists of 100 episodes. After 5 policy iterations, we then train the GAN for 200 iterations, each of which consists of 1 iteration of training the discriminator and 1 iteration of training the generator. The generator receives as input 4-dimensional noise sampled from the standard normal distribution. The goal generator consists of two hidden layers with 128 nodes, and the goal discriminator consists of two hidden layers with 256 nodes, with ReLU nonlinearities. The policy is defined by a neural network which receives as input the goal appended to the agent observations described above. The inputs are sent to two hidden layers of size 32 with tanh nonlinearities. For policy optimization, we use a discount factor of 0.998 and a GAE lambda of 0.995; each policy update consists of 5 iterations of this optimization algorithm. (These settings are reconstructed in the configuration sketch after the table.) |
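
The pseudocode row above points to Algorithm 1 (Generative Goal Learning). Below is a minimal Python sketch of that outer loop, written only as a reading aid: the helper callables (`sample_gan`, `update_policy`, `evaluate_goals`, `train_gan`), the GOID thresholds, and the replay-mixing ratio are assumptions of this sketch, not details quoted from the paper.

```python
import numpy as np

# Hypothetical helpers (stand-ins for the paper's components, not real APIs):
#   sample_gan(n)            -> n candidate goals drawn from the goal generator
#   update_policy(goals)     -> trains the policy on the sampled goals (TRPO in the paper)
#   evaluate_goals(goals)    -> empirical success rate of the current policy per goal
#   train_gan(goals, labels) -> fits the Goal GAN to goals labeled as GOID / not GOID
R_MIN, R_MAX = 0.1, 0.9  # illustrative "intermediate difficulty" band

def generative_goal_learning(n_outer_iters, sample_gan, update_policy,
                             evaluate_goals, train_gan, n_goals=100):
    """Schematic outer loop in the spirit of Algorithm 1 (Generative Goal Learning)."""
    goals = sample_gan(n_goals)
    replay_buffer = goals.copy()
    for _ in range(n_outer_iters):
        update_policy(goals)                 # 5 policy iterations per step (per the setup row)
        success = evaluate_goals(goals)      # success rate per goal under the current policy
        # Goals of intermediate difficulty: solvable sometimes, but not always.
        labels = ((success >= R_MIN) & (success <= R_MAX)).astype(np.float32)
        train_gan(goals, labels)             # 200 GAN iterations per step (per the setup row)
        # Mix fresh GAN samples with replayed old goals to limit forgetting
        # (the 2:1 mix here is an assumption of this sketch).
        new_goals = sample_gan(2 * n_goals // 3)
        old_goals = replay_buffer[np.random.choice(len(replay_buffer), n_goals // 3)]
        replay_buffer = np.concatenate([replay_buffer, new_goals])
        goals = np.concatenate([new_goals, old_goals])
    return goals
```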
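
For the experiment-setup row, the quoted architecture and optimization constants can be written down concretely. The PyTorch snippet below is only an illustration of those numbers, not the authors' implementation (the paper builds on rllab); `OBS_DIM` and `ACT_DIM` are hypothetical placeholders, since the excerpt does not state the agent's observation or action sizes.

```python
import torch
import torch.nn as nn

NOISE_DIM = 4    # generator input: 4-dimensional standard-normal noise (from the setup row)
GOAL_DIM = 2     # (x, y) CoM goal in the locomotion tasks
OBS_DIM = 41     # hypothetical observation size; not given in the excerpt
ACT_DIM = 8      # hypothetical action size; not given in the excerpt

# Goal generator: two hidden layers of 128 units with ReLU.
generator = nn.Sequential(
    nn.Linear(NOISE_DIM, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, GOAL_DIM),
)

# Goal discriminator: two hidden layers of 256 units with ReLU.
discriminator = nn.Sequential(
    nn.Linear(GOAL_DIM, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 1),
)

# Policy network: goal appended to the observation, two hidden layers of 32 with tanh.
policy = nn.Sequential(
    nn.Linear(OBS_DIM + GOAL_DIM, 32), nn.Tanh(),
    nn.Linear(32, 32), nn.Tanh(),
    nn.Linear(32, ACT_DIM),
)

# Policy-optimization constants quoted in the setup (TRPO with GAE).
DISCOUNT = 0.998
GAE_LAMBDA = 0.995

# Sampling candidate goals: push standard-normal noise through the generator.
goals = generator(torch.randn(64, NOISE_DIM))
```

Per the quoted schedule, each outer step would alternate 5 policy iterations (100 episodes each) with 200 GAN iterations, each consisting of one discriminator and one generator update.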