Goal-conditioned Imitation Learning
Authors: Yiming Ding, Carlos Florensa, Pieter Abbeel, Mariano Phielipp
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate these questions in four different simulated robotic goal-conditioned tasks that are detailed in the next subsection along with the performance metric used throughout the experiments section. All the results use 20 demonstrations reaching uniformly sampled goals. All curves have 5 random seeds and the shaded area is one standard deviation. |
| Researcher Affiliation | Collaboration | Yiming Ding, Department of Computer Science, University of California, Berkeley (dingyiming0427@berkeley.edu); Carlos Florensa, Department of Computer Science, University of California, Berkeley (florensa@berkeley.edu); Mariano Phielipp, Intel AI Labs (mariano.j.phielipp@intel.com); Pieter Abbeel, Department of Computer Science, University of California, Berkeley (pabbeel@berkeley.edu) |
| Pseudocode | Yes | Algorithm 1 Goal-conditioned GAIL with Hindsight: goalGAIL (a minimal sketch of the hindsight relabeling step follows the table) |
| Open Source Code | Yes | Our code is open-source: https://sites.google.com/view/goalconditioned-il/ |
| Open Datasets | No | Experiments are conducted in four continuous environments in MuJoCo [41]. The paper describes custom simulated environments rather than referring to or providing access to a pre-existing public dataset. |
| Dataset Splits | No | The paper does not specify explicit training/validation/test dataset splits, percentages, or absolute sample counts for data used in experiments. |
| Hardware Specification | No | The paper does not mention any specific hardware specifications (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions the MuJoCo simulator [41] and DDPG for policy optimization but does not specify software versions or other dependencies. |
| Experiment Setup | Yes | In the four environments used in our experiments, i.e. the Four Rooms environment, Fetch Pick & Place, Pointmass block pusher and Fetch Stack Two, the task horizons are set to 300, 100, 100 and 150 respectively. The discount factors are γ = 1 − 1/H. In all experiments, the Q function, policy and discriminator are parameterized by fully connected neural networks with two hidden layers of size 256. DDPG is used for policy optimization and the hindsight probability is set to p = 0.8. The initial value of the behavior cloning loss weight β is set to 0.1 and is annealed by 0.9 per 250 rollouts collected. The initial value of the discriminator reward weight δGAIL is set to 0.1. |
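
To make the reported setup concrete, the sketch below collects the hyperparameters quoted in the Experiment Setup row. This is not the authors' released code: the function names and dictionary keys (e.g. `make_goal_gail_config`) are hypothetical, and only the numeric values are taken from the paper.

```python
# A minimal sketch (not the authors' code) of the reported goalGAIL hyperparameters.
# All names are hypothetical; the numbers come from the Experiment Setup row above.

def make_goal_gail_config(env_name: str) -> dict:
    """Return the reported hyperparameters for one of the four environments."""
    horizons = {
        "FourRooms": 300,
        "FetchPickAndPlace": 100,
        "PointmassBlockPusher": 100,
        "FetchStackTwo": 150,
    }
    H = horizons[env_name]
    return {
        "horizon": H,
        "discount": 1.0 - 1.0 / H,       # gamma = 1 - 1/H
        "hidden_sizes": (256, 256),      # Q function, policy and discriminator MLPs
        "policy_optimizer": "DDPG",
        "hindsight_prob": 0.8,           # p
        "bc_weight_init": 0.1,           # beta, annealed below
        "bc_weight_decay": 0.9,          # multiplied in once per 250 rollouts
        "bc_anneal_every_rollouts": 250,
        "gail_reward_weight": 0.1,       # delta_GAIL (initial value)
        "num_demonstrations": 20,
    }


def bc_weight(config: dict, rollouts_collected: int) -> float:
    """Behavior-cloning loss weight after a given number of collected rollouts."""
    steps = rollouts_collected // config["bc_anneal_every_rollouts"]
    return config["bc_weight_init"] * config["bc_weight_decay"] ** steps


if __name__ == "__main__":
    cfg = make_goal_gail_config("FetchPickAndPlace")
    print(cfg["discount"])          # 0.99 for H = 100
    print(bc_weight(cfg, 1000))     # 0.1 * 0.9**4
```

Note that setting γ = 1 − 1/H ties the discount factor to the task horizon, which is why the four environments end up with different discounts despite sharing the same rule.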
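
The Pseudocode row references Algorithm 1 (goalGAIL), which combines goal-conditioned GAIL with hindsight relabeling. The following is a minimal sketch of that relabeling step only, under assumed data structures: the `Transition` class and the "future" goal-sampling strategy are assumptions, and only the relabeling probability p = 0.8 comes from the Experiment Setup row.

```python
# A minimal sketch of hindsight goal relabeling, not the authors' Algorithm 1.
# Transition fields and the "future" sampling strategy are assumptions.
import random
from dataclasses import dataclass, replace as dc_replace


@dataclass
class Transition:
    state: list
    action: list
    achieved_goal: list   # goal actually reached after this transition
    desired_goal: list    # goal the policy was conditioned on


def relabel_with_hindsight(trajectory, p=0.8, rng=random):
    """With probability p, swap each transition's desired goal for a goal
    achieved later in the same trajectory (HER-style 'future' strategy)."""
    relabeled = []
    for t, tr in enumerate(trajectory):
        if rng.random() < p:
            future = rng.randrange(t, len(trajectory))
            tr = dc_replace(tr, desired_goal=trajectory[future].achieved_goal)
        relabeled.append(tr)
    return relabeled
```

In goalGAIL the relabeled transitions are fed both to the off-policy learner and to the discriminator, but the exact bookkeeping should be taken from the paper's Algorithm 1 and the open-source code linked above rather than from this sketch.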