OptionGAN: Learning Joint Reward-Policy Options Using Generative Adversarial Inverse Reinforcement Learning

Authors: Peter Henderson, Wei-Di Chang, Pierre-Luc Bacon, David Meger, Joelle Pineau, Doina Precup

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate OptionGAN in the context of continuous control locomotion tasks, considering both simulated MuJoCo locomotion OpenAI Gym environments (Brockman et al. 2016), modifications of these environments for task transfer (Henderson et al. 2017), and a more complex Roboschool task (Schulman et al. 2017). We show that the final policies learned using joint reward-policy options outperform a single reward approximator and policy network in most cases, and particularly excel at one-shot transfer learning. (...) Table 1 shows the overall results of our evaluations and we highlight a subset of learning curves in Figure 3. (...) Ablation Investigations
Researcher Affiliation | Academia | Peter Henderson (1), Wei-Di Chang (2), Pierre-Luc Bacon (1), David Meger (1), Joelle Pineau (1), Doina Precup (1); (1) School of Computer Science, McGill University, Montreal, Canada; (2) Department of Electrical, Computer, and Software Engineering, McGill University, Montreal, Canada
Pseudocode | Yes | Algorithm 1: IRLGAN (...) Algorithm 2: OptionGAN [a minimal sketch of the adversarial training loop these algorithms build on follows the table]
Open Source Code | Yes | Code is located at: https://github.com/Breakend/OptionGAN.
Open Datasets | Yes | We use the Hopper-v1, HalfCheetah-v1, and Walker2d-v1 locomotion environments (...) OpenAI Gym environments (Brockman et al. 2016) (...) MuJoCo simulator (Todorov, Erez, and Tassa 2012) (...) HopperSimpleWall-v0 environment provided by the gym-extensions framework (Henderson et al. 2017) and the RoboschoolHumanoidFlagrun-v1 environment used in (Schulman et al. 2017). [see the environment-setup sketch after the table]
Dataset Splits | No | The paper mentions collecting expert rollouts and sampling trajectories for training and evaluation, but it does not specify explicit training, validation, and test dataset splits with percentages or sample counts.
Hardware Specification | No | The paper does not specify any particular hardware (e.g., CPU or GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using "Multilayer perceptrons", "TRPO", "PPO", the "MuJoCo simulator", and "OpenAI Gym environments", but it does not provide specific version numbers for any of these software components or libraries.
Experiment Setup | Yes | All shared hyperparameters are held constant between IRLGAN and OptionGAN evaluation runs. All evaluations are averaged across 10 trials, each using a different random seed. (...) For simple settings all hidden layers are of size (64, 64) and for complex experiments are (128, 128). For the 2-options case we set λ_e = 10.0, λ_b = 10.0, λ_v = 1.0 based on a simple hyperparameter search and reported results from (Bengio et al. 2015). For the 4-options case we relax the regularizer that encourages a uniform distribution of options (L_b), setting λ_b = 0.01. [the reported values are collected into a configuration sketch after the table]
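
The pseudocode row points to Algorithm 1 (IRLGAN) and Algorithm 2 (OptionGAN), which the table does not reproduce. For orientation, below is a minimal, self-contained sketch of the adversarial inverse-RL loop that both algorithms build on: a discriminator is trained to separate expert states from novice states, and its log-output is then used as the surrogate reward for the policy update. Everything concrete here is a hypothetical simplification rather than the paper's implementation: the toy 1-D environment, the logistic discriminator, the finite-difference policy step standing in for TRPO, and helper names such as `rollout` and `surrogate`. OptionGAN additionally decomposes both the reward approximator and the policy into options, which this sketch omits.

```python
"""Minimal sketch of an IRLGAN-style adversarial IRL loop (toy setting, not the paper's code)."""
import numpy as np

rng = np.random.default_rng(0)

def rollout(policy_w, n_steps=2048):
    """Collect states from a toy 1-D environment under a noisy tanh policy."""
    states, s = [], 0.0
    for _ in range(n_steps):
        a = np.tanh(policy_w * s) + 0.1 * rng.normal()
        s = 0.9 * s + a
        states.append(s)
    return np.array(states)

def discriminator(theta, states):
    """Logistic D(s): estimated probability that a state comes from the expert."""
    return 1.0 / (1.0 + np.exp(-(theta[0] * states + theta[1])))

# "Expert" demonstrations: states clustered near +1 (purely illustrative).
expert_states = 1.0 + 0.1 * rng.normal(size=2048)

policy_w, theta, lr = 0.0, np.zeros(2), 1e-2
for _ in range(200):
    novice_states = rollout(policy_w)

    # Discriminator step: ascend log D(expert) + log(1 - D(novice)).
    for batch, label in ((expert_states, 1.0), (novice_states, 0.0)):
        p = discriminator(theta, batch)
        g = label - p  # gradient of the log-likelihood w.r.t. the logit
        theta += lr * np.array([np.mean(g * batch), np.mean(g)])

    # Policy step: use log D(s) as the surrogate reward, with a crude
    # finite-difference update in place of TRPO (simplification).
    def surrogate(w):
        return np.mean(np.log(discriminator(theta, rollout(w)) + 1e-8))

    eps = 1e-2
    policy_w += lr * (surrogate(policy_w + eps) - surrogate(policy_w - eps)) / (2 * eps)

print("learned policy parameter:", policy_w)
```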
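
For the environments listed in the open-datasets row, a short setup sketch is given below, assuming the older OpenAI Gym API of that era (reset returning an observation, step returning a four-tuple). The specific gym and MuJoCo versions are assumptions, since the paper does not pin them; the gym-extensions and Roboschool environments additionally require importing their own packages so that their IDs get registered.

```python
# Environment-setup sketch, assuming an older gym release that still registers
# the -v1 MuJoCo tasks and that a licensed MuJoCo build is installed.
# HopperSimpleWall-v0 (gym-extensions) and RoboschoolHumanoidFlagrun-v1
# (Roboschool) need their packages imported first to register those IDs.
import gym

env_ids = ["Hopper-v1", "HalfCheetah-v1", "Walker2d-v1"]
envs = {env_id: gym.make(env_id) for env_id in env_ids}

env = envs["Hopper-v1"]
obs = env.reset()
for _ in range(5):
    # Step with random actions just to exercise the observation/action interfaces.
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()
```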
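
Finally, the hyperparameters quoted in the experiment-setup row can be collected into a single configuration sketch. The key names below are illustrative only; the numeric values come from the quoted text, and the 4-option entry assumes the other regularizer weights stay at their 2-option values, which the excerpt does not state.

```python
# Hypothetical configuration gathering the reported hyperparameters.
# Key names are illustrative; only the numeric values come from the paper.
OPTIONGAN_EXPERIMENT_CONFIG = {
    "num_trials": 10,  # each trial uses a different random seed
    "hidden_sizes": {
        "simple_tasks": (64, 64),
        "complex_tasks": (128, 128),
    },
    # Regularizer weights for the 2-option case, chosen via a simple
    # hyperparameter search and results reported in Bengio et al. 2015.
    "two_options": {"lambda_e": 10.0, "lambda_b": 10.0, "lambda_v": 1.0},
    # For 4 options the uniform-option regularizer L_b is relaxed; the
    # remaining weights are assumed unchanged (not stated in the excerpt).
    "four_options": {"lambda_e": 10.0, "lambda_b": 0.01, "lambda_v": 1.0},
}
```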