Robust Adversarial Reinforcement Learning

Authors: Lerrel Pinto, James Davidson, Rahul Sukthankar, Abhinav Gupta

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments in multiple environments (InvertedPendulum, HalfCheetah, Swimmer, Hopper, Walker2d and Ant) conclusively demonstrate that our method (a) improves training stability; (b) is robust to differences in training/test conditions; and (c) outperforms the baseline even in the absence of the adversary.
Researcher Affiliation | Collaboration | Carnegie Mellon University, Google Brain, Google Research. Correspondence to: Lerrel Pinto <lerrelp@cs.cmu.edu>.
Pseudocode | Yes | Algorithm 1 RARL (proposed algorithm); a hedged sketch of the alternating optimization it describes appears below this table.
Open Source Code | No | The paper does not provide any specific links to source code repositories, nor does it explicitly state that the code for the described methodology is being released or is publicly available.
Open Datasets | Yes | Our implementation of the adversarial environments builds on OpenAI Gym's (Brockman et al., 2016) control environments with the MuJoCo (Todorov et al., 2012) physics simulator. (A minimal instantiation sketch appears below this table.)
Dataset Splits | No | The paper does not explicitly provide specific percentages or counts for training, validation, and test dataset splits, nor does it refer to predefined splits with citations for reproducibility of data partitioning.
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments, only mentioning the use of the MuJoCo physics simulator.
Software Dependencies | No | The paper mentions software such as 'rllab', 'Trust Region Policy Optimization (TRPO)', 'OpenAI Gym', and 'MuJoCo', but it does not provide specific version numbers for these software components, which are required for a reproducible description of ancillary software.
Experiment Setup | Yes | For all the tasks and for both the protagonist and adversary, we use a policy network with two hidden layers with 64 neurons each. We train both RARL and the baseline for 100 iterations on InvertedPendulum and for 500 iterations on the other tasks. Hyperparameters of TRPO are selected by grid search. (An architecture sketch appears below this table.)
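
The Algorithm 1 noted in the Pseudocode row alternates policy updates between the protagonist and a destabilizing adversary in a zero-sum game. The sketch below is a hedged reconstruction of that alternating loop, not the authors' code; `rollout` and `policy_step` are hypothetical placeholders standing in for the rllab/TRPO sampling and update machinery used in the paper.

```python
# Hedged sketch of RARL's alternating optimization (Algorithm 1).
# `rollout` and `policy_step` are hypothetical stand-ins for the paper's
# rllab/TRPO trajectory sampling and policy-update routines.

def rarl_train(env, protagonist, adversary, rollout, policy_step,
               n_iters=500, n_mu=1, n_nu=1):
    """Alternate updates: the protagonist maximizes task reward while the
    adversary, applying destabilizing forces, tries to minimize it."""
    for _ in range(n_iters):
        # Phase 1: improve the protagonist against the current, frozen adversary.
        for _ in range(n_mu):
            trajectories = rollout(env, protagonist, adversary)
            protagonist = policy_step(protagonist, trajectories, maximize=True)
        # Phase 2: improve the adversary against the current, frozen protagonist.
        for _ in range(n_nu):
            trajectories = rollout(env, protagonist, adversary)
            adversary = policy_step(adversary, trajectories, maximize=False)
    return protagonist, adversary
```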
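The Open Datasets row points to the standard OpenAI Gym MuJoCo control tasks. The snippet below is a minimal sketch of instantiating the unmodified base environments; the "-v2" task suffixes are an assumption (the 2017 experiments used earlier Gym releases), and the adversarial force channels added by the authors are not reproduced here.

```python
import gym  # requires a MuJoCo-enabled Gym installation

# Base control tasks named in the paper; version suffixes are assumed.
TASKS = ["InvertedPendulum-v2", "HalfCheetah-v2", "Swimmer-v2",
         "Hopper-v2", "Walker2d-v2", "Ant-v2"]

for task in TASKS:
    env = gym.make(task)
    obs = env.reset()
    # One random step just to confirm the environment loads and runs.
    obs, reward, done, info = env.step(env.action_space.sample())
    env.close()
```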
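The Experiment Setup row specifies a policy network with two hidden layers of 64 units for both players. The following is a minimal PyTorch sketch of that architecture; the original implementation used rllab, so the tanh activations and Gaussian output head are assumptions rather than details stated in the paper.

```python
import torch.nn as nn

def make_policy_net(obs_dim: int, act_dim: int) -> nn.Module:
    """Two hidden layers of 64 units each, as stated in the setup;
    the activation choice (tanh) is an assumption."""
    return nn.Sequential(
        nn.Linear(obs_dim, 64), nn.Tanh(),
        nn.Linear(64, 64), nn.Tanh(),
        nn.Linear(64, act_dim),  # mean of a Gaussian action distribution
    )

# Training budget stated in the paper: 100 TRPO iterations for InvertedPendulum,
# 500 for every other task; TRPO hyperparameters were selected by grid search.
N_ITERATIONS = {"InvertedPendulum": 100, "HalfCheetah": 500, "Swimmer": 500,
                "Hopper": 500, "Walker2d": 500, "Ant": 500}
```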