An Imitation from Observation Approach to Transfer Learning with Dynamics Mismatch
Authors: Siddharth Desai, Ishan Durugkar, Haresh Karnan, Garrett Warnell, Josiah Hanna, Peter Stone
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate our hypothesis we derive a new algorithm, generative adversarial reinforced action transformation (GARAT), based on adversarial imitation from observation techniques. We run experiments in several domains with mismatched dynamics, and find that agents trained with GARAT achieve higher returns in the target environment compared to existing black-box transfer methods. (A hedged sketch of this adversarial setup appears after the table.) |
| Researcher Affiliation | Collaboration | Siddharth Desai, Department of Mechanical Engineering, The University of Texas at Austin (sidrdesai@utexas.edu); Ishan Durugkar, Department of Computer Science, The University of Texas at Austin (ishand@cs.utexas.edu); Haresh Karnan, Department of Mechanical Engineering, The University of Texas at Austin (haresh.miriyala@utexas.edu); Garrett Warnell, Army Research Laboratory (garrett.a.warnell.civ@mail.mil); Josiah P. Hanna, School of Informatics, The University of Edinburgh (josiah.hanna@ed.ac.uk); Peter Stone, Department of Computer Science, The University of Texas at Austin and Sony AI (pstone@cs.utexas.edu) |
| Pseudocode | Yes | Algorithm 1 lays out its details. |
| Open Source Code | No | The paper mentions using implementations from the 'stable-baselines library [17]' for TRPO and PPO, but does not state that its own code for GARAT or its experiments is publicly available. |
| Open Datasets | Yes | We validate GARAT for transfer by transferring the agent policy between OpenAI Gym [7] simulated environments with different transition dynamics. For various MuJoCo [47] environments... Apart from the MuJoCo simulator, we also show successful transfer in the PyBullet simulator [9] using the Ant domain. |
| Dataset Splits | No | The paper describes training policies and evaluating them in different environments or across a number of episodes, but it does not specify explicit training/validation/test dataset splits with percentages or sample counts in the way typically required for static datasets. |
| Hardware Specification | No | The paper does not specify the hardware used for running the experiments, such as specific CPU or GPU models, or details about computational resources. |
| Software Dependencies | No | The paper states 'We use the implementations of TRPO and PPO provided in the stable-baselines library [17].' but does not specify a version number for stable-baselines or any other software dependency. (A hedged usage sketch follows the table.) |
| Experiment Setup | Yes | The specific hyperparameters used are provided in Appendix C. |
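
For the Research Type row above, the following is a minimal, hedged sketch of the adversarial imitation-from-observation setup GARAT builds on: an action-transformer policy is rewarded for making grounded source-environment transitions (s, s') indistinguishable from target-environment transitions. The network sizes, binary cross-entropy objective, and REINFORCE-style surrogate are illustrative assumptions on my part; the paper's Algorithm 1 uses TRPO and its own loss details.

```python
# Hedged sketch only: layer sizes, the BCE objective, and the REINFORCE-style
# surrogate are illustrative assumptions, not the paper's exact procedure.
import torch
import torch.nn as nn

class ActionTransformer(nn.Module):
    """pi_g(a_grounded | s, a): Gaussian policy over grounded actions."""
    def __init__(self, s_dim, a_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim + a_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, a_dim))
        self.log_std = nn.Parameter(torch.zeros(a_dim))

    def dist(self, s, a):
        mean = self.net(torch.cat([s, a], dim=-1))
        return torch.distributions.Normal(mean, self.log_std.exp())

class TransitionDiscriminator(nn.Module):
    """D(s, s'): logit that a state transition came from the target env."""
    def __init__(self, s_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * s_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 1))

    def forward(self, s, s_next):
        return self.net(torch.cat([s, s_next], dim=-1))

def adversarial_step(tfm, disc, tfm_opt, d_opt,
                     src_s, src_a, src_a_grounded, src_s_next,
                     tgt_s, tgt_s_next):
    """One GAN-style update. src_* comes from rolling out the grounded source
    env (src_s_next results from executing src_a_grounded); tgt_* comes from
    a small amount of target-environment experience."""
    bce = nn.BCEWithLogitsLoss()
    # Discriminator: label target transitions 1, grounded-source transitions 0.
    d_loss = (bce(disc(tgt_s, tgt_s_next), torch.ones(len(tgt_s), 1)) +
              bce(disc(src_s, src_s_next), torch.zeros(len(src_s), 1)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()
    # Transformer: imitation-from-observation reward = fooling the discriminator.
    with torch.no_grad():
        reward = torch.sigmoid(disc(src_s, src_s_next)).squeeze(-1)
    log_prob = tfm.dist(src_s, src_a).log_prob(src_a_grounded).sum(-1)
    tfm_loss = -(log_prob * reward).mean()
    tfm_opt.zero_grad(); tfm_loss.backward(); tfm_opt.step()

# Smoke test with random tensors standing in for real rollout data.
tfm, disc = ActionTransformer(4, 2), TransitionDiscriminator(4)
adversarial_step(tfm, disc,
                 torch.optim.Adam(tfm.parameters(), lr=3e-4),
                 torch.optim.Adam(disc.parameters(), lr=3e-4),
                 torch.randn(32, 4), torch.randn(32, 2), torch.randn(32, 2),
                 torch.randn(32, 4), torch.randn(32, 4), torch.randn(32, 4))
```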
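For the Software Dependencies row: the paper cites stable-baselines for TRPO and PPO but gives no versions, so any reconstruction involves guesswork. Below is a minimal sketch of the kind of train-in-source, evaluate-in-target loop involved, using the stable-baselines 2.x (TensorFlow 1 era) API the citation implies. The environment ID, the mass perturbation used to create a dynamics mismatch, and the timestep budget are my assumptions, not values from the paper.

```python
# Illustrative only: environment choice, mass scaling, and timesteps are assumed.
import gym
from stable_baselines import TRPO  # PPO2 is the analogous PPO implementation

source_env = gym.make("HalfCheetah-v2")          # "sim" environment
target_env = gym.make("HalfCheetah-v2")          # "real" environment
# Hypothetical dynamics mismatch: scale the MuJoCo link masses in the target.
target_env.unwrapped.model.body_mass[:] *= 1.5

agent = TRPO("MlpPolicy", source_env, verbose=0)
agent.learn(total_timesteps=1_000_000)

# Zero-shot evaluation in the mismatched target environment.
obs, ret, done = target_env.reset(), 0.0, False
while not done:
    action, _ = agent.predict(obs, deterministic=True)
    obs, r, done, _ = target_env.step(action)
    ret += r
print("target-environment return:", ret)
```

Reproducing the paper's numbers would additionally require the GARAT grounding step between source-environment training runs, plus the hyperparameters in its Appendix C.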