End-to-End Differentiable Adversarial Imitation Learning
Authors: Nir Baram, Oron Anschel, Itai Caspi, Shie Mannor
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test it on both discrete and continuous action domains and report results that surpass the state-of-the-art. We evaluate the proposed algorithm on three discrete control tasks (Cartpole, Mountain-Car, Acrobot), and five continuous control tasks (Hopper, Walker, Half-Cheetah, Ant, and Humanoid) modeled by the MuJoCo physics simulator (Todorov et al., 2012). |
| Researcher Affiliation | Academia | Nir Baram, Oron Anschel, Itai Caspi, Shie Mannor; Technion Institute of Technology, Israel. |
| Pseudocode | Yes | Algorithm 1 Model-based Generative Adversarial Imitation Learning |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the proposed methodology. |
| Open Datasets | Yes | We evaluate the proposed algorithm on three discrete control tasks (Cartpole, Mountain-Car, Acrobot), and five continuous control tasks (Hopper, Walker, Half-Cheetah, Ant, and Humanoid) modeled by the MuJoCo physics simulator (Todorov et al., 2012). |
| Dataset Splits | No | The paper describes generating trajectories but does not specify explicit train/validation/test dataset splits for reproducibility. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments. |
| Software Dependencies | No | The paper mentions using TRPO, MuJoCo, and ADAM optimizer, but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | The discriminator and policy neural networks are built from two hidden layers with ReLU non-linearity and are trained using the ADAM optimizer (Kingma & Ba, 2014). For each task, we produce datasets with a different number of trajectories, where each trajectory τ = {s_0, a_0, s_1, a_1, ..., s_N, a_N} is of length N = 1000. We found empirically that using a Hadamard product to combine the encoded state and action achieves the best performance. Additionally, predicting the next state based on the current state alone requires the environment to be representable as a first-order MDP. Instead, we can assume the environment to be representable as an nth-order MDP and use multiple previous states to predict the next state. To model the multi-step dependencies, we use a recurrent connection from the previous state by incorporating a GRU layer (Cho et al., 2014) as part of the state encoder. (A hedged architecture sketch based on this description follows the table.) |
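
The experiment-setup row above describes the networks only in prose. Below is a minimal PyTorch sketch of that description, not the authors' implementation: a GRU state encoder, an action encoder, a Hadamard (element-wise) product combining the two embeddings for next-state prediction, and a two-hidden-layer ReLU discriminator trained with Adam. All layer widths, learning rates, and tensor shapes are illustrative assumptions that are not reported in the quoted excerpt.

```python
# Minimal sketch of the forward-model and discriminator architecture described in the
# paper's experiment setup. Layer sizes and dimensions are assumptions for illustration.
import torch
import torch.nn as nn


class ForwardModel(nn.Module):
    """Predicts the next state from a history of states and the current action.

    A GRU encodes the state history (nth-order MDP assumption); the action is
    encoded separately and combined with the state embedding by a Hadamard product.
    """

    def __init__(self, state_dim, action_dim, hidden_dim=128):
        super().__init__()
        self.state_encoder = nn.GRU(state_dim, hidden_dim, batch_first=True)
        self.action_encoder = nn.Sequential(
            nn.Linear(action_dim, hidden_dim), nn.ReLU()
        )
        # Decoder maps the combined embedding back to a predicted next state.
        self.decoder = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, state_dim),
        )

    def forward(self, state_history, action):
        # state_history: (batch, time, state_dim); action: (batch, action_dim)
        _, h = self.state_encoder(state_history)        # h: (1, batch, hidden_dim)
        state_embedding = h.squeeze(0)                  # (batch, hidden_dim)
        action_embedding = self.action_encoder(action)  # (batch, hidden_dim)
        combined = state_embedding * action_embedding   # Hadamard product
        return self.decoder(combined)


class Discriminator(nn.Module):
    """Two-hidden-layer ReLU network scoring (state, action) pairs."""

    def __init__(self, state_dim, action_dim, hidden_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


if __name__ == "__main__":
    # Dimensions below are arbitrary placeholders, not values from the paper.
    model = ForwardModel(state_dim=11, action_dim=3)
    disc = Discriminator(state_dim=11, action_dim=3)
    # Adam is the optimizer named in the paper; the learning rate here is a guess.
    opt = torch.optim.Adam(list(model.parameters()) + list(disc.parameters()), lr=1e-4)
    states = torch.randn(8, 4, 11)   # batch of 8, history of 4 states
    actions = torch.randn(8, 3)
    next_state_pred = model(states, actions)
    print(next_state_pred.shape)     # torch.Size([8, 11])
```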