Primal Wasserstein Imitation Learning

Authors: Robert Dadashi, Léonard Hussenot, Matthieu Geist, Olivier Pietquin

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present the implementation of PWIL and perform an empirical evaluation, based on the ACME framework (Hoffman et al., 2020). We test PWIL on MuJoCo locomotion tasks and compare it to the state-of-the-art GAIL-like algorithm DAC (Kostrikov et al., 2019) and against the common BC baseline.
Researcher Affiliation | Collaboration | Robert Dadashi (1), Léonard Hussenot (1, 2), Matthieu Geist (1), Olivier Pietquin (1); (1) Google Research, Brain Team; (2) Univ. de Lille, CNRS, Inria Scool, UMR 9189 CRIStAL
Pseudocode | Yes | Algorithm 1: PWIL, Primal Wasserstein Imitation Learning. (A hedged sketch of the reward computation it describes is given after this table.)
Open Source Code | Yes | We provide experimental code and videos of the trained agents here: https://sites.google.com/view/wasserstein-imitation.
Open Datasets | Yes | We test PWIL on MuJoCo locomotion tasks and compare it to the state-of-the-art GAIL-like algorithm DAC (Kostrikov et al., 2019) and against the common BC baseline. As DAC is based on TD3 (Fujimoto et al., 2018), which is a variant of DDPG (Lillicrap et al., 2016), we use a DDPG-based agent for fair comparison: D4PG (Barth-Maron et al., 2018). We use the door opening task from Rajeswaran et al. (2018), with human demonstrations generated with Haptix (Kumar & Todorov, 2015).
Dataset Splits | Yes | We separated the demonstrations into a train set of 20 demonstrations and a validation set of 5 demonstrations.
Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies | No | The paper mentions several software components and frameworks, such as D4PG, ACME, the Adam optimizer, POT (Flamary & Courty), DAC, SAC, and TD3, but it does not provide specific version numbers for these dependencies, which are required for reproducibility.
Experiment Setup | Yes | The actor architecture is a 4-layer neural network: the first layer has size 256 with tanh activation and layer normalization (Ba et al., 2016); the second and third layers have size 256 with elu activation (Clevert et al., 2016); the last layer has size equal to the dimension of the action space, with a tanh activation scaled to the action range of the environment. To enable sufficient exploration, we use a Gaussian noise layer on top of the last layer with standard deviation σ = 0.2, which we clip to the action range of the environment. (A PyTorch sketch of this actor follows the table.)
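
For readers checking the pseudocode row above: Algorithm 1 computes each step's reward from a greedy coupling between the agent's state-action pairs and the pooled expert demonstrations. The sketch below is a reconstruction from the paper's description, not the released code; the function name `pwil_episode_rewards`, the plain Euclidean distance (the paper standardizes state-action coordinates first), and the default α = β = 5 are assumptions made for illustration.

```python
import numpy as np

def pwil_episode_rewards(agent_sa, expert_sa, alpha=5.0, beta=5.0):
    """Per-step PWIL-style rewards from a greedy coupling (sketch of Algorithm 1).

    agent_sa:  (T, d) array of the agent's concatenated state-action vectors.
    expert_sa: (D, d) array pooling the expert demonstrations' state-action pairs.
    """
    T, d = agent_sa.shape
    expert_weight = np.full(len(expert_sa), 1.0 / len(expert_sa))  # mass left per expert point
    rewards = []
    for sa in agent_sa:
        to_place, cost = 1.0 / T, 0.0  # each agent step carries mass 1/T
        while to_place > 1e-12:
            dist = np.linalg.norm(expert_sa - sa, axis=1)
            dist[expert_weight <= 0.0] = np.inf  # skip fully consumed expert points
            j = int(np.argmin(dist))
            if not np.isfinite(dist[j]):
                break  # numerical residue: all expert mass already consumed
            moved = min(to_place, expert_weight[j])
            cost += moved * dist[j]
            expert_weight[j] -= moved
            to_place -= moved
        # exponentiated cost -> dense reward; large when the agent stays close
        # (in state-action space) to not-yet-consumed expert points
        rewards.append(alpha * np.exp(-beta * T / np.sqrt(d) * cost))
    return np.asarray(rewards)
```

The greedy coupling upper-bounds the exact primal Wasserstein cost, which is what lets the reward be computed online, step by step, without solving a full optimal-transport problem each time.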
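
The experiment-setup row quotes an actor description concrete enough to reconstruct. Below is a minimal PyTorch sketch; the paper's implementation lives in ACME, so the framework choice, the class name `PWILActor`, and the placement of the noise after the action rescaling are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PWILActor(nn.Module):
    """4-layer actor matching the quoted experiment setup (sketch)."""

    def __init__(self, obs_dim, action_dim, action_low, action_high, sigma=0.2):
        super().__init__()
        self.fc1 = nn.Linear(obs_dim, 256)
        self.norm = nn.LayerNorm(256)           # layer normalization (Ba et al., 2016)
        self.fc2 = nn.Linear(256, 256)
        self.fc3 = nn.Linear(256, 256)
        self.out = nn.Linear(256, action_dim)   # size = action-space dimension
        self.register_buffer("low", torch.as_tensor(action_low, dtype=torch.float32))
        self.register_buffer("high", torch.as_tensor(action_high, dtype=torch.float32))
        self.sigma = sigma                      # exploration noise std (0.2 in the paper)

    def forward(self, obs, explore=False):
        h = torch.tanh(self.norm(self.fc1(obs)))  # layer 1: 256 units, tanh + layer norm
        h = F.elu(self.fc2(h))                    # layer 2: 256 units, elu (Clevert et al., 2016)
        h = F.elu(self.fc3(h))                    # layer 3: 256 units, elu
        # tanh output rescaled from [-1, 1] to the environment's action range
        action = self.low + 0.5 * (torch.tanh(self.out(h)) + 1.0) * (self.high - self.low)
        if explore:
            # Gaussian noise on top of the last layer, clipped back to the action range
            action = action + self.sigma * torch.randn_like(action)
            action = torch.maximum(torch.minimum(action, self.high), self.low)
        return action
```

The quote does not say whether σ = 0.2 is applied before or after the rescaling to the action range, so the post-rescaling placement here is a guess; the clipping back to the action range follows the quoted text directly.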