Primal Wasserstein Imitation Learning

Authors: Robert Dadashi, Léonard Hussenot, Matthieu Geist, Olivier Pietquin

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present the implementation of PWIL and perform an empirical evaluation, based on the ACME framework (Hoffman et al., 2020). We test PWIL on MuJoCo locomotion tasks and compare it to the state-of-the-art GAIL-like algorithm DAC (Kostrikov et al., 2019) and against the common BC baseline.
Researcher Affiliation | Collaboration | Robert Dadashi (1), Léonard Hussenot (1, 2), Matthieu Geist (1), Olivier Pietquin (1); (1) Google Research, Brain Team; (2) Univ. de Lille, CNRS, Inria Scool, UMR 9189 CRIStAL
Pseudocode | Yes | Algorithm 1: PWIL, Primal Wasserstein Imitation Learning. (A hedged sketch of the reward computation it describes is given after this table.)
Open Source Code | Yes | We provide experimental code and videos of the trained agents here: https://sites.google.com/view/wasserstein-imitation.
Open Datasets | Yes | We test PWIL on MuJoCo locomotion tasks and compare it to the state-of-the-art GAIL-like algorithm DAC (Kostrikov et al., 2019) and against the common BC baseline. As DAC is based on TD3 (Fujimoto et al., 2018), which is a variant of DDPG (Lillicrap et al., 2016), we use a DDPG-based agent for fair comparison: D4PG (Barth-Maron et al., 2018). We use the door opening task from Rajeswaran et al. (2018), with human demonstrations generated with Haptix (Kumar & Todorov, 2015).
Dataset Splits | Yes | We separated the demonstrations into a train set of 20 demonstrations and a validation set of 5 demonstrations.
Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies | No | The paper mentions several software components and frameworks, such as D4PG, ACME, the Adam optimizer, POT (Flamary & Courty), DAC, SAC, and TD3, but it does not provide specific version numbers for these dependencies, which are required for reproducibility.
Experiment Setup | Yes | The actor architecture is a 4-layer neural network: the first layer has size 256 with tanh activation and layer normalization (Ba et al., 2016); the second and third layers have size 256 with elu activation (Clevert et al., 2016); the last layer has size equal to the dimension of the action space, with a tanh activation scaled to the action range of the environment. To enable sufficient exploration, we use a Gaussian noise layer on top of the last layer with standard deviation σ = 0.2, which we clip to the action range of the environment. (A PyTorch sketch of this actor follows the table.)
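
For readers checking the pseudocode row above: Algorithm 1 computes each step's reward from a greedy coupling between the agent's state-action pairs and the pooled expert demonstrations. The sketch below is a reconstruction from the paper's description, not the released code; the function name `pwil_episode_rewards`, the plain Euclidean distance (the paper standardizes state-action coordinates first), and the default α = β = 5 are assumptions made for illustration.

```python
import numpy as np

def pwil_episode_rewards(agent_sa, expert_sa, alpha=5.0, beta=5.0):
    """Per-step PWIL-style rewards from a greedy coupling (sketch of Algorithm 1).

    agent_sa:  (T, d) array of the agent's concatenated state-action vectors.
    expert_sa: (D, d) array pooling the expert demonstrations' state-action pairs.
    """
    T, d = agent_sa.shape
    expert_weight = np.full(len(expert_sa), 1.0 / len(expert_sa))  # mass left per expert point
    rewards = []
    for sa in agent_sa:
        to_place, cost = 1.0 / T, 0.0  # each agent step carries mass 1/T
        while to_place > 1e-12:
            dist = np.linalg.norm(expert_sa - sa, axis=1)
            dist[expert_weight <= 0.0] = np.inf  # skip fully consumed expert points
            j = int(np.argmin(dist))
            if not np.isfinite(dist[j]):
                break  # numerical residue: all expert mass already consumed
            moved = min(to_place, expert_weight[j])
            cost += moved * dist[j]
            expert_weight[j] -= moved
            to_place -= moved
        # exponentiated cost -> dense reward; large when the agent stays close
        # (in state-action space) to not-yet-consumed expert points
        rewards.append(alpha * np.exp(-beta * T / np.sqrt(d) * cost))
    return np.asarray(rewards)
```

The greedy coupling upper-bounds the exact primal Wasserstein cost, which is what lets the reward be computed online, step by step, without solving a full optimal-transport problem each time.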
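
The experiment-setup row quotes an actor description concrete enough to reconstruct. Below is a minimal PyTorch sketch; the paper's implementation lives in ACME, so the framework choice, the class name `PWILActor`, and the placement of the noise after the action rescaling are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PWILActor(nn.Module):
    """4-layer actor matching the quoted experiment setup (sketch)."""

    def __init__(self, obs_dim, action_dim, action_low, action_high, sigma=0.2):
        super().__init__()
        self.fc1 = nn.Linear(obs_dim, 256)
        self.norm = nn.LayerNorm(256)           # layer normalization (Ba et al., 2016)
        self.fc2 = nn.Linear(256, 256)
        self.fc3 = nn.Linear(256, 256)
        self.out = nn.Linear(256, action_dim)   # size = action-space dimension
        self.register_buffer("low", torch.as_tensor(action_low, dtype=torch.float32))
        self.register_buffer("high", torch.as_tensor(action_high, dtype=torch.float32))
        self.sigma = sigma                      # exploration noise std (0.2 in the paper)

    def forward(self, obs, explore=False):
        h = torch.tanh(self.norm(self.fc1(obs)))  # layer 1: 256 units, tanh + layer norm
        h = F.elu(self.fc2(h))                    # layer 2: 256 units, elu (Clevert et al., 2016)
        h = F.elu(self.fc3(h))                    # layer 3: 256 units, elu
        # tanh output rescaled from [-1, 1] to the environment's action range
        action = self.low + 0.5 * (torch.tanh(self.out(h)) + 1.0) * (self.high - self.low)
        if explore:
            # Gaussian noise on top of the last layer, clipped back to the action range
            action = action + self.sigma * torch.randn_like(action)
            action = torch.maximum(torch.minimum(action, self.high), self.low)
        return action
```

The quote does not say whether σ = 0.2 is applied before or after the rescaling to the action range, so the post-rescaling placement here is a guess; the clipping back to the action range follows the quoted text directly.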