Primal Wasserstein Imitation Learning
Authors: Robert Dadashi, Léonard Hussenot, Matthieu Geist, Olivier Pietquin
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present the implementation of PWIL and perform an empirical evaluation, based on the ACME framework (Hoffman et al., 2020). We test PWIL on MuJoCo locomotion tasks and compare it to the state-of-the-art GAIL-like algorithm DAC (Kostrikov et al., 2019) and against the common BC baseline. |
| Researcher Affiliation | Collaboration | Robert Dadashi1, Léonard Hussenot1,2, Matthieu Geist1, Olivier Pietquin1 1Google Research, Brain Team 2Univ. de Lille, CNRS, Inria Scool, UMR 9189 CRIStAL |
| Pseudocode | Yes | Algorithm 1 PWIL: Primal Wasserstein Imitation Learning (a hedged sketch of the reward computation follows this table) |
| Open Source Code | Yes | We provide experimental code and videos of the trained agents here: https://sites.google.com/view/wasserstein-imitation. |
| Open Datasets | Yes | We test PWIL on MuJoCo locomotion tasks and compare it to the state-of-the-art GAIL-like algorithm DAC (Kostrikov et al., 2019) and against the common BC baseline. As DAC is based on TD3 (Fujimoto et al., 2018) which is a variant of DDPG (Lillicrap et al., 2016), we use a DDPG-based agent for fair comparison: D4PG (Barth-Maron et al., 2018). We use the door opening task from Rajeswaran et al. (2018), with human demonstrations generated with Haptix (Kumar & Todorov, 2015). |
| Dataset Splits | Yes | We separated the demonstrations into a train set of 20 demonstrations and a validation set of 5 demonstrations. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions several software components and frameworks such as D4PG, ACME, Adam optimizer, POT (Flamary & Courty), DAC, SAC, and TD3. However, it does not provide specific version numbers for these software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | The actor architecture is a 4-layer neural network: the first layer has size 256 with tanh activation and layer normalization (Ba et al., 2016), the second and third layers have size 256 with elu activation (Clevert et al., 2016), and the last layer has size equal to the dimension of the action space, with a tanh activation scaled to the action range of the environment. To enable sufficient exploration, we use a Gaussian noise layer on top of the last layer with standard deviation σ = 0.2, which we clip to the action range of the environment. (A hedged architecture sketch follows this table.) |
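
The pseudocode row above names Algorithm 1, the greedy-coupling reward at the heart of PWIL. The sketch below is a minimal NumPy rendering under our own assumptions, not the authors' implementation: at each timestep the agent's state-action vector spends 1/T units of mass on the nearest remaining expert points, and the accumulated transport cost is mapped to a reward through a decaying exponential. The function name, the plain Euclidean metric (the paper uses a standardized metric built from demonstration statistics), and the `alpha`/`beta` defaults are all illustrative.

```python
import numpy as np

def pwil_episode_rewards(agent_sa, expert_sa, alpha=5.0, beta=5.0):
    """Greedy-coupling reward in the spirit of PWIL's Algorithm 1 (sketch).

    agent_sa:  (T, d) agent state-action vectors for one episode.
    expert_sa: (D, d) expert state-action vectors from the demonstrations.
    alpha, beta: reward-shaping constants (values here are illustrative).
    """
    T, d = agent_sa.shape
    # each expert point starts with an equal share of the total mass
    expert_weights = np.full(len(expert_sa), 1.0 / len(expert_sa))
    rewards = np.zeros(T)
    for t, x in enumerate(agent_sa):
        weight_left = 1.0 / T  # mass the agent must place this step
        cost = 0.0
        dists = np.linalg.norm(expert_sa - x, axis=1)  # paper: standardized metric
        while weight_left > 1e-12:
            # nearest expert point that still has mass
            j = np.argmin(np.where(expert_weights > 0, dists, np.inf))
            moved = min(weight_left, expert_weights[j])
            if moved <= 0:  # guard against floating-point drift
                break
            cost += moved * dists[j]
            expert_weights[j] -= moved
            weight_left -= moved
        rewards[t] = alpha * np.exp(-beta * T / np.sqrt(d) * cost)
    return rewards
```

Because expert mass is consumed as it is matched, the reward at step t depends on everything matched so far, which is what makes the coupling greedy rather than optimal.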
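
The experiment-setup row specifies the actor fully enough to write it down concretely. Below is a PyTorch sketch under our own assumptions: the paper builds on ACME, so the framework and class name are ours, the ordering of layer normalization relative to the tanh activation is assumed (the row's phrasing leaves it ambiguous), and the action range is assumed symmetric.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """4-layer actor from the experiment-setup row (PyTorch sketch)."""

    def __init__(self, obs_dim: int, act_dim: int, act_scale: float = 1.0,
                 sigma: float = 0.2):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.LayerNorm(256), nn.Tanh(),  # ordering assumed
            nn.Linear(256, 256), nn.ELU(),
            nn.Linear(256, 256), nn.ELU(),
            nn.Linear(256, act_dim), nn.Tanh(),  # output in [-1, 1]
        )
        self.act_scale = act_scale  # half-width of the (assumed symmetric) action range
        self.sigma = sigma          # exploration noise std, 0.2 per the paper

    def forward(self, obs: torch.Tensor, explore: bool = False) -> torch.Tensor:
        action = self.act_scale * self.trunk(obs)
        if explore:
            # Gaussian noise on top of the last layer, clipped to the action range
            action = action + self.sigma * torch.randn_like(action)
            action = action.clamp(-self.act_scale, self.act_scale)
        return action
```

For the MuJoCo locomotion tasks used in the paper the action range is typically [-1, 1], so `act_scale=1.0` is a reasonable default.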