Hyperparameter Selection for Imitation Learning
Authors: Léonard Hussenot, Marcin Andrychowicz, Damien Vincent, Robert Dadashi, Anton Raichuk, Sabela Ramos, Nikola Momchev, Sertan Girgin, Raphael Marinier, Lukasz Stafiniak, Manu Orsini, Olivier Bachem, Matthieu Geist, Olivier Pietquin
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate them in an extensive empirical study (more than 10 000 agents across 9 environments) and make practical recommendations for selecting HPs. |
| Researcher Affiliation | Collaboration | 1Google Research, Brain Team 2Univ. de Lille, CNRS, Inria Scool, UMR 9189 CRIStAL. |
| Pseudocode | No | The paper describes the algorithms verbally but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states, "We implemented our algorithms in the Acme framework (Hoffman et al., 2020) using JAX (Bradbury et al., 2018) for automatic differentiation and Flax (Heek et al., 2020) for neural networks computation." However, it does not provide a specific link or explicit statement about releasing the source code for their specific implementation or experiments. |
| Open Datasets | Yes | We focus on continuous-control benchmarks and consider five widely used environments from OpenAI Gym (Brockman et al., 2016): Hopper-v2, Walker2d-v2, HalfCheetah-v2, Ant-v2, and Humanoid-v2 and four manipulation tasks from Adroit (Kumar, 2016): pen-v0, relocate-v0, door-v0, and hammer-v0. (...) For the Adroit environments, we use the expert datasets from the D4RL dataset (Fu et al., 2020). |
| Dataset Splits | Yes | For the OpenAI Gym environments, we use 11 training trajectories and keep 5 additional held-out trajectories for validation. For the Adroit environments, 20 training trajectories are used as well as 5 validation trajectories. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU models, CPU types, or memory specifications. It only mentions training "thousands of agents". |
| Software Dependencies | No | The paper mentions using "Acme framework (Hoffman et al., 2020) using JAX (Bradbury et al., 2018) for automatic differentiation and Flax (Heek et al., 2020) for neural networks computation" and "Adam optimizer (Kingma & Ba, 2014)". However, it does not specify exact version numbers for these software components (e.g., JAX 0.3.17, Flax 0.5.0). |
| Experiment Setup | Yes | Online algorithms (AIL & PWIL) are run for 1M environment steps while BC is trained for 60k gradient steps. (...) All models are trained with a learning rate of 3e-4 with Adam optimizer. (...) The SAC agent uses a learning rate of 3e-4 (...) a discount factor of 0.99, a target entropy of -6, and an Adam optimizer. We used a replay buffer of size 10M. |
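The dataset splits reported above (11 train / 5 validation trajectories for the Gym tasks, 20 / 5 for the Adroit tasks) can be sketched as a simple slicing of a trajectory list. This is a minimal illustrative sketch, not the paper's implementation; the function name and trajectory representation are assumptions.

```python
# Hedged sketch of the paper's trajectory split: the first n_train
# trajectories go to training, the next n_val are held out for validation.
# How trajectories are represented (and whether they are shuffled first)
# is an assumption, not stated in the quoted text.

def split_trajectories(trajectories, n_train, n_val):
    """Return (train, validation) slices of a trajectory list."""
    assert len(trajectories) >= n_train + n_val, "not enough trajectories"
    return trajectories[:n_train], trajectories[n_train:n_train + n_val]

# Gym setting from the paper: 11 training + 5 validation trajectories.
gym_trajectories = [f"traj_{i}" for i in range(16)]
train, val = split_trajectories(gym_trajectories, n_train=11, n_val=5)
```

For the Adroit environments, the same call with `n_train=20, n_val=5` reproduces the reported split.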
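The quoted experiment setup can be collected into a single configuration sketch. The numeric values below are taken directly from the quoted text; the dictionary key names are our own labels, not identifiers from the paper's codebase.

```python
# Hedged summary of the reported hyperparameters. Values are quoted from
# the paper; key names are illustrative assumptions.
HPARAMS = {
    "learning_rate": 3e-4,            # Adam optimizer, all models (incl. SAC)
    "discount": 0.99,                 # SAC agent
    "target_entropy": -6.0,           # SAC agent
    "replay_buffer_size": 10_000_000, # "replay buffer of size 10M"
    "online_env_steps": 1_000_000,    # AIL & PWIL training budget
    "bc_gradient_steps": 60_000,      # BC training budget
}
```

Keeping these in one place makes it easy to check that a reimplementation matches the reported setup before launching the longer online runs.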