Hyperparameter Selection for Imitation Learning

Authors: Léonard Hussenot, Marcin Andrychowicz, Damien Vincent, Robert Dadashi, Anton Raichuk, Sabela Ramos, Nikola Momchev, Sertan Girgin, Raphael Marinier, Lukasz Stafiniak, Manu Orsini, Olivier Bachem, Matthieu Geist, Olivier Pietquin

ICML 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We evaluate them in an extensive empirical study (more than 10 000 agents across 9 environments) and make practical recommendations for selecting HPs." |
| Researcher Affiliation | Collaboration | Google Research, Brain Team; Univ. de Lille, CNRS, Inria Scool, UMR 9189 CRIStAL. |
| Pseudocode | No | The paper describes the algorithms verbally but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states, "We implemented our algorithms in the Acme framework (Hoffman et al., 2020) using JAX (Bradbury et al., 2018) for automatic differentiation and Flax (Heek et al., 2020) for neural networks computation." However, it does not link to or otherwise announce a release of the source code for its implementation or experiments. |
| Open Datasets | Yes | "We focus on continuous-control benchmarks and consider five widely used environments from OpenAI Gym (Brockman et al., 2016): Hopper-v2, Walker2d-v2, HalfCheetah-v2, Ant-v2, and Humanoid-v2 and four manipulation tasks from Adroit (Kumar, 2016): pen-v0, relocate-v0, door-v0, and hammer-v0. (...) For the Adroit environments, we use the expert datasets from the D4RL dataset (Fu et al., 2020)." |
| Dataset Splits | Yes | "For the OpenAI Gym environments, we use 11 training trajectories and keep 5 additional held-out trajectories for validation. For the Adroit environments, 20 training trajectories are used as well as 5 validation trajectories." |
| Hardware Specification | No | The paper gives no details about the hardware used, such as GPU models, CPU types, or memory; it mentions only that "thousands of agents" were trained. |
| Software Dependencies | No | The paper mentions the "Acme framework (Hoffman et al., 2020) using JAX (Bradbury et al., 2018) for automatic differentiation and Flax (Heek et al., 2020) for neural networks computation" and the "Adam optimizer (Kingma & Ba, 2014)", but it does not specify exact version numbers for these components (e.g., JAX 0.3.17, Flax 0.5.0). |
| Experiment Setup | Yes | "Online algorithms (AIL & PWIL) are run for 1M environment steps while BC is trained for 60k gradient steps. (...) All models are trained with a learning rate of 3e-4 with Adam optimizer. (...) The SAC agent uses a learning rate of 3e-4 (...) a discount factor of 0.99, a target entropy of -6, and an Adam optimizer. We used a replay buffer of size 10M." |
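The dataset splits described above (11 train / 5 validation trajectories for OpenAI Gym, 20 / 5 for Adroit) could be reproduced with a sketch like the following. The function name, shuffling, and seeding are illustrative assumptions; the paper does not state how the trajectories were partitioned, only the counts.

```python
import random

def split_trajectories(trajectories, n_train, n_val, seed=0):
    """Split expert trajectories into training and held-out validation sets.

    Counts follow the paper: 11 train / 5 val for OpenAI Gym environments,
    20 train / 5 val for Adroit. The shuffle-and-seed scheme here is an
    assumption, not the paper's procedure.
    """
    assert len(trajectories) >= n_train + n_val, "not enough trajectories"
    rng = random.Random(seed)
    shuffled = list(trajectories)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:n_train + n_val]

# OpenAI Gym setting: 11 training, 5 held-out validation trajectories.
trajs = [f"traj_{i}" for i in range(16)]
train, val = split_trajectories(trajs, n_train=11, n_val=5)
```

A deterministic seed keeps the split reproducible across runs, which is the property this reproducibility variable is checking for.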
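The hyperparameters quoted in the Experiment Setup row can be collected into a configuration sketch. The dict structure and key names below are my own; only the values (steps, learning rate, discount, target entropy, buffer size) come from the paper.

```python
# Behavioral Cloning: offline, trained for a fixed number of gradient steps.
BC_CONFIG = {
    "gradient_steps": 60_000,     # "BC is trained for 60k gradient steps"
    "learning_rate": 3e-4,        # Adam, as stated in the paper
    "optimizer": "adam",
}

# Online imitation algorithms (AIL & PWIL): budgeted in environment steps.
ONLINE_CONFIG = {
    "environment_steps": 1_000_000,  # "run for 1M environment steps"
    "learning_rate": 3e-4,
    "optimizer": "adam",
}

# SAC agent used to generate/train experts.
SAC_CONFIG = {
    "learning_rate": 3e-4,
    "discount": 0.99,
    "target_entropy": -6.0,
    "optimizer": "adam",
    "replay_buffer_size": 10_000_000,  # "replay buffer of size 10M"
}
```

Laying the values out this way makes it easy to check that the paper reports every knob needed to re-run the experiments, which is why this row is scored "Yes".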