Fast Imitation via Behavior Foundation Models
Authors: Matteo Pirotta, Andrea Tirinzoni, Ahmed Touati, Alessandro Lazaric, Yann Ollivier
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test FB-IL algorithms across environments from the DeepMind Control Suite (Tassa et al., 2018a) with multiple imitation tasks, using different IL principles and settings. We show that not only do FB-IL algorithms perform on par with or better than the corresponding state-of-the-art offline imitation learning baselines (Fig. 1), they also solve imitation tasks within a few seconds, which is three orders of magnitude faster than offline IL methods that need to run full RL routines to compute an imitation policy (Fig. 2). See the environment-loading sketch below the table. |
| Researcher Affiliation | Industry | Matteo Pirotta*, Andrea Tirinzoni* & Ahmed Touati*, Fundamental AI Research at Meta, {pirotta,tirinzoni,atouati}@meta.com; Alessandro Lazaric & Yann Ollivier, Fundamental AI Research at Meta, {lazaric,yol}@meta.com |
| Pseudocode | No | The paper describes mathematical formulations and algorithmic steps in prose, but it does not include any explicit pseudocode blocks or algorithm listings. |
| Open Source Code | No | The paper does not contain an explicit statement or a link providing access to the source code for the methodology described in the paper. |
| Open Datasets | Yes | We used standard unsupervised datasets for the four domains, generated by Random Network Distillation (RND). They can be downloaded following the instructions in the GitHub repository of Yarats et al. (2022) (https://github.com/denisyarats/exorl). ... All the environments considered in this paper are based on the DeepMind Control Suite (Tassa et al., 2018b). See the dataset-inspection sketch below the table. |
| Dataset Splits | No | The paper describes how expert trajectories are generated and used for imitation, and how models are pre-trained and evaluated, but it does not specify explicit train/validation/test dataset splits (e.g., percentages or counts for distinct sets). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions software components and methods such as TD3, SAC, RND, and the DeepMind Control Suite, but it does not provide specific version numbers for any of the key software libraries or dependencies used (e.g., Python, PyTorch, TensorFlow). See the version-recording sketch below the table. |
| Experiment Setup | Yes | D.4 Hyperparameters: Table 1 lists the hyperparameters used for FB pretraining; Table 2 those for the IL baselines; Table 3 those for DIAYN and GOAL-TD3; Table 4 those for GOAL-GPT and MASKDP. See the config-skeleton sketch below the table. |
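
The Research Type row shows the experiments run on the DeepMind Control Suite. As a reference point for re-running them, here is a minimal sketch of loading one suite task through the public `dm_control` API; the `walker`/`walk` pair and the random placeholder policy are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch: load a DeepMind Control Suite task with dm_control.
# The walker/walk pair is illustrative; the paper's exact task list is
# given in its experiment section.
import numpy as np
from dm_control import suite

env = suite.load(domain_name="walker", task_name="walk")
action_spec = env.action_spec()

time_step = env.reset()
while not time_step.last():
    # Random actions stand in for an FB-IL imitation policy.
    action = np.random.uniform(
        action_spec.minimum, action_spec.maximum, size=action_spec.shape
    )
    time_step = env.step(action)
```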
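For the Open Datasets row, the RND datasets come from the ExORL repository of Yarats et al. (2022). The sketch below only inspects episode files after they have been downloaded per that repository's README; the directory layout and the assumption that episodes are stored as `.npz` archives come from the ExORL release, not from this paper.

```python
# Sketch: inspect ExORL-style episode files after downloading them per
# https://github.com/denisyarats/exorl. The path and the .npz layout
# are assumptions based on the ExORL release, not this paper.
from pathlib import Path

import numpy as np

dataset_dir = Path("datasets/walker/rnd/buffer")  # hypothetical path

for episode_file in sorted(dataset_dir.glob("*.npz")):
    episode = np.load(episode_file)
    # Print each stored array's name and shape as a quick sanity check.
    print(episode_file.name, {key: episode[key].shape for key in episode.files})
    break  # the first episode is enough for inspection
```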
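Since the Software Dependencies row notes that no versions are reported, anyone reproducing the experiments has to pin their own. A small sketch for recording installed versions follows; the package list is a guess at what an FB-IL reimplementation would depend on.

```python
# Record the versions of likely key packages in the current environment,
# since the paper does not pin any. The package list is an assumption.
import importlib.metadata as metadata

for package in ("numpy", "torch", "dm_control"):
    try:
        print(f"{package}=={metadata.version(package)}")
    except metadata.PackageNotFoundError:
        print(f"{package}: not installed")
```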
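Finally, for the Experiment Setup row: the actual hyperparameter values live in Tables 1-4 of the paper's Appendix D.4. A transcription skeleton is sketched below; the field names are assumptions about typical FB pretraining knobs, and every value must be filled in from the paper's tables, not from this sketch.

```python
# Skeleton for transcribing FB pretraining hyperparameters. Field names
# are hypothetical; all values are deliberately unset and must be copied
# from Table 1 of the paper's Appendix D.4.
from dataclasses import dataclass
from typing import Optional


@dataclass
class FBPretrainConfig:
    latent_dim: Optional[int] = None       # dimension of the task embedding z
    learning_rate: Optional[float] = None  # optimizer step size
    batch_size: Optional[int] = None       # minibatch size per update
    discount: Optional[float] = None       # RL discount factor
    train_steps: Optional[int] = None      # total gradient updates


config = FBPretrainConfig()  # populate from Table 1 before use
```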