CompILE: Compositional Imitation Learning and Execution
Authors: Thomas Kipf, Yujia Li, Hanjun Dai, Vinicius Zambaldi, Alvaro Sanchez-Gonzalez, Edward Grefenstette, Pushmeet Kohli, Peter Battaglia
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate CompILE in a challenging 2D multi-task environment and a continuous control task, and show that it can find correct task boundaries and event encodings in an unsupervised manner. The goals of this experimental section are as follows: 1) we would like to investigate whether our model is effective at both learning to find task boundaries and task encodings while being able to reconstruct and imitate unseen behavior, 2) test whether our modular approach to task decomposition allows our model to generalize to longer sequences with more sub-tasks at test time, and 3) investigate whether an agent can learn to control the discovered sub-task policies to quickly learn new tasks in sparse reward settings. |
| Researcher Affiliation | Collaboration | 1 Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands; 2 DeepMind, London, UK; 3 School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA; 4 Facebook AI Research, London, UK. |
| Pseudocode | No | The paper does not include a dedicated pseudocode block or algorithm listing. |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the methodology is openly available. |
| Open Datasets | Yes | We evaluate our model in a fully-observable 2D multi-task grid world, similar to the one introduced in Oh et al. (2017), and a continuous control task... The environment is an adaptation of the single-target reacher task from the DeepMind Control Suite (Tassa et al., 2018). (An illustrative environment-loading sketch is given below the table.) |
| Dataset Splits | No | The paper describes training on demonstration trajectories and evaluation on 'newly generated instances' but does not explicitly mention a distinct validation split or set for hyperparameter tuning. |
| Hardware Specification | No | The paper mentions training on 'a single GPU' but does not provide specific hardware details such as the GPU model, CPU, or memory specifications. |
| Software Dependencies | No | The paper mentions the use of the Adam optimizer and the IMPALA algorithm, but does not specify versions for these or any other software dependencies like deep learning frameworks (e.g., TensorFlow, PyTorch). |
| Experiment Setup | Yes | Training is carried out on a single GPU with a fixed learning rate of 10⁻⁴ using the Adam (Kingma & Ba, 2015) optimizer, with a batch size of 256 and for a total of 50k training iterations (500k for the reacher task). |
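
The Experiment Setup row reports only optimizer-level settings: Adam, a fixed learning rate of 10⁻⁴, batch size 256, and 50k training iterations (500k for the reacher task). Below is a minimal sketch of a training loop with those settings; the framework (PyTorch) and the toy model, dataset, and loss are assumptions for illustration only, since the paper does not specify a deep learning framework or release code.

```python
# Sketch of the reported training settings (Adam, lr 1e-4, batch size 256,
# 50k iterations). The PyTorch framework and the placeholder model/data/loss
# below are assumptions; the paper specifies only the hyperparameters.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder stand-ins for the CompILE model and demonstration trajectories.
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 16))
demos = TensorDataset(torch.randn(10_000, 16))  # hypothetical demonstration data

loader = DataLoader(demos, batch_size=256, shuffle=True)      # batch size 256
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)     # fixed learning rate 1e-4

num_iterations = 50_000  # 500_000 for the reacher task, per the paper
it = 0
while it < num_iterations:
    for (x,) in loader:
        loss = ((model(x) - x) ** 2).mean()  # placeholder reconstruction loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        it += 1
        if it >= num_iterations:
            break
```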
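
Regarding the Open Datasets row: the continuous control experiments adapt the single-target reacher task from the DeepMind Control Suite. The sketch below only shows how the standard reacher environment is loaded and stepped through dm_control with a random policy; the paper's single-target adaptation and the 2D grid world are not publicly released, so nothing here reproduces them.

```python
# Illustrative only: load the standard reacher task from the DeepMind Control
# Suite (dm_control). The paper uses a single-target adaptation of this task,
# which is not part of the released suite.
import numpy as np
from dm_control import suite

env = suite.load(domain_name="reacher", task_name="easy")
time_step = env.reset()

# Step the environment with a uniformly random policy to show the API.
spec = env.action_spec()
for _ in range(10):
    action = np.random.uniform(spec.minimum, spec.maximum, size=spec.shape)
    time_step = env.step(action)
    print(time_step.reward, sorted(time_step.observation.keys()))
```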