CompILE: Compositional Imitation Learning and Execution

Authors: Thomas Kipf, Yujia Li, Hanjun Dai, Vinicius Zambaldi, Alvaro Sanchez-Gonzalez, Edward Grefenstette, Pushmeet Kohli, Peter Battaglia

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate CompILE in a challenging 2D multi-task environment and a continuous control task, and show that it can find correct task boundaries and event encodings in an unsupervised manner. The goals of this experimental section are as follows: 1) we would like to investigate whether our model is effective at both learning to find task boundaries and task encodings while being able to reconstruct and imitate unseen behavior, 2) test whether our modular approach to task decomposition allows our model to generalize to longer sequences with more sub-tasks at test time, and 3) investigate whether an agent can learn to control the discovered sub-task policies to quickly learn new tasks in sparse reward settings.
Researcher Affiliation | Collaboration | 1 Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands; 2 DeepMind, London, UK; 3 School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA; 4 Facebook AI Research, London, UK.
Pseudocode | No | The paper does not include a dedicated pseudocode block or algorithm listing.
Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the methodology is openly available.
Open Datasets | Yes | We evaluate our model in a fully-observable 2D multi-task grid world, similar to the one introduced in Oh et al. (2017), and a continuous control task... The environment is an adaptation of the single-target reacher task from the DeepMind Control Suite (Tassa et al., 2018).
Dataset Splits | No | The paper describes training on demonstration trajectories and evaluation on 'newly generated instances' but does not explicitly mention a distinct validation split or set for hyperparameter tuning.
Hardware Specification | No | The paper mentions training on 'a single GPU' but does not provide specific hardware details such as the GPU model, CPU, or memory specifications.
Software Dependencies | No | The paper mentions the use of the Adam optimizer and the IMPALA algorithm, but does not specify versions for these or any other software dependencies such as deep learning frameworks (e.g., TensorFlow or PyTorch).
Experiment Setup | Yes | Training is carried out on a single GPU with a fixed learning rate of 10^-4 using the Adam (Kingma & Ba, 2015) optimizer, with a batch size of 256 and for a total of 50k training iterations (500k for the reacher task).
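The reported optimizer settings map onto a standard training-loop configuration. The sketch below is a minimal illustration only, assuming a PyTorch-style setup; CompILEModel, the synthetic batch, and the stand-in loss are hypothetical placeholders, since the paper does not release code or specify a framework.

```python
import torch

# Hypothetical stand-in for the CompILE model; the actual architecture is not released.
class CompILEModel(torch.nn.Module):
    def __init__(self, dim: int = 8):
        super().__init__()
        self.net = torch.nn.Linear(dim, dim)

    def forward(self, x):
        return self.net(x)

model = CompILEModel()

# Settings reported in the paper: Adam with a fixed learning rate of 1e-4,
# batch size 256, 50k training iterations (500k for the reacher task).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
batch_size = 256
num_iterations = 50_000  # 500_000 for the continuous-control (reacher) task

for step in range(num_iterations):
    batch = torch.randn(batch_size, 8)  # placeholder for a batch of demonstration trajectories
    loss = model(batch).pow(2).mean()   # placeholder objective; the paper optimizes the CompILE ELBO
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```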