CompILE: Compositional Imitation Learning and Execution
Authors: Thomas Kipf, Yujia Li, Hanjun Dai, Vinicius Zambaldi, Alvaro Sanchez-Gonzalez, Edward Grefenstette, Pushmeet Kohli, Peter Battaglia
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate CompILE in a challenging 2D multi-task environment and a continuous control task, and show that it can find correct task boundaries and event encodings in an unsupervised manner. The goals of this experimental section are as follows: 1) we would like to investigate whether our model is effective at both learning to find task boundaries and task encodings while being able to reconstruct and imitate unseen behavior, 2) test whether our modular approach to task decomposition allows our model to generalize to longer sequences with more sub-tasks at test time, and 3) investigate whether an agent can learn to control the discovered sub-task policies to quickly learn new tasks in sparse reward settings. |
| Researcher Affiliation | Collaboration | 1 Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands; 2 DeepMind, London, UK; 3 School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA; 4 Facebook AI Research, London, UK. |
| Pseudocode | No | The paper does not include a dedicated pseudocode block or algorithm listing. |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the methodology is openly available. |
| Open Datasets | Yes | We evaluate our model in a fully-observable 2D multi-task grid world, similar to the one introduced in Oh et al. (2017), and a continuous control task... The environment is an adaptation of the single-target reacher task from the DeepMind Control Suite (Tassa et al., 2018). (An illustrative environment-loading sketch is given below the table.) |
| Dataset Splits | No | The paper describes training on demonstration trajectories and evaluation on 'newly generated instances' but does not explicitly mention a distinct validation split or set for hyperparameter tuning. |
| Hardware Specification | No | The paper mentions training on 'a single GPU' but does not provide specific hardware details such as the GPU model, CPU, or memory specifications. |
| Software Dependencies | No | The paper mentions the use of the Adam optimizer and the IMPALA algorithm, but does not specify versions for these or any other software dependencies like deep learning frameworks (e.g., TensorFlow, PyTorch). |
| Experiment Setup | Yes | Training is carried out on a single GPU with a fixed learning rate of 10⁻⁴ using the Adam (Kingma & Ba, 2015) optimizer, with a batch size of 256 and for a total of 50k training iterations (500k for the reacher task). |
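
The Experiment Setup row reports only optimizer-level settings: Adam, a fixed learning rate of 10⁻⁴, batch size 256, and 50k training iterations (500k for the reacher task). Below is a minimal sketch of a training loop with those settings; the framework (PyTorch) and the toy model, dataset, and loss are assumptions for illustration only, since the paper does not specify a deep learning framework or release code.

```python
# Sketch of the reported training settings (Adam, lr 1e-4, batch size 256,
# 50k iterations). The PyTorch framework and the placeholder model/data/loss
# below are assumptions; the paper specifies only the hyperparameters.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder stand-ins for the CompILE model and demonstration trajectories.
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 16))
demos = TensorDataset(torch.randn(10_000, 16))  # hypothetical demonstration data

loader = DataLoader(demos, batch_size=256, shuffle=True)      # batch size 256
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)     # fixed learning rate 1e-4

num_iterations = 50_000  # 500_000 for the reacher task, per the paper
it = 0
while it < num_iterations:
    for (x,) in loader:
        loss = ((model(x) - x) ** 2).mean()  # placeholder reconstruction loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        it += 1
        if it >= num_iterations:
            break
```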
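
Regarding the Open Datasets row: the continuous control experiments adapt the single-target reacher task from the DeepMind Control Suite. The sketch below only shows how the standard reacher environment is loaded and stepped through dm_control with a random policy; the paper's single-target adaptation and the 2D grid world are not publicly released, so nothing here reproduces them.

```python
# Illustrative only: load the standard reacher task from the DeepMind Control
# Suite (dm_control). The paper uses a single-target adaptation of this task,
# which is not part of the released suite.
import numpy as np
from dm_control import suite

env = suite.load(domain_name="reacher", task_name="easy")
time_step = env.reset()

# Step the environment with a uniformly random policy to show the API.
spec = env.action_spec()
for _ in range(10):
    action = np.random.uniform(spec.minimum, spec.maximum, size=spec.shape)
    time_step = env.step(action)
    print(time_step.reward, sorted(time_step.observation.keys()))
```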