Learning with AMIGo: Adversarially Motivated Intrinsic Goals

Authors: Andres Campero, Roberta Raileanu, Heinrich Küttler, Joshua B. Tenenbaum, Tim Rocktäschel, Edward Grefenstette

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | we show, through 114 experiments on 6 challenging exploration tasks in procedurally generated environments, that agents trained with AMIGo gradually learn to interact with the environment and solve tasks which are too difficult for state-of-the-art methods, and (iii) we perform an extensive qualitative analysis and ablation study.
Researcher Affiliation | Collaboration | Andres Campero, Brain and Cognitive Sciences, MIT, Cambridge, USA, campero@mit.edu; Roberta Raileanu, New York University, New York, USA, raileanu@cs.nyu.edu; Heinrich Küttler, Facebook AI Research, London, UK, hnr@fb.com; Joshua B. Tenenbaum, Brain and Cognitive Sciences, MIT, Cambridge, USA, jbt@mit.edu; Tim Rocktäschel, University College London & Facebook AI Research, London, UK, rockt@fb.com; Edward Grefenstette, University College London & Facebook AI Research, London, UK, egrefen@fb.com
Pseudocode | No | The paper describes the methods and training procedures in natural language and mathematical equations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code for these experiments is included in the supplementary materials, and has also been released under https://anonymous to facilitate reproduction of our method and its use in other projects.
Open Datasets | Yes | Concretely, we use MiniGrid (Chevalier-Boisvert et al., 2018), a suite of fast-to-run procedurally-generated environments with a symbolic/discrete... Maxime Chevalier-Boisvert, Lucas Willems, and Suman Pal. Minimalistic Gridworld Environment for OpenAI Gym. https://github.com/maximecb/gym-minigrid, 2018. (A minimal usage sketch follows the table.)
Dataset Splits | No | The environment is procedurally generated, meaning its layout changes at every episode. The paper does not specify fixed dataset splits (e.g., percentages or counts) for training, validation, and testing, since it operates on procedurally generated environments rather than a fixed dataset.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory specifications) used for running the experiments. It only mentions the software platform TorchBeast.
Software Dependencies | No | The paper mentions using TorchBeast and that it is a PyTorch platform, but does not provide specific version numbers for PyTorch or other software dependencies.
Experiment Setup | Yes | The hyperparameters for the teacher's reward r_T are grid searched, and optimal values are found at α = 0.7 and β = 0.3 (see Appendix B for full hyperparameter search details). The best hyperparameters for AMIGo... are reported below: AMIGo: a student batch size of 8, a teacher batch size of 150, a student learning rate of 0.001, a teacher learning rate of 0.001, an unroll length of 100, a student entropy cost of 0.0005, a teacher entropy cost of 0.01, an observation embedding dimension of 5, a student last layer embedding dimension of 256, and finally, α = 0.7 and β = 0.3. (Hedged sketches of the teacher reward and a consolidated configuration follow the table.)
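
As a companion to the Open Datasets row above, here is a minimal sketch of loading one of the MiniGrid environments. It assumes the 2018 gym-minigrid package from the cited repository and the older Gym API (reset returning only an observation, step returning a 4-tuple); the environment ID is illustrative and the paper's exact task list should be taken from its experiments section.

    import gym
    import gym_minigrid  # noqa: F401 -- importing registers the MiniGrid-* environment IDs

    # Illustrative MiniGrid task from the KeyCorridor family.
    env = gym.make("MiniGrid-KeyCorridorS3R3-v0")

    obs = env.reset()  # each reset yields a newly procedurally generated layout
    obs, reward, done, info = env.step(env.action_space.sample())
    print(obs["image"].shape)  # symbolic partial observation, (7, 7, 3) by default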
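
The teacher reward r_T mentioned in the Experiment Setup row can be sketched as follows. This is an assumption-laden reading of AMIGo's goal-difficulty incentive, under which the teacher earns +α when the student reaches the proposed goal only after at least t* steps, and −β when the goal is reached too quickly or not at all. The threshold name t_star and the function signature are invented for illustration; this is not the authors' code.

    ALPHA, BETA = 0.7, 0.3  # grid-searched optima reported in the paper

    def teacher_reward(steps_to_goal, t_star, alpha=ALPHA, beta=BETA):
        """Hypothetical helper: steps_to_goal is None if the student never reached the goal."""
        if steps_to_goal is not None and steps_to_goal >= t_star:
            return alpha  # goal was reachable but non-trivial: reward the teacher
        return -beta      # goal was too easy or unreachable: penalize the teacher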
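
Finally, the reported best hyperparameters, collected into a single Python mapping for convenience. The values come directly from the Experiment Setup row; the key names are illustrative and are unlikely to match the exact flag names in the authors' TorchBeast training script.

    AMIGO_HPARAMS = {
        "student_batch_size": 8,
        "teacher_batch_size": 150,
        "student_learning_rate": 1e-3,
        "teacher_learning_rate": 1e-3,
        "unroll_length": 100,
        "student_entropy_cost": 5e-4,
        "teacher_entropy_cost": 1e-2,
        "observation_embedding_dim": 5,
        "student_last_layer_embedding_dim": 256,
        "alpha": 0.7,  # teacher reward scale (see sketch above)
        "beta": 0.3,   # teacher penalty scale
    }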