Improving Intrinsic Exploration with Language Abstractions

Authors: Jesse Mu, Victor Zhong, Roberta Raileanu, Minqi Jiang, Noah Goodman, Tim Rocktäschel, Edward Grefenstette

NeurIPS 2022

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Across 13 challenging, procedurally-generated, sparse-reward tasks in the MiniGrid [8] and MiniHack [41] environment suites, we show that language-parameterized exploration methods outperform their non-linguistic counterparts by 47-85%, especially in more abstract tasks with larger state and action spaces." |
| Researcher Affiliation | Collaboration | "1Stanford University, 2University of Washington, 3Meta AI, 4University College London, 5Cohere" |
| Pseudocode | Yes | "Algorithm S1 in Appendix A describes how L-AMIGo trains in an asynchronous actor-critic framework, where the student and teacher are jointly trained from batches of experience collected from separate actor threads, as used in our experiments (see Section 6)." |
| Open Source Code | Yes | "Code included with supplementary material and will be made public upon acceptance, with a link in Appendix C (currently anonymized)" |
| Open Datasets | Yes | "We evaluate on the most challenging tasks in MiniGrid [8]... To add language, we use the complementary BabyAI platform [9]... MiniHack [41] is a suite of procedurally-generated tasks..." |
| Dataset Splits | No | "Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] In Appendices B, C, G" |
| Hardware Specification | Yes | "All experiments were run on a single NVIDIA A100 GPU for 7 days." |
| Software Dependencies | No | "We evaluate L-AMIGo, AMIGo, L-NovelD, and NovelD, implemented in the TorchBeast [27] implementation of IMPALA [17], a common asynchronous actor-critic method." |
| Experiment Setup | Yes | "For full model, training, and hyperparameter details, see Appendix C." |