Improving Intrinsic Exploration with Language Abstractions
Authors: Jesse Mu, Victor Zhong, Roberta Raileanu, Minqi Jiang, Noah Goodman, Tim Rocktäschel, Edward Grefenstette
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Across 13 challenging, procedurally-generated, sparse-reward tasks in the MiniGrid [8] and MiniHack [41] environment suites, we show that language-parameterized exploration methods outperform their non-linguistic counterparts by 47–85%, especially in more abstract tasks with larger state and action spaces. |
| Researcher Affiliation | Collaboration | 1Stanford University, 2University of Washington, 3Meta AI, 4University College London, 5Cohere |
| Pseudocode | Yes | Algorithm S1 in Appendix A describes how L-AMIGo trains in an asynchronous actor-critic framework, where the student and teacher are jointly trained from batches of experience collected from separate actor threads, as used in our experiments (see Section 6). (A toy sketch of this teacher-student scheme appears below the table.) |
| Open Source Code | Yes | Code is included with the supplementary material and will be made public upon acceptance, with a link in Appendix C (currently anonymized). |
| Open Datasets | Yes | We evaluate on the most challenging tasks in MiniGrid [8]...To add language, we use the complementary BabyAI platform [9]...MiniHack [41] is a suite of procedurally-generated tasks... |
| Dataset Splits | No | Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] In Appendices B, C, G |
| Hardware Specification | Yes | All experiments were run on a single NVIDIA A100 GPU for 7 days. |
| Software Dependencies | No | We evaluate L-AMIGo, AMIGo, L-NovelD, and NovelD, implemented in the TorchBeast [27] implementation of IMPALA [17], a common asynchronous actor-critic method. |
| Experiment Setup | Yes | For full model, training, and hyperparameter details, see Appendix C. |
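As a companion to the "Pseudocode" row, here is a minimal, self-contained Python sketch of the AMIGo-style teacher-student reward scheme that L-AMIGo adapts to language goals: the teacher proposes a goal, the student earns intrinsic reward for reaching it, and the teacher is rewarded only when the goal takes the student at least t* steps. The toy rollout, the `describe_state` helper, and the constants `T_STAR`, `ALPHA`, and `BETA` are illustrative assumptions for this sketch, not code or values from the paper.

```python
import random

# Hypothetical constants (assumed for illustration; not from the paper).
T_STAR = 5               # difficulty threshold t*: goals reached faster yield no teacher credit
ALPHA, BETA = 1.0, 0.3   # teacher reward / penalty magnitudes

# In L-AMIGo, goals are natural-language descriptions rather than (x, y) coordinates.
CANDIDATE_GOALS = [
    "open the red door",
    "pick up the key",
    "go to the green ball",
]

def describe_state(state):
    """Hypothetical annotator: returns the language descriptions achieved in `state`."""
    return state["achieved"]

def run_episode(steps_to_goal, goal, max_steps=20):
    """Toy rollout: the 'student' reaches `goal` after `steps_to_goal` steps."""
    for t in range(1, max_steps + 1):
        achieved = {goal} if t >= steps_to_goal else set()
        state = {"achieved": achieved}
        if goal in describe_state(state):
            return t   # the student collects intrinsic reward +1 at this step
    return None        # goal never reached within the episode

random.seed(0)
teacher_return = 0.0
for episode in range(10):
    goal = random.choice(CANDIDATE_GOALS)            # teacher proposes a language goal
    steps = run_episode(random.randint(1, 12), goal)
    if steps is not None and steps >= T_STAR:
        teacher_return += ALPHA   # hard-but-achievable goal: reward the teacher
    else:
        teacher_return -= BETA    # too easy or unreachable: penalize the teacher
print(f"toy teacher return over 10 episodes: {teacher_return:+.1f}")
```

In the actual system, goals are drawn from environment-provided language (BabyAI descriptions, MiniHack messages) by a learned teacher network, and both student and teacher are trained asynchronously via the TorchBeast implementation of IMPALA; the sketch above mirrors only the reward bookkeeping.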