Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Improving Intrinsic Exploration with Language Abstractions
Authors: Jesse Mu, Victor Zhong, Roberta Raileanu, Minqi Jiang, Noah Goodman, Tim Rocktäschel, Edward Grefenstette
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Across 13 challenging, procedurally-generated, sparse-reward tasks in the MiniGrid [8] and MiniHack [41] environment suites, we show that language-parameterized exploration methods outperform their non-linguistic counterparts by 47–85%, especially in more abstract tasks with larger state and action spaces. |
| Researcher Affiliation | Collaboration | 1Stanford University, 2University of Washington, 3Meta AI, 4University College London, 5Cohere |
| Pseudocode | Yes | Algorithm S1 in Appendix A describes how L-AMIGo trains in an asynchronous actor-critic framework, where the student and teacher are jointly trained from batches of experience collected from separate actor threads, as used in our experiments (see Section 6). |
| Open Source Code | Yes | Code included with supplementary material and will be made public upon acceptance, with a link in Appendix C (currently anonymized) |
| Open Datasets | Yes | We evaluate on the most challenging tasks in MiniGrid [8]...To add language, we use the complementary BabyAI platform [9]...MiniHack [41] is a suite of procedurally-generated tasks... |
| Dataset Splits | No | Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] In Appendices B, C, G |
| Hardware Specification | Yes | All experiments were run on a single NVIDIA A100 GPU for 7 days. |
| Software Dependencies | No | We evaluate L-AMIGo, AMIGo, L-NovelD, and NovelD, implemented in the TorchBeast [27] implementation of IMPALA [17], a common asynchronous actor-critic method. |
| Experiment Setup | Yes | for full model, training, and hyperparameter details, see Appendix C. |