Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Learning About Progress From Experts
Authors: Jake Bruce, Ankit Anand, Bogdan Mazoure, Rob Fergus
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we demonstrate that by learning a model of long-term progress from expert data containing only observations, we can achieve efficient exploration in challenging sparse tasks, well beyond what is possible with current state-of-the-art approaches. We evaluate our approach on three standard tasks: Score, Scout, and Oracle (Küttler et al., 2020), as well as four new sparse tasks that represent important subtasks in the full NetHack game. |
| Researcher Affiliation | Collaboration | Jake Bruce (DeepMind), Ankit Anand (DeepMind), Bogdan Mazoure (McGill University), Rob Fergus (DeepMind) |
| Pseudocode | Yes | Algorithm 1 provides a pseudocode description of the algorithm. |
| Open Source Code | No | We have made the curated gameplay dataset used in this work available at https://github.com/deepmind/nao_top10. This link points to the dataset, not to the source code for the proposed method. The reproducibility statement says that hyperparameters and implementation details are documented, but it does not mention releasing the code. |
| Open Datasets | Yes | We have made the curated gameplay dataset used in this work available at https://github.com/deepmind/nao_top10. |
| Dataset Splits | No | The paper does not explicitly specify exact train/validation/test dataset splits for their main experiments. |
| Hardware Specification | Yes | Each experiment was run on 8 TPUv3 accelerators using a podracer configuration (Hessel et al., 2021b). |
| Software Dependencies | No | The paper mentions using the 'Jax ecosystem' but does not provide specific version numbers for JAX or any other software library used in the implementation. |
| Experiment Setup | Yes | Hyperparameters for all experiments are shown in Table 2. Where hyperparameters differ between approaches, the differences are shown in Table 3. |