Learning About Progress From Experts
Authors: Jake Bruce, Ankit Anand, Bogdan Mazoure, Rob Fergus
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we demonstrate that by learning a model of long-term progress from expert data containing only observations, we can achieve efficient exploration in challenging sparse tasks, well beyond what is possible with current state-of-the-art approaches. We evaluate our approach on three standard tasks: Score, Scout, and Oracle (Küttler et al., 2020), as well as four new sparse tasks that represent important subtasks in the full NetHack game. |
| Researcher Affiliation | Collaboration | Jake Bruce (DeepMind), Ankit Anand (DeepMind), Bogdan Mazoure (McGill University), Rob Fergus (DeepMind) |
| Pseudocode | Yes | Algorithm 1 provides a pseudocode description of the algorithm. |
| Open Source Code | No | We have made the curated gameplay dataset used in this work available at https://github.com/deepmind/nao_top10. This link points to the dataset, not the source code for the proposed method. The reproducibility statement mentions detailing hyperparameters and implementation details, but not releasing the code. |
| Open Datasets | Yes | We have made the curated gameplay dataset used in this work available at https://github.com/deepmind/nao_top10. |
| Dataset Splits | No | The paper does not explicitly specify exact train/validation/test dataset splits for their main experiments. |
| Hardware Specification | Yes | Each experiment was run on 8 TPUv3 accelerators using a podracer configuration (Hessel et al., 2021b). |
| Software Dependencies | No | The paper mentions using the 'Jax ecosystem' but does not provide specific version numbers for Jax or any other software libraries used in the implementation. |
| Experiment Setup | Yes | Hyperparameters for all experiments are shown in Table 2. Where hyperparameters differ between approaches, the differences are shown in Table 3. |