Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning
Authors: Luisa M Zintgraf, Leo Feng, Cong Lu, Maximilian Igl, Kristian Hartikainen, Katja Hofmann, Shimon Whiteson
ICML 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present four experiments that illustrate how and why HyperX helps agents meta-learn good online adaptation strategies (Sec 5.1-5.3), and results on sparse MuJoCo AntGoal to demonstrate that HyperX scales well (Sec 5.4). |
| Researcher Affiliation | Collaboration | 1University of Oxford, UK. 2Mila, Université de Montréal, Canada. 3Microsoft Research, Cambridge, UK. |
| Pseudocode | Yes | Algorithm 1 HyperX Pseudo-Code |
| Open Source Code | Yes | Our source code can be found at https://github.com/lmzintgraf/hyperx. |
| Open Datasets | No | The paper describes the use of standard RL environments and custom-designed tasks (e.g., Treasure Mountain, Multi-Stage Gridworld, sparse HalfCheetahDir, sparse MuJoCo AntGoal) but does not provide specific links, DOIs, or citations with author/year for public access to the *datasets* or *sampled task distributions* used in their experiments. It describes how these environments are configured or modified rather than providing access to data. |
| Dataset Splits | No | The paper operates in a reinforcement learning setting, describing meta-training on a task distribution and evaluation on new tasks from that distribution. It does not provide explicit train/validation/test dataset splits (e.g., percentages or sample counts) as commonly seen in supervised learning contexts. |
| Hardware Specification | No | The acknowledgements mention 'computing resources provided by Compute Canada' and 'a generous equipment grant from NVIDIA', but no specific hardware details such as exact GPU/CPU models, processor types, or memory amounts are provided. |
| Software Dependencies | No | The paper provides a link to its source code but does not explicitly list software dependencies with version numbers (e.g., 'Python 3.8, PyTorch 1.9') in the main text. |
| Experiment Setup | No | The paper states 'Implementation details are given in Appendix C.' and refers to a table of hyperparameters in the appendix. However, these specific experimental setup details (e.g., hyperparameter values, training configurations) are not provided in the main body of the paper. |