Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning
Authors: Luisa M Zintgraf, Leo Feng, Cong Lu, Maximilian Igl, Kristian Hartikainen, Katja Hofmann, Shimon Whiteson
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present four experiments that illustrate how and why HyperX helps agents meta-learn good online adaptation strategies (Sec 5.1-5.3), and results on sparse MuJoCo AntGoal to demonstrate that HyperX scales well (Sec 5.4). |
| Researcher Affiliation | Collaboration | 1. University of Oxford, UK; 2. Mila, Université de Montréal, Canada; 3. Microsoft Research, Cambridge, UK. |
| Pseudocode | Yes | Algorithm 1: HyperX Pseudo-Code (a hedged sketch of the hyper-state novelty bonus appears after this table) |
| Open Source Code | Yes | Our source code can be found at https://github.com/lmzintgraf/hyperx. |
| Open Datasets | No | The paper describes the use of standard RL environments and custom-designed tasks (e.g., Treasure Mountain, Multi-Stage Gridworld, sparse HalfCheetahDir, sparse MuJoCo AntGoal) but does not provide specific links, DOIs, or citations with author/year for public access to the *datasets* or *sampled task distributions* used in their experiments. It describes how these environments are configured or modified rather than providing access to data. |
| Dataset Splits | No | The paper operates in a reinforcement learning setting, describing meta-training on a task distribution and evaluation on new tasks from that distribution. It does not provide explicit train/validation/test dataset splits (e.g., percentages or sample counts) as commonly seen in supervised learning contexts. |
| Hardware Specification | No | The acknowledgements mention 'computing resources provided by Compute Canada' and 'a generous equipment grant from NVIDIA', but no specific hardware details such as exact GPU/CPU models, processor types, or memory amounts are provided. |
| Software Dependencies | No | The paper provides a link to its source code but does not explicitly list software dependencies with version numbers (e.g., 'Python 3.8, PyTorch 1.9') in the main text. |
| Experiment Setup | No | The paper states 'Implementation details are given in Appendix C.' and refers to a table of hyperparameters in the appendix. However, these specific experimental setup details (e.g., hyperparameter values, training configurations) are not provided in the main body of the paper. |
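
The table above notes that the paper ships pseudocode (Algorithm 1) and open-source code. As a rough, non-authoritative illustration of the paper's central idea, a novelty bonus computed over hyper-states (the environment state concatenated with the agent's approximate task belief), below is a minimal random-network-distillation sketch in PyTorch. HyperX pairs this hyper-state bonus with a second bonus based on VAE reconstruction error; only the first is sketched here, and all class names, network sizes, and learning rates are hypothetical, not taken from the authors' implementation.

```python
import torch
import torch.nn as nn

class RNDHyperStateBonus(nn.Module):
    """Random-network-distillation novelty bonus over approximate hyper-states.

    A hyper-state concatenates the environment state with the agent's
    approximate belief over the task (e.g., the latent posterior of a
    VariBAD-style VAE). Hyper-states that the predictor has not yet
    learned to imitate receive a large bonus, encouraging exploration
    in hyper-state space. Sizes and the learning rate are illustrative.
    """

    def __init__(self, state_dim: int, belief_dim: int, feat_dim: int = 64):
        super().__init__()
        in_dim = state_dim + belief_dim

        def mlp() -> nn.Sequential:
            return nn.Sequential(
                nn.Linear(in_dim, 128), nn.ReLU(),
                nn.Linear(128, feat_dim),
            )

        self.target = mlp()      # fixed, randomly initialised network
        self.predictor = mlp()   # trained to imitate the target
        for p in self.target.parameters():
            p.requires_grad_(False)
        self.opt = torch.optim.Adam(self.predictor.parameters(), lr=1e-4)

    def bonus(self, state: torch.Tensor, belief: torch.Tensor) -> torch.Tensor:
        """Predictor's error against the fixed target = novelty bonus."""
        hyper_state = torch.cat([state, belief], dim=-1)
        with torch.no_grad():
            target_feat = self.target(hyper_state)
        pred_feat = self.predictor(hyper_state)
        return (pred_feat - target_feat).pow(2).mean(dim=-1)

    def update(self, state: torch.Tensor, belief: torch.Tensor) -> float:
        """Train the predictor on visited hyper-states so the bonus decays."""
        loss = self.bonus(state, belief).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        return loss.item()


# Usage sketch: add the (scaled) bonus to the environment reward.
if __name__ == "__main__":
    rnd = RNDHyperStateBonus(state_dim=4, belief_dim=8)
    s, b = torch.randn(32, 4), torch.randn(32, 8)
    r_bonus = rnd.bonus(s, b).detach()  # shape (32,)
    rnd.update(s, b)
    print(r_bonus.mean().item())
```

In a meta-training loop, the scaled bonus would be added to the environment reward and naturally anneals as the predictor catches up to the fixed target network on frequently visited hyper-states.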