Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Exploring through Random Curiosity with General Value Functions
Authors: Aditya Ramesh, Louis Kirsch, Sjoerd van Steenkiste, Jürgen Schmidhuber
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that this improves exploration in a hard-exploration diabolical lock problem. Furthermore, RC-GVF significantly outperforms previous methods in the absence of ground-truth episodic counts in the partially observable MiniGrid environments. |
| Researcher Affiliation | Collaboration | (1) The Swiss AI Lab (IDSIA), University of Lugano (USI) & SUPSI; (2) Google Research; (3) AI Initiative, King Abdullah University of Science and Technology (KAUST); (4) NNAISENSE |
| Pseudocode | No | The paper describes its method in text and mathematical equations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/Aditya-Ramesh-10/exploring-through-rcgvf. |
| Open Datasets | Yes | We evaluate RC-GVF on procedurally generated environments from MiniGrid [13], which is a standard benchmark in the deep reinforcement learning literature for exploration [46, 11, 18, 77, 76, 42]. Citation [13]: M. Chevalier-Boisvert, L. Willems, and S. Pal. Minimalistic gridworld environment for openai gym. https://github.com/maximecb/gym-minigrid, 2018. |
| Dataset Splits | No | The paper states that implementation details and hyperparameters are in Appendices B and C, but the provided text does not explicitly detail train/validation/test dataset splits with percentages or counts. |
| Hardware Specification | No | The paper mentions support from 'The Swiss National Supercomputing Centre (CSCS projects s1090 and s1127)' and refers to Appendix C for details, but does not provide specific hardware details (like GPU/CPU models or memory amounts) in the main text. |
| Software Dependencies | No | The paper mentions software components like 'Proximal Policy Optimization (PPO)' and 'Adam' but does not provide specific version numbers for these or other ancillary software dependencies. |
| Experiment Setup | Yes | For RC-GVF we set γz = 0.6 and use two prediction heads in the ensemble... and use 128 pseudo-rewards... Other important hyper-parameters, such as the intrinsic reward coefficient (β), entropy coefficient, and learning rate of the predictor are obtained via an extensive hyper-parameter search for all baselines (see Appendices C.2 and C.3 for details, including on our implementation of baselines). |
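
The values quoted in the Experiment Setup row can be gathered into a small configuration record, which makes it easy to see what the excerpt fixes and what it leaves to the paper's appendices. The following is a minimal sketch, not the authors' code: the class name `RCGVFSetup`, the field names, and the helper `unresolved_fields` are hypothetical, and the hyper-parameters that the paper obtains by search are deliberately left unset because their values are not given here.

```python
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass(frozen=True)
class RCGVFSetup:
    """Hypothetical container for the settings quoted in the table above."""

    # Values stated in the Experiment Setup row.
    gamma_z: float = 0.6            # "γz = 0.6"
    num_prediction_heads: int = 2   # "two prediction heads in the ensemble"
    num_pseudo_rewards: int = 128   # "128 pseudo-rewards"

    # Obtained by hyper-parameter search in the paper (Appendices C.2 and C.3).
    # The values are not quoted in this summary, so they stay unset.
    intrinsic_reward_coef: Optional[float] = None    # β
    entropy_coef: Optional[float] = None
    predictor_learning_rate: Optional[float] = None


def unresolved_fields(cfg: RCGVFSetup) -> List[str]:
    """Return the names of settings that still have to be looked up."""
    return [name for name, value in asdict(cfg).items() if value is None]


if __name__ == "__main__":
    setup = RCGVFSetup()
    print("Still to be taken from the appendices:", unresolved_fields(setup))
```

Keeping the searched hyper-parameters as `None` rather than guessing defaults makes explicit that a replication attempt would still need the appendices or the released repository.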