Exploring through Random Curiosity with General Value Functions

Authors: Aditya Ramesh, Louis Kirsch, Sjoerd van Steenkiste, Jürgen Schmidhuber

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that this improves exploration in a hard-exploration diabolical lock problem. Furthermore, RC-GVF significantly outperforms previous methods in the absence of ground-truth episodic counts in the partially observable MiniGrid environments.
Researcher Affiliation | Collaboration | 1 The Swiss AI Lab (IDSIA), University of Lugano (USI) & SUPSI; 2 Google Research; 3 AI Initiative, King Abdullah University of Science and Technology (KAUST); 4 NNAISENSE
Pseudocode | No | The paper describes its method in text and mathematical equations but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/Aditya-Ramesh-10/exploring-through-rcgvf.
Open Datasets | Yes | We evaluate RC-GVF on procedurally generated environments from MiniGrid [13], which is a standard benchmark in the deep reinforcement learning literature for exploration [46, 11, 18, 77, 76, 42]. Cited as [13]: M. Chevalier-Boisvert, L. Willems, and S. Pal. Minimalistic gridworld environment for OpenAI Gym. https://github.com/maximecb/gym-minigrid, 2018.
Dataset Splits | No | The paper states that implementation details and hyperparameters are given in Appendices B and C, but the provided text does not specify train/validation/test splits (as percentages or counts).
Hardware Specification | No | The paper acknowledges support from the Swiss National Supercomputing Centre (CSCS projects s1090 and s1127) and refers to Appendix C for details, but the main text gives no specific hardware details (such as GPU/CPU models or memory amounts).
Software Dependencies | No | The paper mentions software components such as Proximal Policy Optimization (PPO) and Adam but does not give version numbers for these or any other ancillary software dependencies.
Experiment Setup | Yes | For RC-GVF we set γz = 0.6 and use two prediction heads in the ensemble... and use 128 pseudo-rewards... Other important hyper-parameters, such as the intrinsic reward coefficient (β), entropy coefficient, and learning rate of the predictor are obtained via an extensive hyper-parameter search for all baselines (see Appendices C.2 and C.3 for details, including on our implementation of baselines).
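The setup row lists three RC-GVF-specific settings: a pseudo-reward discount γz = 0.6, a two-head predictor ensemble, and 128 pseudo-rewards. To show where each one enters, here is a minimal Python sketch under several loudly stated assumptions: a fixed random linear projection stands in for the random pseudo-reward network, each general value function is trained toward a one-step TD target, and the intrinsic reward is taken to be the squared error of the ensemble mean plus the ensemble variance. These choices and every helper name (`make_random_projection`, `matvec`, `td_targets`, `intrinsic_reward`, and so on) are hypothetical illustrations, not the paper's definitions or the code in the authors' repository, whose target and reward formulas may differ.

```python
import random

GAMMA_Z = 0.6     # pseudo-reward discount reported in the setup row
NUM_PSEUDO = 128  # number of random pseudo-rewards reported in the setup row
NUM_HEADS = 2     # predictor ensemble size reported in the setup row
OBS_DIM = 8       # hypothetical observation feature size for this sketch


def make_random_projection(in_dim, out_dim, seed):
    """Return a fixed out_dim x in_dim Gaussian matrix.

    Assumption: stands in for a randomly initialized, frozen network.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def matvec(weights, x):
    """Matrix-vector product: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def ensemble_predict(heads, obs):
    """Per-head GVF predictions, one value per pseudo-reward."""
    return [matvec(head, obs) for head in heads]


def td_targets(z_next, v_next, gamma=GAMMA_Z):
    """One-step TD target per pseudo-reward: z_{t+1} + gamma * v(o_{t+1}).

    Assumption: the paper's target construction may differ (e.g. multi-step).
    """
    return [z + gamma * v for z, v in zip(z_next, v_next)]


def intrinsic_reward(head_preds, targets):
    """Hypothetical reward: squared error of the ensemble mean plus the
    ensemble variance, averaged over pseudo-rewards."""
    n_heads = len(head_preds)
    total = 0.0
    for k, target in enumerate(targets):
        preds = [head[k] for head in head_preds]
        mean = sum(preds) / n_heads
        var = sum((p - mean) ** 2 for p in preds) / n_heads
        total += (mean - target) ** 2 + var
    return total / len(targets)


# Usage on one transition (o_t, o_{t+1}) with random observation features.
rng = random.Random(0)
obs_t = [rng.random() for _ in range(OBS_DIM)]
obs_next = [rng.random() for _ in range(OBS_DIM)]

pseudo_reward_net = make_random_projection(OBS_DIM, NUM_PSEUDO, seed=42)
# Untrained linear predictor heads; the predictor's training loop is omitted.
heads = [make_random_projection(OBS_DIM, NUM_PSEUDO, seed=s) for s in range(NUM_HEADS)]

z_next = matvec(pseudo_reward_net, obs_next)
preds_next = ensemble_predict(heads, obs_next)
v_next = [sum(col) / NUM_HEADS for col in zip(*preds_next)]  # ensemble-mean bootstrap
targets = td_targets(z_next, v_next)

r_int = intrinsic_reward(ensemble_predict(heads, obs_t), targets)
print(f"intrinsic reward for this transition: {r_int:.3f}")
```

The row above also notes that the intrinsic reward coefficient (β) is tuned by hyper-parameter search; this sketch leaves that scaling out and returns the unscaled value.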