Exploring through Random Curiosity with General Value Functions
Authors: Aditya Ramesh, Louis Kirsch, Sjoerd van Steenkiste, Jürgen Schmidhuber
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that this improves exploration in a hard-exploration diabolical lock problem. Furthermore, RC-GVF significantly outperforms previous methods in the absence of ground-truth episodic counts in the partially observable MiniGrid environments. |
| Researcher Affiliation | Collaboration | (1) The Swiss AI Lab (IDSIA), University of Lugano (USI) & SUPSI; (2) Google Research; (3) AI Initiative, King Abdullah University of Science and Technology (KAUST); (4) NNAISENSE |
| Pseudocode | No | The paper describes its method in text and mathematical equations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/Aditya-Ramesh-10/exploring-through-rcgvf. |
| Open Datasets | Yes | We evaluate RC-GVF on procedurally generated environments from MiniGrid [13], which is a standard benchmark in the deep reinforcement learning literature for exploration [46, 11, 18, 77, 76, 42]. Reference [13]: M. Chevalier-Boisvert, L. Willems, and S. Pal. Minimalistic gridworld environment for OpenAI Gym. https://github.com/maximecb/gym-minigrid, 2018. |
| Dataset Splits | No | The paper states that implementation details and hyperparameters are in Appendices B and C, but the provided text does not explicitly detail train/validation/test dataset splits with percentages or counts. |
| Hardware Specification | No | The paper mentions support from 'The Swiss National Supercomputing Centre (CSCS projects s1090 and s1127)' and refers to Appendix C for details, but does not provide specific hardware details (like GPU/CPU models or memory amounts) in the main text. |
| Software Dependencies | No | The paper mentions software components like 'Proximal Policy Optimization (PPO)' and 'Adam' but does not provide specific version numbers for these or other ancillary software dependencies. |
| Experiment Setup | Yes | For RC-GVF we set γz = 0.6 and use two prediction heads in the ensemble... and use 128 pseudo-rewards... Other important hyper-parameters, such as the intrinsic reward coefficient (β), entropy coefficient, and learning rate of the predictor are obtained via an extensive hyper-parameter search for all baselines (see Appendices C.2 and C.3 for details, including on our implementation of baselines). |
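
The Experiment Setup row quotes three concrete values: γz = 0.6, two prediction heads, and 128 pseudo-rewards. As a reading aid, the sketch below collects them in a small settings object and shows how a discount of 0.6 weights a sequence of pseudo-rewards. It is a minimal illustration and not the authors' implementation: the class name, field names, and the `discounted_returns` helper are hypothetical, and the plain Monte Carlo recursion G_t = z_t + γ·G_{t+1} is an assumption made for illustration. The paper's actual target construction is described in its appendices and is not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RCGVFSettings:
    """Values quoted in the Experiment Setup row; all names here are hypothetical."""
    gamma_z: float = 0.6            # discount applied to pseudo-rewards
    num_heads: int = 2              # prediction heads in the ensemble
    num_pseudo_rewards: int = 128   # pseudo-rewards per step


def discounted_returns(pseudo_rewards: List[List[float]], gamma: float) -> List[List[float]]:
    """Return G_t = z_t + gamma * G_{t+1} for each pseudo-reward channel.

    pseudo_rewards[t][j] is the j-th pseudo-reward at step t. This is an
    assumed, simplified target and not necessarily the one used in the paper.
    """
    if not pseudo_rewards:
        return []
    running = [0.0] * len(pseudo_rewards[0])
    out = []
    # Walk the trajectory backwards so each step reuses the next step's return.
    for step in reversed(pseudo_rewards):
        running = [z + gamma * g for z, g in zip(step, running)]
        out.append(running)
    out.reverse()
    return out


settings = RCGVFSettings()
# Toy trajectory: three steps, one pseudo-reward channel, constant value 1.0.
toy_returns = discounted_returns([[1.0], [1.0], [1.0]], settings.gamma_z)
```

With a discount of 0.6, a pseudo-reward k steps ahead is weighted by 0.6^k, so predictions under this assumed recursion are dominated by the next few steps.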