Meta-learning curiosity algorithms
Authors: Ferran Alet*, Martin F. Schneider*, Tomás Lozano-Pérez, Leslie Pack Kaelbling
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of the approach empirically, finding two novel curiosity algorithms that perform on par or better than human-designed published curiosity algorithms in domains as disparate as grid navigation with image inputs, acrobot, lunar lander, ant and hopper. |
| Researcher Affiliation | Academia | Ferran Alet, Martin F. Schneider, Tomás Lozano-Pérez & Leslie Pack Kaelbling, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA {alet,martinfs,tlp,lpk}@mit.edu |
| Pseudocode | No | The paper uses diagrams of computational graphs (e.g., Figure 2, 3, 5, 6, 10, 11) to represent algorithms and describes a domain-specific language, but it does not provide traditional pseudocode blocks or algorithm listings. |
| Open Source Code | Yes | Our code (https://github.com/mfranzs/meta-learning-curiosity-algorithms) can take in any OpenAI Gym environment (Brockman et al., 2016) with a specification of the desired exploration horizon T. |
| Open Datasets | Yes | Our code (https://github.com/mfranzs/meta-learning-curiosity-algorithms) can take in any OpenAI Gym environment (Brockman et al., 2016) with a specification of the desired exploration horizon T. and We start by searching for a good intrinsic curiosity program I in a purely exploratory environment, designed by Chevalier-Boisvert et al. (2018), which is an image-based grid world... and Finally, we evaluate on two MuJoCo environments (Todorov et al., 2012): hopper and ant. |
| Dataset Splits | No | The paper describes training on various RL environments and a pruning strategy for algorithms, but it does not specify explicit training/validation/test dataset splits with percentages or sample counts for any specific dataset. |
| Hardware Specification | Yes | We then randomly split the programs across 4 machines, each with 8 Nvidia Tesla K80 GPUs for 10 hours; thus a total of 13 GPU days. |
| Software Dependencies | No | Our RL agent uses PPO (Schulman et al., 2017) based on the implementation by Kostrikov (2018) in PyTorch (Paszke et al., 2017). The paper mentions PyTorch but does not provide a specific version number, nor does it list other key software dependencies with their versions. |
| Experiment Setup | Yes | As is common in PPO, we run multiple rollouts (5, except for MuJoCo which only has 1), with independent experiences but shared policy and curiosity modules. and To bias towards simple, interpretable algorithms and keep the search space manageable, we search for programs with at most 7 operations. and We use a 10-nearest-neighbor regressor, with k = 10. |
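
For context on the quoted experiment setup, the sketch below shows a minimal 10-nearest-neighbor regressor of the kind mentioned in that row (k = 10). This is an illustrative example, not the authors' implementation from the linked repository; the class name, NumPy usage, and the toy data are assumptions made here for illustration.

```python
import numpy as np

class KNNRegressor:
    """Minimal k-nearest-neighbor regressor (k = 10 by default).
    Illustrative sketch only; not the paper's actual curiosity module."""

    def __init__(self, k=10):
        self.k = k
        self.xs = []  # stored feature vectors
        self.ys = []  # stored target values

    def add(self, x, y):
        # Store one (feature, target) pair.
        self.xs.append(np.asarray(x, dtype=float))
        self.ys.append(float(y))

    def predict(self, x):
        # Average the targets of the k closest stored points.
        if not self.xs:
            return 0.0
        dists = np.linalg.norm(np.stack(self.xs) - np.asarray(x, dtype=float), axis=1)
        k = min(self.k, len(dists))
        nearest = np.argsort(dists)[:k]
        return float(np.mean(np.asarray(self.ys)[nearest]))

# Usage example (illustrative): fit on random data, then query a point.
reg = KNNRegressor(k=10)
rng = np.random.default_rng(0)
for _ in range(100):
    x = rng.normal(size=4)
    reg.add(x, x.sum())
print(reg.predict(np.zeros(4)))
```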