Meta-learning curiosity algorithms
Authors: Ferran Alet*, Martin F. Schneider*, Tomás Lozano-Pérez, Leslie Pack Kaelbling
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of the approach empirically, finding two novel curiosity algorithms that perform on par or better than human-designed published curiosity algorithms in domains as disparate as grid navigation with image inputs, acrobot, lunar lander, ant and hopper. |
| Researcher Affiliation | Academia | Ferran Alet, Martin F. Schneider, Tomás Lozano-Pérez & Leslie Pack Kaelbling, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA {alet,martinfs,tlp,lpk}@mit.edu |
| Pseudocode | No | The paper uses diagrams of computational graphs (e.g., Figure 2, 3, 5, 6, 10, 11) to represent algorithms and describes a domain-specific language, but it does not provide traditional pseudocode blocks or algorithm listings. |
| Open Source Code | Yes | Our code (https://github.com/mfranzs/meta-learning-curiosity-algorithms) can take in any OpenAI Gym environment (Brockman et al., 2016) with a specification of the desired exploration horizon T. |
| Open Datasets | Yes | Our code (https://github.com/mfranzs/meta-learning-curiosity-algorithms) can take in any OpenAI Gym environment (Brockman et al., 2016) with a specification of the desired exploration horizon T. and We start by searching for a good intrinsic curiosity program I in a purely exploratory environment, designed by Chevalier-Boisvert et al. (2018), which is an image-based grid world... and Finally, we evaluate on two MuJoCo environments (Todorov et al., 2012): hopper and ant. |
| Dataset Splits | No | The paper describes training on various RL environments and a pruning strategy for algorithms, but it does not specify explicit training/validation/test dataset splits with percentages or sample counts for any specific dataset. |
| Hardware Specification | Yes | We then randomly split the programs across 4 machines, each with 8 Nvidia Tesla K80 GPUs for 10 hours; thus a total of 13 GPU days. |
| Software Dependencies | No | Our RL agent uses PPO (Schulman et al., 2017) based on the implementation by Kostrikov (2018) in PyTorch (Paszke et al., 2017). The paper mentions PyTorch but does not provide a specific version number, nor does it list other key software dependencies with their versions. |
| Experiment Setup | Yes | As is common in PPO, we run multiple rollouts (5, except for MuJoCo which only has 1), with independent experiences but shared policy and curiosity modules. and To bias towards simple, interpretable algorithms and keep the search space manageable, we search for programs with at most 7 operations. and We use a 10-nearest-neighbor regressor, with k = 10. |
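
For context on the quoted experiment setup, the sketch below shows a minimal 10-nearest-neighbor regressor of the kind mentioned in that row (k = 10). This is an illustrative example, not the authors' implementation from the linked repository; the class name, NumPy usage, and the toy data are assumptions made here for illustration.

```python
import numpy as np

class KNNRegressor:
    """Minimal k-nearest-neighbor regressor (k = 10 by default).
    Illustrative sketch only; not the paper's actual curiosity module."""

    def __init__(self, k=10):
        self.k = k
        self.xs = []  # stored feature vectors
        self.ys = []  # stored target values

    def add(self, x, y):
        # Store one (feature, target) pair.
        self.xs.append(np.asarray(x, dtype=float))
        self.ys.append(float(y))

    def predict(self, x):
        # Average the targets of the k closest stored points.
        if not self.xs:
            return 0.0
        dists = np.linalg.norm(np.stack(self.xs) - np.asarray(x, dtype=float), axis=1)
        k = min(self.k, len(dists))
        nearest = np.argsort(dists)[:k]
        return float(np.mean(np.asarray(self.ys)[nearest]))

# Usage example (illustrative): fit on random data, then query a point.
reg = KNNRegressor(k=10)
rng = np.random.default_rng(0)
for _ in range(100):
    x = rng.normal(size=4)
    reg.add(x, x.sum())
print(reg.predict(np.zeros(4)))
```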