Automated curriculum generation through setter-solver interactions

Authors: Sébastien Racanière, Andrew K. Lampinen, Adam Santoro, David P. Reichert, Vlad Firoiu, Timothy P. Lillicrap

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the success of our approach in rich but sparsely rewarding 2D and 3D environments, where an agent is tasked to achieve a single goal selected from a set of possible goals that varies between episodes, and identify challenges for future work.
Researcher Affiliation | Collaboration | Sébastien Racanière and Andrew K. Lampinen (equal contributions), DeepMind; sracaniere@google.com, lampinen@stanford.edu. Adam Santoro, David P. Reichert, Vlad Firoiu, Timothy P. Lillicrap, DeepMind; {adamsantoro,reichert,vladfi,countzero}@google.com.
Pseudocode | Yes | Algorithm 1: Solver-Actor loop (an illustrative sketch of such a loop follows this table)
Open Source Code | Yes | To help with reproducibility, we provide code for the networks used for the Setter: https://drive.google.com/drive/folders/1yjhztFeX67tHEImXCiPUAQfQ-wFvV4Y?usp=sharing.
Open Datasets | No | The paper uses custom-built environments ("3D color finding: A semi-realistic 3D environment built in Unity (http://unity3d.com)" and "Grid-world alchemy: A 2D grid world environment...") rather than publicly available datasets, and does not provide specific access information for generated data.
Dataset Splits | No | The paper describes dynamic, procedurally generated environments and mentions training and testing, but does not provide specific train/validation/test dataset splits (percentages, counts, or citations to predefined splits) for reproducibility.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments, only mentioning a "distributed learning setup".
Software Dependencies | No | The paper mentions software like the IMPALA framework, RMSProp, Adam, and Unity, but does not specify their version numbers or other software dependencies required for replication (e.g., Python, TensorFlow, PyTorch versions).
Experiment Setup | Yes | The solver agents were trained using the framework of Espeholt et al. (2018), with the RMSProp optimizer, without momentum and a learning rate of 2 × 10⁻⁴. The setters were trained using Adam, with learning rates of 2 × 10⁻⁴ on the 3D tasks and 3 × 10⁻⁴ on the grid-world alchemy tasks. [...] We found it was useful to down-weight the vision information by fixed constants before inputting it to the setter and the judge [...] These constants were determined via a hyperparameter sweep, and were 0.1 for the setter in all conditioned tasks, and 10⁻⁷ and 10⁻⁶ respectively for the judge in the alchemy tasks and recolored color-finding tasks. [...] We found β_des. = 5 to be optimal, though results in fig. 7b are from runs with β_des. = 1.
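The Experiment Setup row above boils down to a small amount of optimizer configuration. The following is a minimal sketch of those reported settings only, using PyTorch as a stand-in (the paper does not name its framework version, and the solver was actually trained in a distributed IMPALA-style setup that this sketch omits); the placeholder network modules and the choice of Adam for the judge are assumptions made for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical placeholder networks; the paper's actual architectures are in
# its appendix and the linked Google Drive code for the setter.
solver = nn.Linear(16, 4)
setter = nn.Linear(16, 8)
judge = nn.Linear(16, 1)

# Solver: RMSProp without momentum, learning rate 2e-4 (as reported above).
solver_opt = torch.optim.RMSprop(solver.parameters(), lr=2e-4, momentum=0.0)

# Setter: Adam, 2e-4 on the 3D tasks, 3e-4 on grid-world alchemy.
task = "3d_color_finding"  # or "gridworld_alchemy"
setter_lr = 2e-4 if task.startswith("3d") else 3e-4
setter_opt = torch.optim.Adam(setter.parameters(), lr=setter_lr)

# Judge optimizer: assumed here to share the setter's Adam settings (not stated above).
judge_opt = torch.optim.Adam(judge.parameters(), lr=setter_lr)

# Fixed constants that down-weight vision input to the setter/judge,
# chosen by the hyperparameter sweep reported above.
VISION_SCALE = {
    "setter_conditioned": 0.1,
    "judge_alchemy": 1e-7,
    "judge_recolored_color_finding": 1e-6,
}

def scale_vision(vision_features: torch.Tensor, key: str) -> torch.Tensor:
    """Down-weight vision features by the fixed constant for this network/task."""
    return VISION_SCALE[key] * vision_features
```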
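For the Pseudocode row (Algorithm 1: Solver-Actor loop), the following is a rough, non-authoritative illustration of what a goal-conditioned solver-actor loop of this kind looks like: a setter proposes a goal, the solver acts toward it under a step budget, and the binary success signal is what feeds back into setter/judge training. Everything here (DummyEnv, sample_goal, solver_policy) is a hypothetical stand-in, not the paper's actual Algorithm 1.

```python
import random

class DummyEnv:
    """Toy goal-reaching environment on the integers 0..9."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state = max(0, min(9, self.state + action))
        return self.state

def sample_goal():
    # In the paper a learned setter proposes goals; here we sample uniformly.
    return random.randint(0, 9)

def solver_policy(state, goal):
    # Placeholder policy: move one step toward the goal.
    return 1 if goal > state else (-1 if goal < state else 0)

def actor_episode(env, max_steps=20):
    """One solver-actor episode: pursue a single setter-proposed goal."""
    goal = sample_goal()
    state = env.reset()
    trajectory = []
    for _ in range(max_steps):
        action = solver_policy(state, goal)
        next_state = env.step(action)
        trajectory.append((state, goal, action, next_state))
        state = next_state
        if state == goal:          # sparse reward: success only on reaching the goal
            return trajectory, 1.0
    return trajectory, 0.0         # failure signal, also usable to train setter/judge

if __name__ == "__main__":
    traj, success = actor_episode(DummyEnv())
    print(f"episode length={len(traj)}, success={success}")
```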