Automated curriculum generation through setter-solver interactions

Authors: Sébastien Racanière, Andrew K. Lampinen, Adam Santoro, David P. Reichert, Vlad Firoiu, Timothy P. Lillicrap

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the success of our approach in rich but sparsely rewarding 2D and 3D environments, where an agent is tasked to achieve a single goal selected from a set of possible goals that varies between episodes, and identify challenges for future work.
Researcher Affiliation | Collaboration | Sébastien Racanière and Andrew K. Lampinen (equal contributions), DeepMind; sracaniere@google.com, lampinen@stanford.edu. Adam Santoro, David P. Reichert, Vlad Firoiu, Timothy P. Lillicrap, DeepMind; {adamsantoro,reichert,vladfi,countzero}@google.com.
Pseudocode | Yes | Algorithm 1: Solver-Actor loop (an illustrative sketch of such a loop follows this table)
Open Source Code | Yes | To help with reproducibility, we provide code for the networks used for the Setter: https://drive.google.com/drive/folders/1yjhztFeX67tHEImXCiPUAQfQ-wFvV4Y?usp=sharing.
Open Datasets | No | The paper uses custom-built environments ("3D color finding: A semi-realistic 3D environment built in Unity (http://unity3d.com)" and "Grid-world alchemy: A 2D grid world environment...") rather than publicly available datasets, and does not provide specific access information for generated data.
Dataset Splits | No | The paper describes dynamic, procedurally generated environments and mentions training and testing, but does not provide specific train/validation/test dataset splits (percentages, counts, or citations to predefined splits) for reproducibility.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments, only mentioning a "distributed learning setup".
Software Dependencies | No | The paper mentions software like the IMPALA framework, RMSProp, Adam, and Unity, but does not specify their version numbers or other software dependencies required for replication (e.g., Python, TensorFlow, PyTorch versions).
Experiment Setup | Yes | The solver agents were trained using the framework of Espeholt et al. (2018), with the RMSProp optimizer, without momentum and a learning rate of 2 × 10⁻⁴. The setters were trained using Adam, with learning rates of 2 × 10⁻⁴ on the 3D tasks and 3 × 10⁻⁴ on the grid-world alchemy tasks. [...] We found it was useful to down-weight the vision information by fixed constants before inputting it to the setter and the judge [...] These constants were determined via a hyperparameter sweep, and were 0.1 for the setter in all conditioned tasks, and 10⁻⁷ and 10⁻⁶ respectively for the judge in the alchemy tasks and recolored color-finding tasks. [...] We found β_des. = 5 to be optimal, though results in fig. 7b are from runs with β_des. = 1.
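The Experiment Setup row above boils down to a small amount of optimizer configuration. The following is a minimal sketch of those reported settings only, using PyTorch as a stand-in (the paper does not name its framework version, and the solver was actually trained in a distributed IMPALA-style setup that this sketch omits); the placeholder network modules and the choice of Adam for the judge are assumptions made for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical placeholder networks; the paper's actual architectures are in
# its appendix and the linked Google Drive code for the setter.
solver = nn.Linear(16, 4)
setter = nn.Linear(16, 8)
judge = nn.Linear(16, 1)

# Solver: RMSProp without momentum, learning rate 2e-4 (as reported above).
solver_opt = torch.optim.RMSprop(solver.parameters(), lr=2e-4, momentum=0.0)

# Setter: Adam, 2e-4 on the 3D tasks, 3e-4 on grid-world alchemy.
task = "3d_color_finding"  # or "gridworld_alchemy"
setter_lr = 2e-4 if task.startswith("3d") else 3e-4
setter_opt = torch.optim.Adam(setter.parameters(), lr=setter_lr)

# Judge optimizer: assumed here to share the setter's Adam settings (not stated above).
judge_opt = torch.optim.Adam(judge.parameters(), lr=setter_lr)

# Fixed constants that down-weight vision input to the setter/judge,
# chosen by the hyperparameter sweep reported above.
VISION_SCALE = {
    "setter_conditioned": 0.1,
    "judge_alchemy": 1e-7,
    "judge_recolored_color_finding": 1e-6,
}

def scale_vision(vision_features: torch.Tensor, key: str) -> torch.Tensor:
    """Down-weight vision features by the fixed constant for this network/task."""
    return VISION_SCALE[key] * vision_features
```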
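For the Pseudocode row (Algorithm 1: Solver-Actor loop), the following is a rough, non-authoritative illustration of what a goal-conditioned solver-actor loop of this kind looks like: a setter proposes a goal, the solver acts toward it under a step budget, and the binary success signal is what feeds back into setter/judge training. Everything here (DummyEnv, sample_goal, solver_policy) is a hypothetical stand-in, not the paper's actual Algorithm 1.

```python
import random

class DummyEnv:
    """Toy goal-reaching environment on the integers 0..9."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state = max(0, min(9, self.state + action))
        return self.state

def sample_goal():
    # In the paper a learned setter proposes goals; here we sample uniformly.
    return random.randint(0, 9)

def solver_policy(state, goal):
    # Placeholder policy: move one step toward the goal.
    return 1 if goal > state else (-1 if goal < state else 0)

def actor_episode(env, max_steps=20):
    """One solver-actor episode: pursue a single setter-proposed goal."""
    goal = sample_goal()
    state = env.reset()
    trajectory = []
    for _ in range(max_steps):
        action = solver_policy(state, goal)
        next_state = env.step(action)
        trajectory.append((state, goal, action, next_state))
        state = next_state
        if state == goal:          # sparse reward: success only on reaching the goal
            return trajectory, 1.0
    return trajectory, 0.0         # failure signal, also usable to train setter/judge

if __name__ == "__main__":
    traj, success = actor_episode(DummyEnv())
    print(f"episode length={len(traj)}, success={success}")
```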