Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Metacontrol for Adaptive Imagination-Based Optimization
Authors: Jessica B. Hamrick, Andrew J. Ballard, Razvan Pascanu, Oriol Vinyals, Nicolas Heess, Peter W. Battaglia
ICLR 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate our metacontroller agent, we measured its ability to learn to solve a class of physics-based tasks that are surprisingly challenging. Each episode consisted of a scene which contained a spaceship and multiple planets (Figure 1b-c). We trained the reactive, iterative, and metacontroller agents on five versions of the spaceship task involving different numbers of planets. |
| Researcher Affiliation | Collaboration | Jessica B. Hamrick, UC Berkeley & DeepMind, EMAIL; Andrew J. Ballard, DeepMind, EMAIL; Razvan Pascanu, DeepMind, EMAIL; Oriol Vinyals, DeepMind, EMAIL; Nicolas Heess, DeepMind, EMAIL; Peter W. Battaglia, DeepMind, EMAIL |
| Pseudocode | Yes | for an algorithmic illustration of the metacontroller agent, see Algorithm 1 in the appendix. |
| Open Source Code | No | Available from: https://www.github.com/deepmind/spaceship dataset |
| Open Datasets | Yes | We generated five datasets, each containing scenes with a different number of planets (ranging from a single planet to five planets). Each dataset consisted of 100,000 training scenes and 1,000 testing scenes. Available from: https://www.github.com/deepmind/spaceship dataset |
| Dataset Splits | No | Each dataset consisted of 100,000 training scenes and 1,000 testing scenes. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | We used TensorFlow (Abadi et al., 2015) to implement and train all versions of the model. |
| Experiment Setup | Yes | All weights were initialized uniformly at random between 0 and 0.01. An iteration of training consisted of gradient updates over a minibatch of size 1000; in total, we ran training for 100,000 iterations. We additionally used a waterfall schedule for each of the learning rates during training, such that after 1000 iterations, if the loss was not decreasing, we would decay the step size by 5%. We trained the controller and memory together using the Adam optimizer (Kingma & Ba, 2014) with gradients clipped to a maximum global norm of 10 (Pascanu et al., 2013). Learning rates were determined using a grid search over a small number of values, and are given in Table 1 for the iterative agent, in Table 2 for the metacontroller with one expert, and in Table 3 for the metacontroller with two experts. |
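
The Experiment Setup row describes three mechanics that are easy to misread in prose: uniform weight initialization in [0, 0.01], gradient clipping to a maximum global norm of 10, and a "waterfall" schedule that decays the step size by 5% when the loss stops decreasing. The following is a minimal standard-library sketch of those three pieces, not the authors' implementation. The function and class names are hypothetical, and the rule used to decide that the loss "was not decreasing" (comparing the mean loss of each 1000-iteration window with the previous window) is an assumption, since the quoted text does not specify it.

```python
import math
import random


def init_weights(n, low=0.0, high=0.01, seed=0):
    """Draw n weights uniformly at random from [low, high]."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n)]


def clip_by_global_norm(grads, max_norm=10.0):
    """Rescale all gradients together so their joint L2 norm is at most max_norm."""
    global_norm = math.sqrt(sum(g * g for g in grads))
    if global_norm <= max_norm:
        return list(grads)
    scale = max_norm / global_norm
    return [g * scale for g in grads]


class WaterfallSchedule:
    """Decay the learning rate by 5% whenever a window of iterations shows no improvement.

    Assumption: "not decreasing" means the mean loss of the current window
    is greater than or equal to the mean loss of the previous window.
    """

    def __init__(self, lr, window=1000, decay=0.95):
        self.lr = lr
        self.window = window
        self.decay = decay
        self._loss_sum = 0.0
        self._count = 0
        self._prev_mean = None

    def step(self, loss):
        """Record one iteration's loss and return the learning rate to use next."""
        self._loss_sum += loss
        self._count += 1
        if self._count == self.window:
            mean = self._loss_sum / self._count
            if self._prev_mean is not None and mean >= self._prev_mean:
                self.lr *= self.decay
            self._prev_mean = mean
            self._loss_sum = 0.0
            self._count = 0
        return self.lr
```

In a training loop following the quoted setup, `step` would be called once per iteration (100,000 times in total), and `clip_by_global_norm` would be applied to the full set of gradients before each Adam update. Clipping by global norm preserves the direction of the overall gradient, which per-element clipping does not.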