Metacontrol for Adaptive Imagination-Based Optimization

Authors: Jessica B. Hamrick, Andrew J. Ballard, Razvan Pascanu, Oriol Vinyals, Nicolas Heess, Peter W. Battaglia

ICLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate our metacontroller agent, we measured its ability to learn to solve a class of physics-based tasks that are surprisingly challenging. Each episode consisted of a scene which contained a spaceship and multiple planets (Figure 1b-c). We trained the reactive, iterative, and metacontroller agents on five versions of the spaceship task involving different numbers of planets.
Researcher Affiliation | Collaboration | Jessica B. Hamrick, UC Berkeley & DeepMind, jhamrick@berkeley.edu; Andrew J. Ballard, DeepMind, aybd@google.com; Razvan Pascanu, DeepMind, razp@google.com; Oriol Vinyals, DeepMind, vinyals@google.com; Nicolas Heess, DeepMind, heess@google.com; Peter W. Battaglia, DeepMind, peterbattaglia@google.com
Pseudocode | Yes | For an algorithmic illustration of the metacontroller agent, see Algorithm 1 in the appendix.
Open Source Code | No | Available from: https://www.github.com/deepmind/spaceship_dataset
Open Datasets | Yes | We generated five datasets, each containing scenes with a different number of planets (ranging from a single planet to five planets). Each dataset consisted of 100,000 training scenes and 1,000 testing scenes. Available from: https://www.github.com/deepmind/spaceship_dataset
Dataset Splits | No | Each dataset consisted of 100,000 training scenes and 1,000 testing scenes.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | We used TensorFlow (Abadi et al., 2015) to implement and train all versions of the model.
Experiment Setup | Yes | All weights were initialized uniformly at random between 0 and 0.01. An iteration of training consisted of gradient updates over a minibatch of size 1000; in total, we ran training for 100,000 iterations. We additionally used a waterfall schedule for each of the learning rates during training, such that after 1000 iterations, if the loss was not decreasing, we would decay the step size by 5%. We trained the controller and memory together using the Adam optimizer (Kingma & Ba, 2014) with gradients clipped to a maximum global norm of 10 (Pascanu et al., 2013). Learning rates were determined using a grid search over a small number of values, and are given in Table 1 for the iterative agent, in Table 2 for the metacontroller with one expert, and in Table 3 for the metacontroller with two experts.
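
The Experiment Setup row above translates fairly directly into a training loop. Below is a minimal sketch, assuming TensorFlow 2 and placeholder names (model, loss_fn, sample_minibatch, and the 1e-3 learning rate), none of which come from the paper; only the quoted settings are wired in: uniform initialization between 0 and 0.01, minibatch size 1000, 100,000 iterations, Adam with gradients clipped to a global norm of 10, and a waterfall schedule that decays the step size by 5% when the loss has not decreased for 1000 iterations. The paper does not spell out its exact "not decreasing" criterion, so the best-loss-so-far check here is one simple interpretation.

    import tensorflow as tf

    BATCH_SIZE = 1000          # "minibatch of size 1000"
    NUM_ITERATIONS = 100_000   # "training for 100,000 iterations"
    PATIENCE = 1000            # "after 1000 iterations, if the loss was not decreasing"
    DECAY_FACTOR = 0.95        # "decay the step size by 5%"
    CLIP_NORM = 10.0           # "gradients clipped to a maximum global norm of 10"

    init = tf.keras.initializers.RandomUniform(minval=0.0, maxval=0.01)
    model = tf.keras.Sequential([  # placeholder network; not the paper's architecture
        tf.keras.layers.Dense(100, activation="relu",
                              kernel_initializer=init, bias_initializer=init),
        tf.keras.layers.Dense(1, kernel_initializer=init, bias_initializer=init),
    ])

    learning_rate = tf.Variable(1e-3)  # placeholder; actual values were grid-searched (Tables 1-3)
    optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
    loss_fn = tf.keras.losses.MeanSquaredError()  # placeholder loss

    best_loss = float("inf")
    iterations_without_improvement = 0

    for iteration in range(NUM_ITERATIONS):
        x, y = sample_minibatch(BATCH_SIZE)  # hypothetical data-loading helper
        with tf.GradientTape() as tape:
            loss = loss_fn(y, model(x, training=True))
        grads = tape.gradient(loss, model.trainable_variables)
        grads, _ = tf.clip_by_global_norm(grads, CLIP_NORM)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))

        # Waterfall schedule: decay the learning rate once the loss has not
        # improved for PATIENCE consecutive iterations.
        if float(loss) < best_loss:
            best_loss = float(loss)
            iterations_without_improvement = 0
        else:
            iterations_without_improvement += 1
            if iterations_without_improvement >= PATIENCE:
                learning_rate.assign(learning_rate * DECAY_FACTOR)
                iterations_without_improvement = 0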