Metacontrol for Adaptive Imagination-Based Optimization

Authors: Jessica B. Hamrick, Andrew J. Ballard, Razvan Pascanu, Oriol Vinyals, Nicolas Heess, Peter W. Battaglia

ICLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate our metacontroller agent, we measured its ability to learn to solve a class of physics-based tasks that are surprisingly challenging. Each episode consisted of a scene which contained a spaceship and multiple planets (Figure 1b-c). We trained the reactive, iterative, and metacontroller agents on five versions of the spaceship task involving different numbers of planets.
Researcher Affiliation | Collaboration | Jessica B. Hamrick, UC Berkeley & DeepMind, jhamrick@berkeley.edu; Andrew J. Ballard, DeepMind, aybd@google.com; Razvan Pascanu, DeepMind, razp@google.com; Oriol Vinyals, DeepMind, vinyals@google.com; Nicolas Heess, DeepMind, heess@google.com; Peter W. Battaglia, DeepMind, peterbattaglia@google.com
Pseudocode | Yes | For an algorithmic illustration of the metacontroller agent, see Algorithm 1 in the appendix.
Open Source Code | No | Available from: https://www.github.com/deepmind/spaceship_dataset
Open Datasets | Yes | We generated five datasets, each containing scenes with a different number of planets (ranging from a single planet to five planets). Each dataset consisted of 100,000 training scenes and 1,000 testing scenes. Available from: https://www.github.com/deepmind/spaceship_dataset
Dataset Splits | No | Each dataset consisted of 100,000 training scenes and 1,000 testing scenes.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | We used TensorFlow (Abadi et al., 2015) to implement and train all versions of the model.
Experiment Setup | Yes | All weights were initialized uniformly at random between 0 and 0.01. An iteration of training consisted of gradient updates over a minibatch of size 1000; in total, we ran training for 100,000 iterations. We additionally used a waterfall schedule for each of the learning rates during training, such that after 1000 iterations, if the loss was not decreasing, we would decay the step size by 5%. We trained the controller and memory together using the Adam optimizer (Kingma & Ba, 2014) with gradients clipped to a maximum global norm of 10 (Pascanu et al., 2013). Learning rates were determined using a grid search over a small number of values, and are given in Table 1 for the iterative agent, in Table 2 for the metacontroller with one expert, and in Table 3 for the metacontroller with two experts.
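
The Experiment Setup row above translates fairly directly into a training loop. Below is a minimal sketch, assuming TensorFlow 2 and placeholder names (model, loss_fn, sample_minibatch, and the 1e-3 learning rate), none of which come from the paper; only the quoted settings are wired in: uniform initialization between 0 and 0.01, minibatch size 1000, 100,000 iterations, Adam with gradients clipped to a global norm of 10, and a waterfall schedule that decays the step size by 5% when the loss has not decreased for 1000 iterations. The paper does not spell out its exact "not decreasing" criterion, so the best-loss-so-far check here is one simple interpretation.

    import tensorflow as tf

    BATCH_SIZE = 1000          # "minibatch of size 1000"
    NUM_ITERATIONS = 100_000   # "training for 100,000 iterations"
    PATIENCE = 1000            # "after 1000 iterations, if the loss was not decreasing"
    DECAY_FACTOR = 0.95        # "decay the step size by 5%"
    CLIP_NORM = 10.0           # "gradients clipped to a maximum global norm of 10"

    init = tf.keras.initializers.RandomUniform(minval=0.0, maxval=0.01)
    model = tf.keras.Sequential([  # placeholder network; not the paper's architecture
        tf.keras.layers.Dense(100, activation="relu",
                              kernel_initializer=init, bias_initializer=init),
        tf.keras.layers.Dense(1, kernel_initializer=init, bias_initializer=init),
    ])

    learning_rate = tf.Variable(1e-3)  # placeholder; actual values were grid-searched (Tables 1-3)
    optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
    loss_fn = tf.keras.losses.MeanSquaredError()  # placeholder loss

    best_loss = float("inf")
    iterations_without_improvement = 0

    for iteration in range(NUM_ITERATIONS):
        x, y = sample_minibatch(BATCH_SIZE)  # hypothetical data-loading helper
        with tf.GradientTape() as tape:
            loss = loss_fn(y, model(x, training=True))
        grads = tape.gradient(loss, model.trainable_variables)
        grads, _ = tf.clip_by_global_norm(grads, CLIP_NORM)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))

        # Waterfall schedule: decay the learning rate once the loss has not
        # improved for PATIENCE consecutive iterations.
        if float(loss) < best_loss:
            best_loss = float(loss)
            iterations_without_improvement = 0
        else:
            iterations_without_improvement += 1
            if iterations_without_improvement >= PATIENCE:
                learning_rate.assign(learning_rate * DECAY_FACTOR)
                iterations_without_improvement = 0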