Metacontrol for Adaptive Imagination-Based Optimization
Authors: Jessica B. Hamrick, Andrew J. Ballard, Razvan Pascanu, Oriol Vinyals, Nicolas Heess, Peter W. Battaglia
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate our metacontroller agent, we measured its ability to learn to solve a class of physics-based tasks that are surprisingly challenging. Each episode consisted of a scene which contained a spaceship and multiple planets (Figure 1b-c). We trained the reactive, iterative, and metacontroller agents on five versions of the spaceship task involving different numbers of planets. |
| Researcher Affiliation | Collaboration | Jessica B. Hamrick (UC Berkeley & DeepMind, jhamrick@berkeley.edu); Andrew J. Ballard (DeepMind, aybd@google.com); Razvan Pascanu (DeepMind, razp@google.com); Oriol Vinyals (DeepMind, vinyals@google.com); Nicolas Heess (DeepMind, heess@google.com); Peter W. Battaglia (DeepMind, peterbattaglia@google.com) |
| Pseudocode | Yes | for an algorithmic illustration of the metacontroller agent, see Algorithm 1 in the appendix. |
| Open Source Code | No | No code release is indicated; only the dataset is available from: https://www.github.com/deepmind/spaceship_dataset |
| Open Datasets | Yes | We generated five datasets, each containing scenes with a different number of planets (ranging from a single planet to five planets). Each dataset consisted of 100,000 training scenes and 1,000 testing scenes. Available from: https://www.github.com/deepmind/spaceship_dataset |
| Dataset Splits | No | Each dataset consisted of 100,000 training scenes and 1,000 testing scenes. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | We used TensorFlow (Abadi et al., 2015) to implement and train all versions of the model. |
| Experiment Setup | Yes | All weights were initialized uniformly at random between 0 and 0.01. An iteration of training consisted of gradient updates over a minibatch of size 1000; in total, we ran training for 100,000 iterations. We additionally used a waterfall schedule for each of the learning rates during training, such that after 1000 iterations, if the loss was not decreasing, we would decay the step size by 5%. We trained the controller and memory together using the Adam optimizer (Kingma & Ba, 2014) with gradients clipped to a maximum global norm of 10 (Pascanu et al., 2013). Learning rates were determined using a grid search over a small number of values, and are given in Table 1 for the iterative agent, in Table 2 for the metacontroller with one expert, and in Table 3 for the metacontroller with two experts. |
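
The experiment-setup evidence above fully specifies the initialization, minibatch size, optimizer, gradient clipping, and waterfall learning-rate schedule. The following is a minimal sketch, assuming TensorFlow 2 / Keras, of how those reported settings could be wired together; the placeholder MLP, the 1e-3 learning rate, and the random dummy batches are assumptions standing in for the paper's controller/memory networks, its grid-searched learning rates (Tables 1-3), and the spaceship scenes. It is not the authors' implementation.

```python
import tensorflow as tf

# Reported hyperparameters from the paper's experiment setup.
BATCH_SIZE = 1000         # minibatch size per gradient update
NUM_ITERATIONS = 100_000  # total training iterations
CLIP_NORM = 10.0          # maximum global gradient norm (Pascanu et al., 2013)
DECAY_FACTOR = 0.95       # waterfall schedule: decay the step size by 5%
PATIENCE = 1000           # iterations without a decreasing loss before decaying

# All weights initialized uniformly at random between 0 and 0.01, as stated.
init = tf.keras.initializers.RandomUniform(minval=0.0, maxval=0.01)

# Hypothetical placeholder MLP; the paper's controller/memory architectures differ.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(100, activation="relu",
                          kernel_initializer=init, bias_initializer=init),
    tf.keras.layers.Dense(1, kernel_initializer=init, bias_initializer=init),
])

# Placeholder learning rate; the paper's rates were grid-searched (Tables 1-3).
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(model(x) - y))
    grads = tape.gradient(loss, model.trainable_variables)
    grads, _ = tf.clip_by_global_norm(grads, CLIP_NORM)  # clip to global norm 10
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

best_loss = float("inf")
stale = 0
for step in range(NUM_ITERATIONS):
    # Dummy batch standing in for a minibatch of 1000 spaceship scenes.
    x = tf.random.normal([BATCH_SIZE, 8])
    y = tf.random.normal([BATCH_SIZE, 1])
    loss = float(train_step(x, y))
    if loss < best_loss:
        best_loss, stale = loss, 0
    else:
        stale += 1
    if stale >= PATIENCE:  # loss not decreasing: waterfall decay by 5%
        optimizer.learning_rate.assign(optimizer.learning_rate * DECAY_FACTOR)
        stale = 0
```

The waterfall schedule is sketched here as a simple patience counter that multiplies the learning rate by 0.95 once the loss has failed to improve for 1,000 iterations, which is one plausible reading of the paper's description.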