Multiplicative Interactions and Where to Find Them
Authors: Siddhant M. Jayakumar, Wojciech M. Czarnecki, Jacob Menick, Jonathan Schwarz, Jack Rae, Simon Osindero, Yee Whye Teh, Tim Harley, Razvan Pascanu
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we back up our claims and demonstrate the potential of multiplicative interactions by applying them in large-scale complex RL and sequence modelling tasks, where their use allows us to deliver state-of-the-art results |
| Researcher Affiliation | Industry | DeepMind {sidmj, lejlot, jmenick, schwarzjn, jwrae, osindero, ywteh, tharley, razp}@google.com |
| Pseudocode | Yes | B SIMPLE IMPLEMENTATION OF MI LAYER (a hedged code sketch of such a layer follows the table) |
| Open Source Code | No | The paper provides a 'simple code snippet' in the appendix, but it does not state that the full source code for the described methodology is openly available, nor does it link to a repository. |
| Open Datasets | Yes | multitask RL on the DeepMind Lab-30 domain (Beattie et al., 2016). |
| Dataset Splits | No | No specific details on train/validation/test splits, such as percentages or sample counts, are provided for the datasets used beyond general references to standard benchmarks. |
| Hardware Specification | No | We train multi-task on 30 DeepMind Lab levels (Beattie et al., 2016) concurrently using 5 actors per task and a multi-GPU learner with 4 GPUs. |
| Software Dependencies | No | We use TensorFlow and Sonnet (Reynolds et al., 2017) for all our model implementations. |
| Experiment Setup | Yes | Models are trained using Adam optimiser for 6,000 steps using Mean Squared Error loss (MSE) on mini-batches of size 100 sampled from a standard Gaussian. We sweep over learning rates 0.1, 0.001, 0.0001 and pick the best result. (A toy reconstruction of this setup follows the MI-layer sketch below.) |
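
For context on the "Pseudocode" row: the MI layer of Appendix B follows the paper's general form f(x, z) = zᵀ𝕎x + zᵀ𝕌 + 𝕍x + b, where the 3D tensor 𝕎 lets the context z generate an input-dependent weight matrix and bias that are applied to x. Below is a minimal PyTorch sketch of that form; the paper's own snippet uses TensorFlow and Sonnet, so the class name, variable names, and initialisation scale here are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn


class MILayer(nn.Module):
    """Multiplicative-interaction layer: f(x, z) = z^T W x + z^T U + V x + b.

    W is a 3D tensor, so the context z generates a per-example weight
    matrix (z^T W + V) and bias (z^T U + b) that are then applied to x.
    """

    def __init__(self, x_size: int, z_size: int, output_size: int):
        super().__init__()
        self.W = nn.Parameter(0.01 * torch.randn(z_size, x_size, output_size))
        self.U = nn.Parameter(0.01 * torch.randn(z_size, output_size))
        self.V = nn.Parameter(0.01 * torch.randn(x_size, output_size))
        self.b = nn.Parameter(torch.zeros(output_size))

    def forward(self, x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        # Contract z with the 3D tensor W: (batch, x_size, output_size).
        w_prime = torch.einsum('bz,zxo->bxo', z, self.W) + self.V
        # Context-generated bias: (batch, output_size).
        b_prime = z @ self.U + self.b
        return torch.einsum('bx,bxo->bo', x, w_prime) + b_prime
```

Note that setting 𝕎 to zero reduces this to an ordinary additive layer over [x; z], which is the sense in which the paper argues that vanilla layers are a strict subset of MI layers.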
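
The "Experiment Setup" row can likewise be read as a concrete recipe. The toy reconstruction below reuses the `MILayer` sketch above as a hypothetical regression target (the quoted text does not fully specify the synthetic task), while the optimiser, step count, loss, batch size, input distribution, and learning-rate sweep follow the quote.

```python
import torch
import torch.nn.functional as F

x_size, z_size, output_size = 16, 8, 4  # illustrative sizes, not from the paper
teacher = MILayer(x_size, z_size, output_size)  # hypothetical fixed target network

best_loss, best_lr = float('inf'), None
for lr in (0.1, 0.001, 0.0001):  # the quoted learning-rate sweep
    torch.manual_seed(0)
    model = MILayer(x_size, z_size, output_size)
    opt = torch.optim.Adam(model.parameters(), lr=lr)  # Adam optimiser
    for step in range(6_000):  # 6,000 training steps
        # Mini-batches of size 100 sampled from a standard Gaussian.
        x = torch.randn(100, x_size)
        z = torch.randn(100, z_size)
        with torch.no_grad():
            y = teacher(x, z)
        loss = F.mse_loss(model(x, z), y)  # Mean Squared Error loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    if loss.item() < best_loss:  # pick the best result across the sweep
        best_loss, best_lr = loss.item(), lr

print(f'best lr={best_lr}, final MSE={best_loss:.6f}')
```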