Multiplicative Interactions and Where to Find Them

Authors: Siddhant M. Jayakumar, Wojciech M. Czarnecki, Jacob Menick, Jonathan Schwarz, Jack Rae, Simon Osindero, Yee Whye Teh, Tim Harley, Razvan Pascanu

ICLR 2020

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Finally, we back up our claims and demonstrate the potential of multiplicative interactions by applying them in large-scale complex RL and sequence modelling tasks, where their use allows us to deliver state-of-the-art results." |
| Researcher Affiliation | Industry | DeepMind: {sidmj, lejlot, jmenick, schwarzjn, jwrae, osindero, ywteh, tharley, razp}@google.com |
| Pseudocode | Yes | Appendix B, "Simple Implementation of MI Layer" |
| Open Source Code | No | The paper provides a "simple code snippet" in the appendix, but does not state that full source code for the described methodology is openly available, nor does it link to a repository. |
| Open Datasets | Yes | "multitask RL on the DeepMind Lab-30 domain (Beattie et al., 2016)" |
| Dataset Splits | No | No specific train/validation/test splits, such as percentages or sample counts, are provided for the datasets used beyond general references to standard benchmarks. |
| Hardware Specification | No | "We train multi-task on 30 DeepMind Lab levels (Beattie et al., 2016) concurrently using 5 actors per task and a multi-GPU learner with 4 GPUs." |
| Software Dependencies | No | "We use TensorFlow and Sonnet (Reynolds et al., 2017) for all our model implementations." |
| Experiment Setup | Yes | "Models are trained using the Adam optimiser for 6,000 steps using Mean Squared Error (MSE) loss on mini-batches of size 100 sampled from a standard Gaussian. We sweep over learning rates 0.1, 0.001, 0.0001 and pick the best result." |
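For context, the MI layer referenced in the "Pseudocode" row takes the bilinear form f(x, z) = z^T W x + z^T U + V x + b, with W a rank-3 weight tensor. The following is a minimal NumPy sketch of that form, not the authors' Appendix B snippet; the dimension names and the shape check are illustrative.

```python
import numpy as np

def mi_layer(x, z, W, U, V, b):
    """Multiplicative-interaction layer: f(x, z) = z^T W x + z^T U + V x + b.

    Shapes (illustrative): x (d_x,), z (d_z,), W (d_z, d_x, d_out),
    U (d_z, d_out), V (d_out, d_x), b (d_out,).
    """
    # Contracting z against the rank-3 tensor W yields a z-dependent
    # weight matrix, which is then applied to x (the multiplicative term).
    bilinear = np.einsum('i,ijk,j->k', z, W, x)
    return bilinear + z @ U + V @ x + b

# Quick shape check with hypothetical dimensions.
d_x, d_z, d_out = 8, 4, 16
rng = np.random.default_rng(0)
y = mi_layer(rng.normal(size=d_x), rng.normal(size=d_z),
             rng.normal(size=(d_z, d_x, d_out)),
             rng.normal(size=(d_z, d_out)),
             rng.normal(size=(d_out, d_x)),
             rng.normal(size=d_out))
assert y.shape == (d_out,)
```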
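The "Experiment Setup" row quotes the synthetic-regression protocol. Below is a hedged sketch of that loop in TensorFlow (which the paper reports using); `model_fn` and `target_fn` are hypothetical placeholders, since the quoted text does not specify the model or the regression target.

```python
import tensorflow as tf

def run_sweep(model_fn, target_fn, d_in=16, steps=6000, batch_size=100):
    """Sweep learning rates as quoted and return the best final MSE."""
    best_loss = None
    for lr in (0.1, 0.001, 0.0001):  # learning-rate sweep from the paper
        model = model_fn()
        opt = tf.keras.optimizers.Adam(learning_rate=lr)
        for _ in range(steps):  # 6,000 optimisation steps
            # Mini-batch of size 100 sampled from a standard Gaussian.
            x = tf.random.normal([batch_size, d_in])
            with tf.GradientTape() as tape:
                loss = tf.reduce_mean(tf.square(model(x) - target_fn(x)))  # MSE
            grads = tape.gradient(loss, model.trainable_variables)
            opt.apply_gradients(zip(grads, model.trainable_variables))
        # "Pick the best result" across the sweep.
        if best_loss is None or float(loss) < best_loss:
            best_loss = float(loss)
    return best_loss
```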