Recurrent Independent Mechanisms

Authors: Anirudh Goyal, Alex Lamb, Jordan Hoffmann, Shagun Sodhani, Sergey Levine, Yoshua Bengio, Bernhard Schölkopf

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show that this leads to specialization amongst the RIMs, which in turn allows for remarkably improved generalization on tasks where some factors of variation differ systematically between training and evaluation." and "The main goal of our experiments is to show that the use of RIMs improves generalization across changing environments and/or in modular tasks, and to explore how it does so. Our goal is not to outperform highly optimized baselines; rather, we want to show the versatility of our approach by applying it to a range of diverse tasks, focusing on tasks that involve a changing environment. We organize our results by the capabilities they illustrate: we address generalization based on temporal patterns, based on objects, and finally consider settings where both of these occur together."
Researcher Affiliation | Academia | "1 Mila, University of Montreal, 2 Harvard University, 3 MPI for Intelligent Systems, Tübingen, 4 University of California, Berkeley"
Pseudocode | Yes | Section "G PSEUDOCODE FOR RIMS ALGORITHM" (an illustrative sketch of the selective-activation step this algorithm is built around follows the table).
Open Source Code | No | No explicit statement about releasing code for the described methodology or a link to a code repository was found.
Open Datasets | Yes | Sequential MNIST Resolution Task; Bouncing Balls Environment: "We use the bouncing-ball dataset from (Van Steenkiste et al., 2018)"; the object-picking reinforcement learning task from BabyAI (Chevalier-Boisvert et al., 2018); experiments on the whole suite of Atari games; "We use the Stochastic Moving MNIST (SM-MNIST) (Denton & Fergus, 2018) dataset"; and the WMT machine translation dataset, evaluated on the IWSLT14 dataset (English to German).
Dataset Splits | Yes | Copying Task: "First we turn our attention to the task of receiving a short sequence of characters, then receiving blank inputs for a large number of steps, and then being asked to reproduce the original sequence. ... we can extend the length of this dormant phase from 50 during training to 200 during testing and retain perfect performance (Table 1)." Also: "We train all models on a training dataset of 20K video sequences... We also include an additional 1K video sequences... as a held-out validation set." (A toy copying-task generator is sketched after the table.)
Hardware Specification | No | No specific hardware details (such as GPU/CPU models, memory, or cloud instance types) were provided for the experimental setup.
Software Dependencies | No | The paper mentions the Adam optimizer (Kingma & Ba, 2014) and cites "Pytorch implementations of reinforcement learning algorithms. https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail, 2018", but specific versions of PyTorch or other libraries are not given.
Experiment Setup | Yes | Table 3 lists the different hyperparameters: Optimizer: Adam (Kingma & Ba, 2014); learning rate: 7 × 10^-4; batch size: 64; Input keys: 64; Input Values: size of individual RIM × 4; Input Heads: 4; Input Dropout: 0.1; Communication keys: 32; Communication Values: 32; Communication heads: 4; Communication Dropout: 0.1. The paper also specifies that each model was trained for 150 epochs. (These values are gathered into a config sketch after the table.)
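
The Pseudocode row points to the paper's section "G PSEUDOCODE FOR RIMS ALGORITHM". As a rough, non-authoritative illustration of the selective-activation idea behind RIMs, here is a minimal PyTorch sketch of a single step: independent recurrent modules attend to the input versus a null input, and only the top-k most input-activated modules update their state. The class name `RIMsCellSketch`, the use of `nn.GRUCell`, the default sizes, and the zero vector used as the null input are assumptions of this sketch, and the paper's inter-RIM communication attention is omitted; this is not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RIMsCellSketch(nn.Module):
    """One step of selective activation over independent recurrent modules (sketch)."""

    def __init__(self, input_size, hidden_size, n_rims=6, k_active=4, key_size=64):
        super().__init__()
        self.n_rims, self.k_active = n_rims, k_active
        # Each RIM keeps its own independent recurrent parameters.
        self.cells = nn.ModuleList(
            [nn.GRUCell(input_size, hidden_size) for _ in range(n_rims)]
        )
        # Input attention: each RIM's state queries the (input, null) pair.
        self.query = nn.Linear(hidden_size, key_size)
        self.key = nn.Linear(input_size, key_size)
        self.scale = key_size ** -0.5

    def forward(self, x, h):
        # x: (batch, input_size), h: (batch, n_rims, hidden_size)
        null = torch.zeros_like(x)                          # assumed null input
        candidates = torch.stack([x, null], dim=1)          # (batch, 2, input_size)
        q = self.query(h)                                   # (batch, n_rims, key)
        k = self.key(candidates)                            # (batch, 2, key)
        attn = F.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        # RIMs attending most to the real input (least to the null input)
        # are treated as active for this time step.
        top = attn[..., 0].topk(self.k_active, dim=-1).indices
        mask = torch.zeros(x.size(0), self.n_rims, device=x.device)
        mask.scatter_(1, top, 1.0)
        # Every RIM computes a candidate update; inactive RIMs keep their state.
        h_new = torch.stack(
            [cell(x, h[:, i]) for i, cell in enumerate(self.cells)], dim=1
        )
        mask = mask.unsqueeze(-1)
        return mask * h_new + (1.0 - mask) * h


# Tiny usage example: batch of 8, 6 RIMs of hidden size 100.
cell = RIMsCellSketch(input_size=32, hidden_size=100)
h = torch.zeros(8, 6, 100)
x = torch.randn(8, 32)
h = cell(x, h)
```

Keeping each module's parameters separate and updating only the top-k modules per step is what the paper credits for the specialization quoted in the Research Type row.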
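The Dataset Splits row quotes the copying task, where a dormant phase of blank inputs grows from 50 steps at training time to 200 at test time. The generator below is a toy sketch of how such data could be produced; the alphabet size, sequence length, and the blank/delimiter tokens are assumptions for illustration, not the paper's exact protocol.

```python
import torch


def copying_batch(batch_size, seq_len=10, dormant=50, n_symbols=8):
    """Sequence to remember, then blanks, then reproduce it at the end (toy sketch)."""
    blank, delim = n_symbols, n_symbols + 1             # two special tokens
    symbols = torch.randint(0, n_symbols, (batch_size, seq_len))
    total = seq_len + dormant + seq_len
    inputs = torch.full((batch_size, total), blank, dtype=torch.long)
    inputs[:, :seq_len] = symbols                        # the short character sequence
    inputs[:, seq_len + dormant - 1] = delim             # cue to start reproducing
    targets = torch.full((batch_size, total), blank, dtype=torch.long)
    targets[:, -seq_len:] = symbols                      # expected reproduction
    return inputs, targets


train_x, train_y = copying_batch(64, dormant=50)         # dormant-phase length used in training
test_x, test_y = copying_batch(64, dormant=200)          # longer dormant phase at test time
```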
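Finally, the Experiment Setup row's Table 3 values can be read as a flat configuration. The snippet below simply collects them and builds the reported Adam optimizer; the RIM hidden size and the placeholder model are illustrative assumptions, and this is not the authors' training script.

```python
import torch

rim_hidden_size = 100                           # assumption; Table 3 defines input values relative to this
config = {
    "learning_rate": 7e-4,                      # reported as 7 x 10^-4
    "batch_size": 64,
    "input_keys": 64,
    "input_values": 4 * rim_hidden_size,        # "Size of individual RIM * 4"
    "input_heads": 4,
    "input_dropout": 0.1,
    "communication_keys": 32,
    "communication_values": 32,
    "communication_heads": 4,
    "communication_dropout": 0.1,
    "epochs": 150,
}

model = torch.nn.Linear(64, 64)                 # placeholder standing in for the RIMs model
optimizer = torch.optim.Adam(model.parameters(), lr=config["learning_rate"])
```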