Fast And Slow Learning Of Recurrent Independent Mechanisms

Authors: Kanika Madan, Nan Rosemary Ke, Anirudh Goyal, Bernhard Schölkopf, Yoshua Bengio

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments in the domain of grounded language learning, in which poor data efficiency is one of the major limitations for agents to learn efficiently and generalize well (Hermann et al., 2017; Chaplot et al., 2017; Wu et al., 2018; Yu et al., 2018; Chevalier-Boisvert et al., 2018). We show, empirically, how the proposed learning agent can generalize better not only on the seen data, but also is more sample efficient, faster to train and adapt, and has better transfer capabilities in the face of changes in distributions.
Researcher Affiliation | Academia | (1) Mila, University of Montreal; (2) Mila, Polytechnique Montréal; (3) Max Planck Institute for Intelligent Systems.
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the described methodology, nor does it include any links to a code repository.
Open Datasets | Yes | The experiments are based on a large variety of environments from the MiniGrid and BabyAI suites (Chevalier-Boisvert et al., 2018), which provide an egocentric and partially observed view of the environment.
Dataset Splits | No | The paper does not provide specific percentages or counts for training, validation, or test dataset splits. It discusses training and evaluation across different environments rather than fixed data partitions.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using the Proximal Policy Optimization algorithm but does not specify any software dependencies with their version numbers (e.g., Python, PyTorch, TensorFlow versions or specific library versions).
Experiment Setup | Yes | For generalized advantage function, we used λ = 0.99, and discounted future rewards by a factor of γ = 0.99. For all of our environments, we used n = 5 total modules, with only k = 3 of them active at any given time.
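The Experiment Setup row reports only a handful of hyperparameters: GAE λ = 0.99, discount γ = 0.99, and n = 5 recurrent modules with k = 3 active at a time. Since the paper releases no code, the snippet below is a minimal sketch of how those reported values could be wired into a standard generalized-advantage-estimation step of a PPO-style pipeline; the names (`CONFIG`, `compute_gae`) and the dummy rollout data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Values reported in the paper's experiment setup; everything else in this
# sketch (names, structure, dummy data) is an illustrative assumption.
CONFIG = {
    "gae_lambda": 0.99,   # lambda for generalized advantage estimation
    "gamma": 0.99,        # discount factor for future rewards
    "num_modules": 5,     # n: total recurrent modules
    "num_active": 3,      # k: modules active at any given time step
}

def compute_gae(rewards, values, dones, gamma, lam):
    """Standard GAE(lambda) advantages for a single rollout.

    rewards, dones: arrays of length T; values: array of length T + 1
    (bootstrap value appended at the end).
    """
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float64)
    last_gae = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - float(dones[t])
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        last_gae = delta + gamma * lam * nonterminal * last_gae
        advantages[t] = last_gae
    return advantages

# Tiny usage example on dummy rollout data (not from the paper).
rewards = np.array([0.0, 0.0, 1.0])
values = np.array([0.1, 0.2, 0.5, 0.0])   # includes bootstrap value
dones = np.array([0, 0, 1])
adv = compute_gae(rewards, values, dones, CONFIG["gamma"], CONFIG["gae_lambda"])
print(adv)
```

In a full PPO pipeline these advantages would feed the clipped surrogate objective, while the module counts (n = 5, k = 3) would configure the recurrent-independent-mechanisms cell itself rather than the advantage computation.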