InfoBot: Transfer and Exploration via the Information Bottleneck
Authors: Anirudh Goyal, Riashat Islam, DJ Strouse, Zafarali Ahmed, Hugo Larochelle, Matthew Botvinick, Yoshua Bengio, Sergey Levine
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we demonstrate the following experimentally: The goal-conditioned policy with information bottleneck leads to much better policy transfer than standard RL training procedures (direct policy transfer). Using decision states as an exploration bonus leads to better performance than a variety of standard task-agnostic exploration methods (transferable exploration strategies). |
| Researcher Affiliation | Collaboration | 1 Mila, University of Montreal; 2 Mila, McGill University; 3 DeepMind; 4 Google Brain; 5 University College London; 6 University of California, Berkeley. |
| Pseudocode | Yes | Algorithm 1 Transfer and Exploration via the Information Bottleneck |
| Open Source Code | Yes | For reproducibility purposes of our experiments, we will further release the code on GitHub, which will be available at https://github.com/anonymous |
| Open Datasets | Yes | The first set of environments we consider are partially observable grid worlds generated with MiniGrid (Chevalier-Boisvert and Willems, 2018), an OpenAI Gym package (Brockman et al., 2016). |
| Dataset Splits | No | The paper mentions training on smaller versions of environments and evaluating on larger versions for generalization, but does not specify traditional train/validation/test dataset splits with percentages or counts. |
| Hardware Specification | No | The paper mentions “Compute Canada for computing resources” but does not specify any particular hardware (GPU/CPU models, memory, etc.) used for the experiments. |
| Software Dependencies | No | The paper mentions using "gym-minigrid", "OpenAI Gym", the "A2C implementation from (Chevalier-Boisvert and Willems, 2018)", and the "open-source A2C implementation from Kostrikov (2018)", which is also used for PPO. However, it does not provide version numbers for these libraries or for other software dependencies such as Python, PyTorch/TensorFlow, or CUDA. |
| Experiment Setup | Yes | For the maze environments, we use A2C with 48 parallel workers. Our actor and critic networks consist of two and three fully connected layers respectively, each with 128 hidden units. The encoder network is also parameterized as a neural network, consisting of one fully connected layer. We use RMSProp with an initial learning rate of 0.0007 to train the models, for both InfoBot and the baseline for a fair comparison. (Minimal code sketches of the method and this setup follow the table.) |
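The rows above refer to the paper's core mechanism: a goal-conditioned policy whose dependence on the goal is regularized by an information bottleneck, with the resulting KL term also usable as an exploration bonus at "decision states". The sketch below is a minimal, hypothetical PyTorch rendering of that idea, not the authors' released code: the `GoalConditionedEncoder` class, the unit-Gaussian prior, and the `beta` value are illustrative assumptions; only the general scheme (a latent z drawn from p(z|s,g), penalized by its KL divergence to a prior, with the same KL reusable as an exploration bonus) is taken from the paper.

```python
import torch
import torch.nn as nn
import torch.distributions as D

class GoalConditionedEncoder(nn.Module):
    """Encodes (state, goal) into a Gaussian latent z, standing in for p_enc(z | s, g)."""
    def __init__(self, state_dim, goal_dim, latent_dim, hidden=128):
        super().__init__()
        self.net = nn.Linear(state_dim + goal_dim, hidden)
        self.mu = nn.Linear(hidden, latent_dim)
        self.log_std = nn.Linear(hidden, latent_dim)

    def forward(self, state, goal):
        h = torch.relu(self.net(torch.cat([state, goal], dim=-1)))
        return D.Normal(self.mu(h), self.log_std(h).exp())

def information_bottleneck_terms(encoder, state, goal, beta=1e-3):
    """KL(p_enc(z|s,g) || q(z)) against a unit-Gaussian prior q(z).

    The same KL plays two roles in the paper: scaled by beta it is the bottleneck
    penalty subtracted from the RL objective, and it can be added to the reward as
    an exploration bonus that is large at decision states (where the goal matters).
    """
    posterior = encoder(state, goal)
    prior = D.Normal(torch.zeros_like(posterior.mean), torch.ones_like(posterior.stddev))
    kl = D.kl_divergence(posterior, prior).sum(-1)  # one value per (state, goal) pair
    z = posterior.rsample()                         # latent fed to the policy pi(a | s, z)
    penalty = beta * kl.mean()                      # subtracted from the policy-gradient objective
    return z, kl, penalty
```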
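The Experiment Setup row gives concrete architectural choices (a two-layer actor and three-layer critic with 128 hidden units, a one-layer encoder, and RMSProp at 0.0007). The snippet below sketches how those pieces might be instantiated; the input and output dimensions and the wiring between modules are assumptions, since the quoted setup does not specify them.

```python
import torch.nn as nn
import torch.optim as optim

feat_dim, latent_dim, n_actions = 128, 64, 7  # assumed sizes; not given in the setup quoted above

# Actor: two fully connected layers, each with 128 hidden units.
actor = nn.Sequential(
    nn.Linear(feat_dim + latent_dim, 128), nn.ReLU(),
    nn.Linear(128, n_actions),
)

# Critic: three fully connected layers, each with 128 hidden units.
critic = nn.Sequential(
    nn.Linear(feat_dim, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 1),
)

# Encoder: a single fully connected layer producing the latent's mean and log-std.
encoder = nn.Linear(feat_dim, 2 * latent_dim)

# RMSProp with the reported initial learning rate of 0.0007, shared by all modules.
params = list(actor.parameters()) + list(critic.parameters()) + list(encoder.parameters())
optimizer = optim.RMSprop(params, lr=0.0007)
```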