InfoBot: Transfer and Exploration via the Information Bottleneck
Authors: Anirudh Goyal, Riashat Islam, DJ Strouse, Zafarali Ahmed, Hugo Larochelle, Matthew Botvinick, Yoshua Bengio, Sergey Levine
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we demonstrate the following experimentally: The goal-conditioned policy with information bottleneck leads to much better policy transfer than standard RL training procedures (direct policy transfer). Using decision states as an exploration bonus leads to better performance than a variety of standard task-agnostic exploration methods (transferable exploration strategies). |
| Researcher Affiliation | Collaboration | 1 Mila, University of Montreal; 2 Mila, McGill University; 3 DeepMind; 4 Google Brain; 5 University College London; 6 University of California, Berkeley. |
| Pseudocode | Yes | Algorithm 1 Transfer and Exploration via the Information Bottleneck |
| Open Source Code | Yes | For reproducibility purposes of our experiments, we will further release the code on GitHub, which will be available at https://github.com/anonymous |
| Open Datasets | Yes | The first set of environments we consider are partially observable grid worlds generated with MiniGrid (Chevalier-Boisvert and Willems, 2018), an OpenAI Gym package (Brockman et al., 2016). |
| Dataset Splits | No | The paper mentions training on smaller versions of environments and evaluating on larger versions for generalization, but does not specify traditional train/validation/test dataset splits with percentages or counts. |
| Hardware Specification | No | The paper mentions “Compute Canada for computing resources” but does not specify any particular hardware (GPU/CPU models, memory, etc.) used for the experiments. |
| Software Dependencies | No | The paper mentions using "gym-minigrid", "OpenAI Gym", the "A2C implementation from (Chevalier-Boisvert and Willems, 2018)", and the "open-source A2C implementation from Kostrikov (2018)", which is also used for PPO. However, it does not provide version numbers for these libraries or for other software dependencies such as Python, PyTorch/TensorFlow, or CUDA. |
| Experiment Setup | Yes | For the maze environments, we use A2C with 48 parallel workers. Our actor and critic networks consist of two and three fully connected layers respectively, each with 128 hidden units. The encoder network is also parameterized as a neural network, consisting of one fully connected layer. We use RMSProp with an initial learning rate of 0.0007 to train the models, for both InfoBot and the baseline for a fair comparison. (Minimal code sketches of the method and this setup follow the table.) |
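The rows above refer to the paper's core mechanism: a goal-conditioned policy whose dependence on the goal is regularized by an information bottleneck, with the resulting KL term also usable as an exploration bonus at "decision states". The sketch below is a minimal, hypothetical PyTorch rendering of that idea, not the authors' released code: the `GoalConditionedEncoder` class, the unit-Gaussian prior, and the `beta` value are illustrative assumptions; only the general scheme (a latent z drawn from p(z|s,g), penalized by its KL divergence to a prior, with the same KL reusable as an exploration bonus) is taken from the paper.

```python
import torch
import torch.nn as nn
import torch.distributions as D

class GoalConditionedEncoder(nn.Module):
    """Encodes (state, goal) into a Gaussian latent z, standing in for p_enc(z | s, g)."""
    def __init__(self, state_dim, goal_dim, latent_dim, hidden=128):
        super().__init__()
        self.net = nn.Linear(state_dim + goal_dim, hidden)
        self.mu = nn.Linear(hidden, latent_dim)
        self.log_std = nn.Linear(hidden, latent_dim)

    def forward(self, state, goal):
        h = torch.relu(self.net(torch.cat([state, goal], dim=-1)))
        return D.Normal(self.mu(h), self.log_std(h).exp())

def information_bottleneck_terms(encoder, state, goal, beta=1e-3):
    """KL(p_enc(z|s,g) || q(z)) against a unit-Gaussian prior q(z).

    The same KL plays two roles in the paper: scaled by beta it is the bottleneck
    penalty subtracted from the RL objective, and it can be added to the reward as
    an exploration bonus that is large at decision states (where the goal matters).
    """
    posterior = encoder(state, goal)
    prior = D.Normal(torch.zeros_like(posterior.mean), torch.ones_like(posterior.stddev))
    kl = D.kl_divergence(posterior, prior).sum(-1)  # one value per (state, goal) pair
    z = posterior.rsample()                         # latent fed to the policy pi(a | s, z)
    penalty = beta * kl.mean()                      # subtracted from the policy-gradient objective
    return z, kl, penalty
```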
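The Experiment Setup row gives concrete architectural choices (a two-layer actor and three-layer critic with 128 hidden units, a one-layer encoder, and RMSProp at 0.0007). The snippet below sketches how those pieces might be instantiated; the input and output dimensions and the wiring between modules are assumptions, since the quoted setup does not specify them.

```python
import torch.nn as nn
import torch.optim as optim

feat_dim, latent_dim, n_actions = 128, 64, 7  # assumed sizes; not given in the setup quoted above

# Actor: two fully connected layers, each with 128 hidden units.
actor = nn.Sequential(
    nn.Linear(feat_dim + latent_dim, 128), nn.ReLU(),
    nn.Linear(128, n_actions),
)

# Critic: three fully connected layers, each with 128 hidden units.
critic = nn.Sequential(
    nn.Linear(feat_dim, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 1),
)

# Encoder: a single fully connected layer producing the latent's mean and log-std.
encoder = nn.Linear(feat_dim, 2 * latent_dim)

# RMSProp with the reported initial learning rate of 0.0007, shared by all modules.
params = list(actor.parameters()) + list(critic.parameters()) + list(encoder.parameters())
optimizer = optim.RMSprop(params, lr=0.0007)
```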