The Variational Bandwidth Bottleneck: Stochastic Evaluation on an Information Budget

Authors: Anirudh Goyal, Yoshua Bengio, Matthew Botvinick, Sergey Levine

ICLR 2020

Reproducibility variables: each entry below gives the variable, its result, and the supporting LLM response.

Research Type: Experimental
LLM Response: "In this section, we evaluate our proposed method and study the following questions: Better generalization? Does the proposed method learn an effective bottleneck that generalizes better on test distributions, as compared to the standard conditional variational information bottleneck? Learn when to access privileged input? Does the proposed method learn when to access the privileged input dynamically, minimizing unnecessary access?"

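For context on these two questions, the sketch below illustrates the bandwidth-bottleneck idea being evaluated: a capacity network maps the standard input to an access probability, and the privileged input is either encoded through a Gaussian bottleneck or replaced by a prior sample. This is a minimal, hypothetical PyTorch sketch, not the authors' released code; all module names and sizes are assumptions, and the gradient treatment of the discrete access decision is simplified.

```python
import torch
import torch.nn as nn

class BandwidthBottleneck(nn.Module):
    """Illustrative gate over a privileged input under an information budget."""

    def __init__(self, obs_dim, priv_dim, code_dim=32):
        super().__init__()
        # Capacity network: standard input -> access probability in (0, 1).
        self.capacity = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Sigmoid(),
        )
        # Encoder for the privileged input: outputs mean and log-variance.
        self.encoder = nn.Sequential(
            nn.Linear(priv_dim, 128), nn.ReLU(),
            nn.Linear(128, 2 * code_dim),
        )

    def forward(self, obs, priv):
        cap = self.capacity(obs)                  # per-state channel capacity
        mu, logvar = self.encoder(priv).chunk(2, dim=-1)
        code = mu + (0.5 * logvar).exp() * torch.randn_like(mu)  # reparameterized sample
        prior = torch.randn_like(code)            # sample from the N(0, I) prior
        # Stochastic access decision; bernoulli is non-differentiable, and the
        # paper's actual gradient treatment of this decision is not reproduced here.
        access = torch.bernoulli(cap)
        z = access * code + (1.0 - access) * prior
        return z, cap                             # cap is what beta penalizes in training

# Example: a batch of 64 observations with a 16-dim privileged goal vector.
vbb = BandwidthBottleneck(obs_dim=147, priv_dim=16)
z, cap = vbb(torch.randn(64, 147), torch.randn(64, 16))
```
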
Researcher Affiliation: Collaboration
LLM Response: 1 Mila, University of Montreal; 2 DeepMind; 3 University of California, Berkeley. Contact: anirudhgoyal9119@gmail.com

Pseudocode: No
LLM Response: The paper does not contain any explicitly labeled pseudocode or algorithm blocks.

Open Source Code: No
LLM Response: The paper states that 'Our code is based on' several open-source implementations (e.g., https://github.com/maximecb/gym-minigrid), but it does not confirm that the authors' own implementation of the method described in the paper is publicly available.

Open Datasets: Yes
LLM Response: "The MultiRoom environments used for this research are part of MiniGrid, which is an open-source gridworld package. This package includes a family of reinforcement learning environments compatible with the OpenAI Gym framework. https://github.com/maximecb/gym-minigrid"

Dataset Splits: Yes
LLM Response: "We use different mazes for training, validation, and testing. [...] For RoomNXSY, we trained on RoomN2S4 (2 rooms of at most size 4), and evaluate on RoomN6S6 (6 rooms of at most size 6) and RoomN12S10 (12 rooms of at most size 10). We also evaluate on the FindObjSY environment, which consists of 9 connected rooms of size (Y-2) x (Y-2) arranged in a grid. For FindObjSY, we train on FindObjS5, and evaluate on FindObjS7 and FindObjS10."

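A minimal sketch of how such a split can be set up with the gym-minigrid package is shown below. The MultiRoom IDs used here are registered by gym-minigrid; the larger RoomN12S10 and the FindObj variants are not among its standard registered environments, so they are omitted, and the exact IDs the authors used are an assumption.

```python
import gym
import gym_minigrid  # noqa: F401  (importing this package registers the MiniGrid envs)

# Train on the small maze, evaluate zero-shot on a larger one.
train_env = gym.make("MiniGrid-MultiRoom-N2-S4-v0")  # 2 rooms, size <= 4
eval_env = gym.make("MiniGrid-MultiRoom-N6-v0")      # 6 rooms

obs = train_env.reset()
obs, reward, done, info = train_env.step(train_env.action_space.sample())
```
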
Hardware Specification: No
LLM Response: The paper mentions 'Compute Canada for computing resources' in the Acknowledgements section but does not specify particular hardware, such as CPU or GPU models, or other details of the computing environment used for the experiments.

Software Dependencies: No
LLM Response: The paper mentions software components such as Advantage Actor-Critic (A2C), RMSProp, and the Adam optimizer, and refers to frameworks such as gym-minigrid, but it does not provide version numbers for these dependencies (e.g., 'Python 3.8, PyTorch 1.9').

Experiment Setup: Yes
LLM Response: "We use the following parameters for lower-level policies throughout the experiments. Each training iteration consists of 5 environment time steps, and all the networks (value functions, policy, and observation embedding network) are trained at every time step. Every training batch has a size of 64. The value function networks and the embedding network are all neural networks comprised of two hidden layers, with 128 ReLU units at each hidden layer. All the network parameters are updated using the Adam optimizer with learning rate 3 × 10⁻⁴. The only hyperparameter we introduce with the variational information bottleneck is β. For both the VIB baseline and the proposed method, we evaluated with 5 values of β: 0.01, 0.09, 0.001, 0.005, 0.009."

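For readers reproducing this setup, the sketch below wires up the described networks and optimizer in PyTorch. The paper does not release code, so the layer names, input sizes, and overall wiring here are assumptions; only the layer widths, learning rate, batch size, and β values come from the quoted setup.

```python
import torch
import torch.nn as nn

def make_mlp(in_dim, out_dim):
    # Two hidden layers of 128 ReLU units each, as described in the paper.
    return nn.Sequential(
        nn.Linear(in_dim, 128), nn.ReLU(),
        nn.Linear(128, 128), nn.ReLU(),
        nn.Linear(128, out_dim),
    )

obs_dim, n_actions = 147, 7              # illustrative MiniGrid-like sizes
policy = make_mlp(obs_dim, n_actions)
value_fn = make_mlp(obs_dim, 1)
embedding = make_mlp(obs_dim, 128)

params = (
    list(policy.parameters())
    + list(value_fn.parameters())
    + list(embedding.parameters())
)
optimizer = torch.optim.Adam(params, lr=3e-4)  # learning rate from the paper

BATCH_SIZE = 64                                # training batch size from the paper
BETAS = [0.01, 0.09, 0.001, 0.005, 0.009]      # beta sweep for VIB and the proposed method
```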