Learning Optimal Representations with the Decodable Information Bottleneck

Authors: Yann Dubois, Douwe Kiela, David J. Schwab, Ramakrishna Vedantam

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our framework in practical settings, focusing on: (i) the relation between V-sufficiency and Alice's best achievable performance; (ii) the relation between V-minimality and generalization; (iii) the consequence of a mismatch between V_Alice and the functional family V_Bob w.r.t. which Z is sufficient or minimal, especially in IB's setting V_Bob = U; (iv) the use of our framework to predict generalization of trained networks. Many of our experiments involve sweeping over the complexity of families V⁻ ⊆ V ⊆ V⁺; we do this by varying widths of MLPs, with V → U in the infinite-width limit [40, 41]. (A sketch of such a width sweep appears after this table.)
Researcher Affiliation | Collaboration | Yann Dubois (Facebook AI Research) yannd@fb.com; Douwe Kiela (Facebook AI Research) dkiela@fb.com; David J. Schwab (Facebook AI Research and CUNY Graduate Center) dschwab@fb.com; Ramakrishna Vedantam (Facebook AI Research) ramav@fb.com
Pseudocode | Yes | Figure 2: Practical DIB. (a) Pseudo-code for L̂_DIB(D). (An illustrative sketch of a DIB-style loss follows the table.)
Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | Right two plots: 2D representations encoded by a multi-layer perceptron (MLP) for odd-even classification of 200 MNIST [22] examples. (A data-loading sketch for this task follows the table.)
Dataset Splits | No | The paper discusses 'train and test performance' and the 'train-test gap' but does not explicitly mention validation data splits or their proportions.
Hardware Specification | No | The paper describes model architectures like 'ResNet18 encoder' and '3-MLP encoder', but does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for experiments.
Software Dependencies | No | The paper mentions 'PyTorch' as a reference but does not specify version numbers for PyTorch or any other software dependencies used in the experiments.
Experiment Setup | Yes | We use a 3-MLP encoder with around 21M parameters and a 1024-dimensional Z. Since we want to investigate the generalization of ERMs resulting from Bob's criterion, we do not use (possibly implicit) regularizers such as a large learning rate [44]. For more experimental details see Appx. D.1. (An illustrative encoder sketch follows the table.)
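
The "Research Type" row sweeps the complexity of the decoding families V⁻ ⊆ V ⊆ V⁺ by varying MLP widths. Below is a minimal PyTorch sketch of what such a sweep could look like, assuming one-hidden-layer MLP probes on a 1024-dimensional representation and a binary task; the specific widths, probe depth, and class count are illustrative assumptions, not the paper's settings.

```python
import torch.nn as nn

# Sketch: vary the capacity of the decoding family V by sweeping the hidden
# width of an MLP probe. Wider probes give a richer family; in the
# infinite-width limit such probes approach the universal family U
# (cf. references [40, 41] in the paper).
def make_probe(z_dim: int, width: int, n_classes: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Linear(z_dim, width),
        nn.ReLU(),
        nn.Linear(width, n_classes),
    )

# Hypothetical sweep over probe widths (the paper's exact grid is not quoted here).
probes = {w: make_probe(z_dim=1024, width=w, n_classes=2) for w in (8, 64, 512, 4096)}
```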
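
The "Pseudocode" row points to Figure 2 of the paper for the actual pseudo-code of L̂_DIB(D). As a rough illustration only, here is one plausible reading of a DIB-style objective: a task head must decode the label Y from Z (V-sufficiency), while an auxiliary head tries to decode example identity within each class and the encoder is trained against it through a gradient-reversal layer, standing in for a min-max V-minimality term. The gradient-reversal trick, the head shapes, and the `idx_in_class` labels are assumptions of this sketch; the authors' procedure is given in their Figure 2.

```python
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips the sign of gradients on backward,
    so the encoder upstream learns to *remove* what the head decodes."""

    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

def dib_style_loss(z, y, idx_in_class, task_head, minimality_head, beta=1.0):
    # V-sufficiency term: the task head must decode the label Y from Z.
    sufficiency = F.cross_entropy(task_head(z), y)
    # V-minimality term (sketch): the auxiliary head learns to decode a
    # per-class example index from Z, while the reversed gradient pushes the
    # encoder to make that decoding impossible.
    minimality = F.cross_entropy(minimality_head(GradReverse.apply(z)), idx_in_class)
    return sufficiency + beta * minimality
```

Here `beta` is this sketch's knob for trading sufficiency against minimality, playing the role of the usual bottleneck trade-off weight.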
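
The "Open Datasets" row quotes an odd-even classification task on 200 MNIST examples. Assuming torchvision is available, the data preparation for such a task could look like this; the batch size and transform are illustrative.

```python
import torch
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

# Load MNIST and keep 200 examples, matching the count quoted above.
mnist = datasets.MNIST(root="data", train=True, download=True,
                       transform=transforms.ToTensor())
subset = Subset(mnist, range(200))

def odd_even_collate(batch):
    """Stack images and map the ten digit labels to a binary odd/even label."""
    images = torch.stack([img for img, _ in batch])
    labels = torch.tensor([digit % 2 for _, digit in batch])  # 0 = even, 1 = odd
    return images, labels

loader = DataLoader(subset, batch_size=32, collate_fn=odd_even_collate)
```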
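
Finally, the "Experiment Setup" row specifies a 3-MLP encoder with around 21M parameters and a 1024-dimensional Z. A sketch of an encoder in that regime follows; the CIFAR-sized input and the hidden width of 3072 are assumptions chosen only so the parameter count lands near the quoted figure, and the actual architecture is given in the paper's Appendix D.1.

```python
import torch.nn as nn

def make_encoder(in_dim: int = 3 * 32 * 32, hidden: int = 3072, z_dim: int = 1024) -> nn.Sequential:
    """Three-layer MLP encoder producing a z_dim-dimensional representation Z."""
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, z_dim),
    )

encoder = make_encoder()
n_params = sum(p.numel() for p in encoder.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # about 22M with these assumed widths
```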