On Mutual Information Maximization for Representation Learning

Authors: Michael Tschannen, Josip Djolonga, Paul K. Rubenstein, Sylvain Gelly, Mario Lucic

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper we argue, and provide empirical evidence, that the success of these methods cannot be attributed to the properties of MI alone, and that they strongly depend on the inductive bias in both the choice of feature extractor architectures and the parametrization of the employed MI estimators.
Researcher Affiliation | Collaboration | Michael Tschannen, Josip Djolonga, Paul K. Rubenstein, Sylvain Gelly, and Mario Lucic, Google Research, Brain Team; Paul K. Rubenstein is a PhD student at the University of Cambridge and the Max Planck Institute for Intelligent Systems, Tübingen. Correspondence to Michael Tschannen (tschannen@google.com), Josip Djolonga (josipd@google.com), and Mario Lucic (lucic@google.com).
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code for running the experiments and visualizing the results is available at https://github.com/google-research/google-research/tree/master/mutual_information_representation_learning.
Open Datasets | Yes | To this end, we consider a simple setup of learning a representation of the top half of MNIST handwritten digit images (we present results for the experiments from Sections 3.2 and 3.3 on CIFAR10 in Appendix G).
Dataset Splits | No | The paper mentions using the MNIST and CIFAR10 datasets but does not explicitly provide details about a validation split (e.g., percentages or sample counts).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | We train g1, g2, and f using the Adam optimizer (Kingma and Ba, 2015), and use g1(x_top) as the representation for the linear evaluation. Unless stated otherwise, we use a bilinear critic f(x, y) = x^T W y (we investigate its effect in a separate ablation study), set the batch size to 128 and the learning rate to 10^-4.
Experiment Setup | Yes | Unless stated otherwise, we use a bilinear critic f(x, y) = x^T W y (we investigate its effect in a separate ablation study), set the batch size to 128 and the learning rate to 10^-4. Throughout, I_EST values and downstream classification accuracies are averaged over 20 runs and reported on the testing set.
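To make the setup quoted in the last two rows concrete, the following is a minimal sketch of MI maximization on MNIST half-images with a bilinear critic, Adam, batch size 128, and learning rate 10^-4. It assumes an InfoNCE-style estimator, small MLP encoders, and PyTorch; the encoder sizes, representation dimensionality, and all function and variable names are illustrative assumptions, not the authors' implementation (their released TensorFlow code, linked above, is the authoritative reference).

```python
# Minimal sketch (PyTorch), not the authors' code: two encoders g1, g2 for the
# top/bottom halves of an MNIST image, a bilinear critic f(x, y) = x^T W y, and
# an InfoNCE-style lower bound on the MI between the two half-image representations.
import torch
import torch.nn as nn
import torch.nn.functional as F

DIM = 64  # representation dimensionality (assumed; not specified in the quoted text)

def make_encoder():
    # Small MLP over a flattened 14x28 half-image; the paper ablates encoder architectures.
    return nn.Sequential(nn.Flatten(), nn.Linear(14 * 28, 256), nn.ReLU(), nn.Linear(256, DIM))

g1, g2 = make_encoder(), make_encoder()
W = nn.Parameter(torch.randn(DIM, DIM) * 0.01)  # parameters of the bilinear critic

def infonce_lower_bound(x_top, x_bottom):
    """InfoNCE estimate of MI between g1(x_top) and g2(x_bottom)."""
    u, v = g1(x_top), g2(x_bottom)   # shape: (batch, DIM)
    scores = u @ W @ v.t()           # f(x_i, y_j) = u_i^T W v_j for all pairs in the batch
    # Positive pairs lie on the diagonal; other batch elements serve as negatives.
    batch_size = scores.size(0)
    return torch.diagonal(F.log_softmax(scores, dim=1)).mean() + torch.log(torch.tensor(float(batch_size)))

opt = torch.optim.Adam(list(g1.parameters()) + list(g2.parameters()) + [W], lr=1e-4)

def train_step(images):
    # images: (128, 1, 28, 28) MNIST batch, split into top and bottom halves.
    x_top, x_bottom = images[:, :, :14, :], images[:, :, 14:, :]
    loss = -infonce_lower_bound(x_top, x_bottom)  # maximize the MI lower bound
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

As in the quoted setup, the positive pair is formed by the two halves of the same image, the critic is bilinear, and optimization uses Adam with batch size 128 and learning rate 10^-4; downstream linear evaluation would then be run on the frozen representation g1(x_top).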