Improved Mutual Information Estimation

Authors: Youssef Mroueh, Igor Melnyk, Pierre Dognin, Jarret Ross, Tom Sercu

AAAI 2021, pp. 9009-9017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate its performance on synthetic examples, showing advantage over the existing baselines. We demonstrate its strength in large-scale self-supervised representation learning through MI maximization.
Researcher Affiliation | Industry | Youssef Mroueh, Igor Melnyk, Pierre Dognin, Jarret Ross, Tom Sercu (IBM Research AI; Tom Sercu is now with FAIR).
Pseudocode | Yes | Algorithm 1: η-MINE (Stochastic BCD). (A sketch of the bound and of the alternating updates appears below the table.)
Open Source Code | No | The paper does not contain any explicit statement about providing open-source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | For feature representation we used an encoder similar to DCGAN (Radford, Metz, and Chintala 2015), shown in Fig. 4.a, and evaluated results on the CIFAR10 and STL10 datasets (STL10 images were scaled down to match CIFAR10 resolution). All models are built on a 10% subset of ImageNet (128K train., 5K val., 1K classes) as proposed by (Kolesnikov, Zhai, and Beyer 2019). (A dataset-loading sketch appears below the table.)
Dataset Splits | Yes | MI estimation: we compared different MI estimators on three synthetically generated Gaussian datasets (5K training and 1K testing samples). All models are built on a 10% subset of ImageNet (128K train., 5K val., 1K classes) as proposed by (Kolesnikov, Zhai, and Beyer 2019). (A synthetic-data sketch with known ground-truth MI appears below the table.)
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not specify version numbers for any key software components or dependencies used in the experiments (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | Inputs: datasets X ∈ R^(N×dx), Y ∈ R^(N×dy), such that (x_i = X_{i,·}, y_i = Y_{i,·}) ~ p_xy. Hyperparameters: α_η, α_θ (learning rates), n_c (number of critic updates). All models are built on a 10% subset of ImageNet (128K train., 5K val., 1K classes) as proposed by (Kolesnikov, Zhai, and Beyer 2019). E is a ResNet50 for all our experiments. For all results, E is trained from the Jigsaw task (CE or MI) and frozen, with only C trained as in (Kolesnikov, Zhai, and Beyer 2019). (A training-loop sketch using these hyperparameters appears below the table.)
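
Since the paper's full pseudocode is not reproduced here, the following is a minimal PyTorch sketch of one plausible reading of the η-MINE objective: the log-partition term log E_{p_x p_y}[exp(T_θ)] in the Donsker-Varadhan bound is relaxed via log u ≤ u/η + log η - 1, giving the lower bound E_{p_xy}[T_θ] - E_{p_x p_y}[exp(T_θ)]/η - log η + 1, maximized jointly over the critic parameters θ and the auxiliary scalar η. The `Critic` architecture, hidden width, and all names below are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Illustrative MLP critic T_theta(x, y); the paper's architectures differ per task."""
    def __init__(self, dx, dy, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dx + dy, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1)).squeeze(-1)

def eta_mine_lower_bound(critic, x, y, log_eta):
    """E_p[T] - E_q[exp(T)] / eta - log(eta) + 1, where the product of marginals
    q = p_x p_y is approximated by pairing x with a shuffled copy of y."""
    t_joint = critic(x, y)                          # critic on joint samples
    y_shuffled = y[torch.randperm(y.size(0))]
    t_marginal = critic(x, y_shuffled)              # critic on product-of-marginals samples
    eta = log_eta.exp()                             # parametrize eta > 0 through its log
    return t_joint.mean() - t_marginal.exp().mean() / eta - log_eta + 1.0
```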
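
The hyperparameters quoted in the Experiment Setup row (learning rates α_θ and α_η, and n_c critic updates) suggest an alternating, block-coordinate update. The sketch below, which reuses `Critic` and `eta_mine_lower_bound` from the previous block, shows one way such a loop could look; the learning rates, batch size, step count, and toy data are placeholder values, not the paper's settings.

```python
# Placeholder hyperparameters (the paper's actual values are not reproduced here).
dx = dy = 20
alpha_theta, alpha_eta, n_c = 1e-4, 1e-2, 5
batch_size, n_steps = 256, 1000

critic = Critic(dx, dy)                                # from the previous sketch
log_eta = torch.zeros((), requires_grad=True)          # eta = exp(log_eta) > 0
opt_theta = torch.optim.Adam(critic.parameters(), lr=alpha_theta)
opt_eta = torch.optim.Adam([log_eta], lr=alpha_eta)

for step in range(n_steps):
    # Toy joint samples: y is a noisy copy of x, so X and Y share information.
    x = torch.randn(batch_size, dx)
    y = x + 0.5 * torch.randn(batch_size, dy)

    # Block 1: n_c gradient steps on the critic parameters theta.
    for _ in range(n_c):
        loss = -eta_mine_lower_bound(critic, x, y, log_eta)
        opt_theta.zero_grad()
        loss.backward()
        opt_theta.step()

    # Block 2: one gradient step on the auxiliary variable eta.
    loss = -eta_mine_lower_bound(critic, x, y, log_eta)
    opt_eta.zero_grad()
    loss.backward()
    opt_eta.step()

# The negated final loss is the estimated lower bound on I(X; Y).
print("estimated MI lower bound:", -loss.item())
```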
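
For the synthetic MI-estimation benchmark (5K training and 1K testing samples per dataset), the exact generator is not quoted above. A common construction in this literature is a pair of d-dimensional Gaussians with componentwise correlation ρ, whose ground-truth MI is -d/2 · log(1 - ρ²) nats; the sketch below uses that construction with illustrative values of d and ρ.

```python
import numpy as np

def correlated_gaussians(n, d, rho, seed=0):
    """n pairs of d-dimensional Gaussians with componentwise correlation rho."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, d))
    noise = rng.standard_normal((n, d))
    y = rho * x + np.sqrt(1.0 - rho ** 2) * noise
    return x, y

d, rho = 20, 0.5                                    # illustrative, not the paper's settings
true_mi = -0.5 * d * np.log(1.0 - rho ** 2)         # ground-truth MI in nats
x_train, y_train = correlated_gaussians(5000, d, rho, seed=0)   # 5K training samples
x_test, y_test = correlated_gaussians(1000, d, rho, seed=1)     # 1K testing samples
print(f"true MI: {true_mi:.3f} nats")
```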
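
The representation-learning experiments evaluate on CIFAR10 and STL10, with STL10 scaled down to CIFAR10's 32x32 resolution. The snippet below is a torchvision-based loading sketch consistent with that description; it is not the authors' pipeline, and the DCGAN-style encoder and the 10% ImageNet subset are not reproduced here.

```python
import torchvision
import torchvision.transforms as T

# CIFAR-10 at its native 32x32 resolution.
cifar_train = torchvision.datasets.CIFAR10(
    root="data", train=True, download=True, transform=T.ToTensor()
)

# STL-10 images are 96x96; scale them down to match CIFAR-10, as the paper describes.
stl_transform = T.Compose([T.Resize((32, 32)), T.ToTensor()])
stl_train = torchvision.datasets.STL10(
    root="data", split="train", download=True, transform=stl_transform
)

print(len(cifar_train), cifar_train[0][0].shape)   # 50000, torch.Size([3, 32, 32])
print(len(stl_train), stl_train[0][0].shape)       # 5000, torch.Size([3, 32, 32])
```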