Improved Mutual Information Estimation

Authors: Youssef Mroueh, Igor Melnyk, Pierre Dognin, Jarret Ross, Tom Sercu

AAAI 2021, pp. 9009-9017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate its performance on synthetic examples, showing advantage over the existing baselines. We demonstrate its strength in large-scale self-supervised representation learning through MI maximization.
Researcher Affiliation | Industry | Youssef Mroueh, Igor Melnyk, Pierre Dognin, Jarret Ross, Tom Sercu (IBM Research AI; Tom Sercu is now with FAIR).
Pseudocode | Yes | Algorithm 1: η-MINE (Stochastic BCD). (A sketch of the bound and of the alternating updates appears below the table.)
Open Source Code | No | The paper does not contain any explicit statement about providing open-source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | For feature representation we used an encoder similar to DCGAN (Radford, Metz, and Chintala 2015), shown in Fig. 4.a, and evaluated results on the CIFAR10 and STL10 datasets (STL10 images were scaled down to match CIFAR10 resolution). All models are built on a 10% subset of ImageNet (128K train., 5K val., 1K classes) as proposed by (Kolesnikov, Zhai, and Beyer 2019). (A dataset-loading sketch appears below the table.)
Dataset Splits | Yes | MI estimation: we compared different MI estimators on three synthetically generated Gaussian datasets (5K training and 1K testing samples). All models are built on a 10% subset of ImageNet (128K train., 5K val., 1K classes) as proposed by (Kolesnikov, Zhai, and Beyer 2019). (A synthetic-data sketch with known ground-truth MI appears below the table.)
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not specify version numbers for any key software components or dependencies used in the experiments (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | Inputs: datasets X ∈ R^(N×dx), Y ∈ R^(N×dy), such that (x_i = X_{i,·}, y_i = Y_{i,·}) ~ p_xy. Hyperparameters: α_η, α_θ (learning rates), n_c (number of critic updates). All models are built on a 10% subset of ImageNet (128K train., 5K val., 1K classes) as proposed by (Kolesnikov, Zhai, and Beyer 2019). E is a ResNet50 for all our experiments. For all results, E is trained from the Jigsaw task (CE or MI) and frozen, with only C trained as in (Kolesnikov, Zhai, and Beyer 2019). (A training-loop sketch using these hyperparameters appears below the table.)
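
Since the paper's full pseudocode is not reproduced here, the following is a minimal PyTorch sketch of one plausible reading of the η-MINE objective: the log-partition term log E_{p_x p_y}[exp(T_θ)] in the Donsker-Varadhan bound is relaxed via log u ≤ u/η + log η - 1, giving the lower bound E_{p_xy}[T_θ] - E_{p_x p_y}[exp(T_θ)]/η - log η + 1, maximized jointly over the critic parameters θ and the auxiliary scalar η. The `Critic` architecture, hidden width, and all names below are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Illustrative MLP critic T_theta(x, y); the paper's architectures differ per task."""
    def __init__(self, dx, dy, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dx + dy, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1)).squeeze(-1)

def eta_mine_lower_bound(critic, x, y, log_eta):
    """E_p[T] - E_q[exp(T)] / eta - log(eta) + 1, where the product of marginals
    q = p_x p_y is approximated by pairing x with a shuffled copy of y."""
    t_joint = critic(x, y)                          # critic on joint samples
    y_shuffled = y[torch.randperm(y.size(0))]
    t_marginal = critic(x, y_shuffled)              # critic on product-of-marginals samples
    eta = log_eta.exp()                             # parametrize eta > 0 through its log
    return t_joint.mean() - t_marginal.exp().mean() / eta - log_eta + 1.0
```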
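
The hyperparameters quoted in the Experiment Setup row (learning rates α_θ and α_η, and n_c critic updates) suggest an alternating, block-coordinate update. The sketch below, which reuses `Critic` and `eta_mine_lower_bound` from the previous block, shows one way such a loop could look; the learning rates, batch size, step count, and toy data are placeholder values, not the paper's settings.

```python
# Placeholder hyperparameters (the paper's actual values are not reproduced here).
dx = dy = 20
alpha_theta, alpha_eta, n_c = 1e-4, 1e-2, 5
batch_size, n_steps = 256, 1000

critic = Critic(dx, dy)                                # from the previous sketch
log_eta = torch.zeros((), requires_grad=True)          # eta = exp(log_eta) > 0
opt_theta = torch.optim.Adam(critic.parameters(), lr=alpha_theta)
opt_eta = torch.optim.Adam([log_eta], lr=alpha_eta)

for step in range(n_steps):
    # Toy joint samples: y is a noisy copy of x, so X and Y share information.
    x = torch.randn(batch_size, dx)
    y = x + 0.5 * torch.randn(batch_size, dy)

    # Block 1: n_c gradient steps on the critic parameters theta.
    for _ in range(n_c):
        loss = -eta_mine_lower_bound(critic, x, y, log_eta)
        opt_theta.zero_grad()
        loss.backward()
        opt_theta.step()

    # Block 2: one gradient step on the auxiliary variable eta.
    loss = -eta_mine_lower_bound(critic, x, y, log_eta)
    opt_eta.zero_grad()
    loss.backward()
    opt_eta.step()

# The negated final loss is the estimated lower bound on I(X; Y).
print("estimated MI lower bound:", -loss.item())
```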
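
For the synthetic MI-estimation benchmark (5K training and 1K testing samples per dataset), the exact generator is not quoted above. A common construction in this literature is a pair of d-dimensional Gaussians with componentwise correlation ρ, whose ground-truth MI is -d/2 · log(1 - ρ²) nats; the sketch below uses that construction with illustrative values of d and ρ.

```python
import numpy as np

def correlated_gaussians(n, d, rho, seed=0):
    """n pairs of d-dimensional Gaussians with componentwise correlation rho."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, d))
    noise = rng.standard_normal((n, d))
    y = rho * x + np.sqrt(1.0 - rho ** 2) * noise
    return x, y

d, rho = 20, 0.5                                    # illustrative, not the paper's settings
true_mi = -0.5 * d * np.log(1.0 - rho ** 2)         # ground-truth MI in nats
x_train, y_train = correlated_gaussians(5000, d, rho, seed=0)   # 5K training samples
x_test, y_test = correlated_gaussians(1000, d, rho, seed=1)     # 1K testing samples
print(f"true MI: {true_mi:.3f} nats")
```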
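
The representation-learning experiments evaluate on CIFAR10 and STL10, with STL10 scaled down to CIFAR10's 32x32 resolution. The snippet below is a torchvision-based loading sketch consistent with that description; it is not the authors' pipeline, and the DCGAN-style encoder and the 10% ImageNet subset are not reproduced here.

```python
import torchvision
import torchvision.transforms as T

# CIFAR-10 at its native 32x32 resolution.
cifar_train = torchvision.datasets.CIFAR10(
    root="data", train=True, download=True, transform=T.ToTensor()
)

# STL-10 images are 96x96; scale them down to match CIFAR-10, as the paper describes.
stl_transform = T.Compose([T.Resize((32, 32)), T.ToTensor()])
stl_train = torchvision.datasets.STL10(
    root="data", split="train", download=True, transform=stl_transform
)

print(len(cifar_train), cifar_train[0][0].shape)   # 50000, torch.Size([3, 32, 32])
print(len(stl_train), stl_train[0][0].shape)       # 5000, torch.Size([3, 32, 32])
```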