Improved Mutual Information Estimation
Authors: Youssef Mroueh, Igor Melnyk, Pierre Dognin, Jarret Ross, Tom Sercu
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate its performance on synthetic examples, showing advantage over the existing baselines. We demonstrate its strength in large-scale self-supervised representation learning through MI maximization. |
| Researcher Affiliation | Industry | Youssef Mroueh, Igor Melnyk, Pierre Dognin, Jarret Ross, Tom Sercu* (IBM Research AI). *Tom Sercu is now with FAIR. |
| Pseudocode | Yes | Algorithm 1: η-MINE (Stochastic BCD); a hedged sketch of such a loop is given after the table. |
| Open Source Code | No | The paper does not contain any explicit statement about providing open-source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | For feature representation we used an encoder similar to DCGAN (Radford, Metz, and Chintala 2015), shown in Fig. 4.a and evaluated results on CIFAR10 and STL10 datasets (STL10 images were scaled down to match CIFAR10 resolution). All models are built on a 10% subset of ImageNet (128K train., 5K val., 1K classes) as proposed by (Kolesnikov, Zhai, and Beyer 2019). |
| Dataset Splits | Yes | MI estimation. We compared different MI estimators on three synthetically generated Gaussian datasets [5K training and 1K testing samples]. All models are built on a 10% subset of ImageNet (128K train., 5K val., 1K classes) as proposed by (Kolesnikov, Zhai, and Beyer 2019). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not specify version numbers for any key software components or dependencies used in the experiments (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Inputs: dataset X ∈ R^(N×d_x), Y ∈ R^(N×d_y), such that (x_i = X_i,·, y_i = Y_i,·) ~ p_xy. Hyperparameters: α_η, α_θ (learning rates), n_c (number of critic updates). All models are built on a 10% subset of ImageNet (128K train., 5K val., 1K classes) as proposed by (Kolesnikov, Zhai, and Beyer 2019). E is a ResNet50 for all our experiments. For all results, E is trained from Jigsaw task (CE or MI) and frozen, with only C trained as in (Kolesnikov, Zhai, and Beyer 2019). |
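
The Pseudocode and Experiment Setup rows reference Algorithm 1, an η-MINE estimator trained by stochastic block coordinate descent over the critic parameters θ and an auxiliary scalar η, with learning rates α_θ, α_η and n_c critic updates per block. The sketch below is a minimal, hypothetical PyTorch rendering of such a loop on synthetic correlated Gaussian data; the specific bound form (a Donsker-Varadhan-type lower bound with auxiliary scalar η), the critic architecture, and all hyperparameter values are illustrative assumptions, not the paper's exact Algorithm 1.

```python
# Hypothetical sketch of an eta-MINE-style training loop (not the authors' code).
# Assumed bound: E_pxy[T] - exp(-eta) * E_{px x py}[exp(T)] - eta + 1.
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Small MLP critic T_theta(x, y)."""
    def __init__(self, dx, dy, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dx + dy, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1)).squeeze(-1)

def eta_mine_lower_bound(critic, eta, x, y):
    """DV-type lower bound with auxiliary scalar eta (assumed form)."""
    t_joint = critic(x, y).mean()
    y_shuffled = y[torch.randperm(y.size(0))]          # samples from p_x p_y
    t_marginal = torch.exp(critic(x, y_shuffled)).mean()
    return t_joint - torch.exp(-eta) * t_marginal - eta + 1.0

# Synthetic correlated Gaussian data (stand-in for the paper's Gaussian setup).
N, dx, dy, rho = 5000, 5, 5, 0.8
x = torch.randn(N, dx)
y = rho * x + (1 - rho ** 2) ** 0.5 * torch.randn(N, dy)

critic = Critic(dx, dy)
eta = torch.tensor(0.0, requires_grad=True)
opt_theta = torch.optim.Adam(critic.parameters(), lr=1e-4)  # alpha_theta (assumed)
opt_eta = torch.optim.Adam([eta], lr=1e-3)                  # alpha_eta (assumed)
n_critic = 5                                                # n_c critic updates (assumed)

for step in range(2000):
    idx = torch.randint(0, N, (256,))
    xb, yb = x[idx], y[idx]
    # Block-coordinate-style updates: n_c critic steps, then one eta step.
    for _ in range(n_critic):
        opt_theta.zero_grad()
        loss = -eta_mine_lower_bound(critic, eta, xb, yb)
        loss.backward()
        opt_theta.step()
    opt_eta.zero_grad()
    loss = -eta_mine_lower_bound(critic, eta, xb, yb)
    loss.backward()
    opt_eta.step()

with torch.no_grad():
    print("estimated MI lower bound:", eta_mine_lower_bound(critic, eta, x, y).item())
```

Under these assumptions, the alternating structure mirrors the stochastic BCD description: several gradient steps on the critic θ per minibatch, followed by a single update of the auxiliary scalar η.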