Learning deep representations by mutual information estimation and maximization

Authors: R Devon Hjelm, Alex Fedorov, Samuel Lavoie-Marchildon, Karan Grewal, Phil Bachman, Adam Trischler, Yoshua Bengio

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our method, which we call Deep InfoMax (DIM), outperforms a number of popular unsupervised learning methods and compares favorably with fully-supervised learning on several classification tasks with some standard architectures.
Researcher Affiliation | Collaboration | R Devon Hjelm (MSR Montreal, MILA, UdeM, IVADO) devon.hjelm@microsoft.com [...]; Alex Fedorov (MRN, UNM); Samuel Lavoie-Marchildon (MILA, UdeM) [...]; Karan Grewal (U Toronto); Phil Bachman (MSR Montreal); Adam Trischler (MSR Montreal); Yoshua Bengio (MILA, UdeM, IVADO, CIFAR)
Pseudocode | No | The paper contains several figures (e.g., Figures 1, 2, 3, 5, 6, and 7) illustrating architectures and frameworks, but it does not include any formal pseudocode blocks or sections explicitly labeled as 'Algorithm'. An illustrative sketch of DIM's default training objective is given after the table.
Open Source Code | Yes | Example code for running Deep InfoMax (DIM) can be found at https://github.com/rdevon/DIM.
Open Datasets | Yes | We test Deep InfoMax (DIM) on four imaging datasets to evaluate its representational properties: CIFAR10 and CIFAR100 (Krizhevsky & Hinton, 2009): two small-scale labeled datasets composed of 32 × 32 images with 10 and 100 classes respectively. Tiny ImageNet: a reduced version of ImageNet (Krizhevsky & Hinton, 2009) images scaled down to 64 × 64 with a total of 200 classes. STL-10 (Coates et al., 2011): a dataset derived from ImageNet composed of 96 × 96 images with a mixture of 100000 unlabeled training examples and 500 labeled examples per class. We use data augmentation with this dataset, taking random 64 × 64 crops and flipping horizontally during unsupervised learning. CelebA (Yang et al., 2015; Appendix A.5 only): an image dataset composed of faces labeled with 40 binary attributes.
Dataset Splits | No | The paper mentions "Model selection for the classifiers was done by averaging the last 100 epochs of optimization, and the dropout rate and decaying learning rate schedule was set uniformly to alleviate over-fitting on the test set across all models." However, it does not explicitly define a separate validation dataset split (e.g., specific percentages or sample counts for a validation set) that is distinct from the training and test sets.
Hardware Specification | No | The paper describes the encoder architectures (e.g., "DCGAN discriminator for CIFAR10 and CIFAR100", "Alexnet architecture") and mentions training aspects like "ReLU activations and batch norm," but it does not specify any particular hardware components such as GPU models (e.g., NVIDIA A100, RTX 2080 Ti), CPU models, or memory sizes used for running the experiments.
Software Dependencies | No | The paper mentions using "Adam" as an optimizer but does not specify versions for any programming languages, deep learning frameworks (e.g., TensorFlow, PyTorch), or other software libraries that would be necessary for reproduction.
Experiment Setup | Yes | All models were trained using Adam with a learning rate of 1 × 10⁻⁴ for 1000 epochs for CIFAR10 and CIFAR100 and for 200 epochs for all other datasets. An illustrative sketch of this setup follows the table.
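
Since the paper itself provides no pseudocode, the following is a minimal sketch of the Jensen-Shannon-based mutual-information estimator that DIM maximizes by default, written in PyTorch. The function name and the tensors scores_joint / scores_marginal are our own labels, not the authors'; they stand for discriminator scores on matched and mismatched pairs of inputs and encoder features. The reference implementation at https://github.com/rdevon/DIM is authoritative.

```python
import torch.nn.functional as F

def jsd_mi_objective(scores_joint, scores_marginal):
    """Jensen-Shannon-style MI lower bound (a quantity to maximize).

    scores_joint:    discriminator scores on matched (image, feature) pairs.
    scores_marginal: discriminator scores on mismatched pairs, e.g. features
                     paired with other images from the same batch.
    """
    pos_term = -F.softplus(-scores_joint).mean()   # E_P[-sp(-T(x, E(x)))]
    neg_term = F.softplus(scores_marginal).mean()  # E_{P x P~}[sp(T(x', E(x)))]
    return pos_term - neg_term                     # negate to use as a loss
```

In the paper this estimator is applied both to the global feature vector and, patch-wise, to the local feature map, with the global and local terms weighted by hyperparameters and combined with a prior-matching term.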
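The sketch below shows, under stated assumptions, how the STL-10 augmentation and the Adam settings quoted in the table could be wired up with PyTorch and torchvision. The encoder stand-in, batch size, and data root are placeholders and are not taken from the paper.

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Augmentation quoted for STL-10: random 64 x 64 crops and horizontal flips.
stl10_transform = transforms.Compose([
    transforms.RandomCrop(64),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
train_set = datasets.STL10(root="./data", split="unlabeled",
                           transform=stl10_transform, download=True)
loader = DataLoader(train_set, batch_size=64, shuffle=True)  # batch size is an assumption

# Stand-in for the DCGAN/AlexNet-style encoders described in the paper.
encoder = torch.nn.Conv2d(3, 64, kernel_size=3)

# "Adam with a learning rate of 1 x 10^-4"; 200 epochs for datasets other than
# CIFAR10/CIFAR100, which were trained for 1000 epochs.
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4)
num_epochs = 200
```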