Learning deep representations by mutual information estimation and maximization
Authors: R Devon Hjelm, Alex Fedorov, Samuel Lavoie-Marchildon, Karan Grewal, Phil Bachman, Adam Trischler, Yoshua Bengio
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method, which we call Deep InfoMax (DIM), outperforms a number of popular unsupervised learning methods and compares favorably with fully-supervised learning on several classification tasks with some standard architectures. |
| Researcher Affiliation | Collaboration | R Devon Hjelm MSR Montreal, MILA, UdeM, IVADO devon.hjelm@microsoft.com [...] Alex Fedorov MRN, UNM Samuel Lavoie-Marchildon MILA, UdeM [...] Karan Grewal U Toronto Phil Bachman MSR Montreal Adam Trischler MSR Montreal Yoshua Bengio MILA, UdeM, IVADO, CIFAR |
| Pseudocode | No | The paper contains several figures (e.g., Figure 1, 2, 3, 5, 6, 7) illustrating architectures and frameworks, but it does not include any formal pseudocode blocks or sections explicitly labeled as 'Algorithm'. |
| Open Source Code | Yes | Example code for running Deep InfoMax (DIM) can be found at https://github.com/rdevon/DIM. |
| Open Datasets | Yes | We test Deep InfoMax (DIM) on four imaging datasets to evaluate its representational properties: CIFAR10 and CIFAR100 (Krizhevsky & Hinton, 2009): two small-scale labeled datasets composed of 32 × 32 images with 10 and 100 classes respectively. Tiny ImageNet: a reduced version of ImageNet (Krizhevsky & Hinton, 2009), with images scaled down to 64 × 64 and a total of 200 classes. STL-10 (Coates et al., 2011): a dataset derived from ImageNet composed of 96 × 96 images with a mixture of 100,000 unlabeled training examples and 500 labeled examples per class. We use data augmentation with this dataset, taking random 64 × 64 crops and flipping horizontally during unsupervised learning. CelebA (Yang et al., 2015, Appendix A.5 only): an image dataset composed of faces labeled with 40 binary attributes. A minimal sketch of the STL-10 augmentation appears after this table. |
| Dataset Splits | No | The paper mentions "Model selection for the classifiers was done by averaging the last 100 epochs of optimization, and the dropout rate and decaying learning rate schedule were set uniformly to alleviate over-fitting on the test set across all models." However, it does not explicitly define a separate validation dataset split (e.g., specific percentages or sample counts for a validation set) that is distinct from the training and test sets. |
| Hardware Specification | No | The paper describes the encoder architectures (e.g., "DCGAN discriminator for CIFAR10 and CIFAR100", "Alexnet architecture") and mentions training aspects like "ReLU activations and batch norm," but it does not specify any particular hardware components such as GPU models (e.g., NVIDIA A100, RTX 2080 Ti), CPU models, or memory sizes used for running the experiments. |
| Software Dependencies | No | The paper mentions using "Adam" as an optimizer but does not specify versions for any programming languages, deep learning frameworks (e.g., TensorFlow, PyTorch), or other software libraries that would be necessary for reproduction. |
| Experiment Setup | Yes | All models were trained using Adam with a learning rate of 1 × 10⁻⁴ for 1000 epochs for CIFAR10 and CIFAR100 and for 200 epochs for all other datasets. A hedged sketch of this training schedule appears below. |
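
The STL-10 augmentation quoted in the Open Datasets row (random 64 × 64 crops plus horizontal flips on 96 × 96 images) is concrete enough to sketch. A minimal version using PyTorch/torchvision follows; the `ToTensor` step and the dataset root path are assumptions, as the paper does not state them.

```python
# Minimal sketch of the STL-10 augmentation described in the paper:
# random 64x64 crops and horizontal flips during unsupervised learning.
# The ToTensor step and root path are assumptions, not from the paper.
from torchvision import datasets, transforms

stl10_unsup_transform = transforms.Compose([
    transforms.RandomCrop(64),          # random 64x64 crop from each 96x96 image
    transforms.RandomHorizontalFlip(),  # horizontal flip with p = 0.5
    transforms.ToTensor(),
])

unlabeled_set = datasets.STL10(
    root="./data",
    split="unlabeled",   # STL-10's 100,000 unlabeled training examples
    download=True,
    transform=stl10_unsup_transform,
)
```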
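
The Experiment Setup row pins down the optimizer (Adam), learning rate (1 × 10⁻⁴), and epoch counts, but not the loss or architecture. Below is a hedged sketch of that schedule, assuming PyTorch; `encoder` and `dim_loss` are hypothetical placeholders standing in for the actual DIM objective, which lives in the repository linked above.

```python
# Sketch of the reported schedule only: Adam at lr 1e-4, 1000 epochs on
# CIFAR10/CIFAR100, 200 epochs on all other datasets. `encoder` and
# `dim_loss` are hypothetical placeholders; the real DIM objective is
# implemented at https://github.com/rdevon/DIM.
import torch

def train_dim(encoder, dim_loss, loader, dataset_name):
    epochs = 1000 if dataset_name in ("CIFAR10", "CIFAR100") else 200
    optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4)
    for _ in range(epochs):
        for x, _ in loader:              # labels are unused: training is unsupervised
            optimizer.zero_grad()
            loss = dim_loss(encoder, x)  # placeholder for the mutual-information objective
            loss.backward()
            optimizer.step()
```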