Learning Representations by Maximizing Mutual Information Across Views
Authors: Philip Bachman, R Devon Hjelm, William Buchwalter
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our model using standard datasets: CIFAR10, CIFAR100, STL10 [Coates et al., 2011], ImageNet [Russakovsky et al., 2015], and Places205 [Zhou et al., 2014]. We evaluate performance following the protocol described by Kolesnikov et al. [2019]. Our model outperforms prior work on these datasets. |
| Researcher Affiliation | Collaboration | Philip Bachman, Microsoft Research, phil.bachman@gmail.com; R Devon Hjelm, Microsoft Research, MILA, devon.hjelm@microsoft.com; William Buchwalter, Microsoft Research, wibuch@microsoft.com |
| Pseudocode | Yes | Figure 1: (c)-top: An algorithm for efficient NCE with minibatches of n_a images, comprising one antecedent and n_c consequents per image. For each true (antecedent, consequent) positive sample pair, we compute the NCE bound using all consequents associated with all other antecedents as negative samples. Our pseudo-code is roughly based on pytorch. (A hedged sketch of this computation appears below the table.) |
| Open Source Code | Yes | Our code is available online: https://github.com/Philip-Bachman/amdim-public. |
| Open Datasets | Yes | We evaluate our model using standard datasets: CIFAR10, CIFAR100, STL10 [Coates et al., 2011], ImageNet [Russakovsky et al., 2015], and Places205 [Zhou et al., 2014]. |
| Dataset Splits | No | The paper refers to using 'the training set', evaluates with linear and MLP classifiers, and states that it follows the evaluation protocol of Kolesnikov et al. [2019]. However, it does not specify train/validation/test split percentages or sample counts in its own text. |
| Hardware Specification | Yes | We train our models using 4-8 standard Tesla V100 GPUs per model. |
| Software Dependencies | No | The paper mentions 'pytorch' in the pseudocode description but does not provide specific version numbers for it or any other software dependencies. |
| Experiment Setup | Yes | We use NCE regularization weight λ = 4e-2 for all experiments. [...] We use c = 20 for all experiments. [...] We trained AMDIM models for 150 epochs on 8 NVIDIA Tesla V100 GPUs. [...] On ImageNet, using a model with size parameters: (ndf=320, nrkhs=2560, ndepth=10), and a batch size of 1008... (A sketch of the clipping and penalty implied by λ and c appears below.) |
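
The Figure 1 quote above describes NCE over a minibatch of n_a images with n_c consequents each, where the consequents of every other image serve as negatives. Below is a minimal PyTorch sketch of that computation. The function name `in_batch_nce`, the tensor layout, and the plain dot-product scoring are illustrative assumptions; the authors' released code at the repository linked above should be treated as authoritative.

```python
import torch
import torch.nn.functional as F

def in_batch_nce(f_a, f_c):
    """NCE bound over a minibatch, following the Figure 1 description.

    f_a: (n_a, d) one antecedent feature per image.
    f_c: (n_a, n_c, d) n_c consequent features per image.
    For each true (antecedent, consequent) pair, the consequents of
    all *other* images in the batch act as negative samples.
    """
    n_a, n_c, d = f_c.shape
    # scores[i, k, l] = <f_a[i], f_c[k, l]>: every antecedent vs. every consequent.
    scores = torch.einsum('id,kld->ikl', f_a, f_c)
    # Positive scores: antecedent i paired with its own consequents.
    pos = scores[torch.arange(n_a), torch.arange(n_a)]            # (n_a, n_c)
    # Negative scores: consequents belonging to every other antecedent.
    off_diag = ~torch.eye(n_a, dtype=torch.bool, device=f_a.device)
    neg = scores[off_diag].view(n_a, (n_a - 1) * n_c)             # shared per antecedent
    # One softmax per positive pair; the true pair sits at logit index 0.
    logits = torch.cat([pos.unsqueeze(-1),
                        neg.unsqueeze(1).expand(-1, n_c, -1)], dim=-1)
    targets = torch.zeros(n_a * n_c, dtype=torch.long, device=f_a.device)
    return F.cross_entropy(logits.reshape(n_a * n_c, -1), targets)

# Usage with illustrative shapes: 16 images, 7 consequents each, 128-dim features.
loss = in_batch_nce(torch.randn(16, 128), torch.randn(16, 7, 128))
```

Using every other image's consequents as negatives is what makes the computation data-efficient: a batch of n_a images yields (n_a - 1) * n_c negatives per positive pair at no extra encoding cost.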
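
The λ = 4e-2 and c = 20 quoted in the setup row govern a penalty on raw NCE scores and a soft clipping non-linearity. The sketch below shows one plausible reading, assuming the clip takes the common form c·tanh(s/c) and λ weights a squared-score penalty; the exact functional form is an assumption and should be checked against the authors' repository.

```python
import torch

NCE_LAMBDA = 4e-2  # NCE regularization weight quoted in the row above
CLIP_C = 20.0      # clipping constant quoted in the row above

def clip_scores(raw_scores):
    """Soft-clip raw scores into (-c, c) and penalize large raw magnitudes.

    Assumed form: a tanh-based squash keeps scores bounded, while a
    lambda-weighted squared penalty discourages scores from leaving
    the tanh's near-linear range. Both forms are assumptions.
    """
    clipped = CLIP_C * torch.tanh(raw_scores / CLIP_C)
    penalty = NCE_LAMBDA * raw_scores.pow(2).mean()
    return clipped, penalty
```

In this reading, the clipped scores would replace the raw logits in the NCE softmax above, with the penalty added to the total training loss.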