Learning Representations by Maximizing Mutual Information Across Views

Authors: Philip Bachman, R Devon Hjelm, William Buchwalter

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our model using standard datasets: CIFAR10, CIFAR100, STL10 [Coates et al., 2011], ImageNet [Russakovsky et al., 2015], and Places205 [Zhou et al., 2014]. We evaluate performance following the protocol described by Kolesnikov et al. [2019]. Our model outperforms prior work on these datasets.
Researcher Affiliation | Collaboration | Philip Bachman (Microsoft Research, phil.bachman@gmail.com); R Devon Hjelm (Microsoft Research and MILA, devon.hjelm@microsoft.com); William Buchwalter (Microsoft Research, wibuch@microsoft.com)
Pseudocode | Yes | Figure 1 (c)-top: an algorithm for efficient NCE with minibatches of n_a images, comprising one antecedent and n_c consequents per image. For each true (antecedent, consequent) positive sample pair, we compute the NCE bound using all consequents associated with all other antecedents as negative samples. Our pseudo-code is roughly based on PyTorch. (A minimal sketch of this NCE computation is given below the table.)
Open Source Code | Yes | Our code is available online: https://github.com/Philip-Bachman/amdim-public.
Open Datasets | Yes | We evaluate our model using standard datasets: CIFAR10, CIFAR100, STL10 [Coates et al., 2011], ImageNet [Russakovsky et al., 2015], and Places205 [Zhou et al., 2014].
Dataset Splits | No | The paper refers to using 'the training set' and evaluating with linear and MLP classifiers, and states that it follows the evaluation protocol described by Kolesnikov et al. [2019]. However, it does not explicitly provide specifics such as exact percentages or sample counts for training, validation, or test splits within its own text. (A sketch of the linear evaluation protocol is given below the table.)
Hardware Specification | Yes | We train our models using 4-8 standard Tesla V100 GPUs per model.
Software Dependencies | No | The paper mentions 'pytorch' in the pseudocode description but does not provide specific version numbers for it or any other software dependencies.
Experiment Setup | Yes | We use NCE regularization weight λ = 4e-2 for all experiments. [...] We use c = 20 for all experiments. [...] We trained AMDIM models for 150 epochs on 8 NVIDIA Tesla V100 GPUs. [...] On ImageNet, using a model with size parameters (ndf=320, nrkhs=2536, ndepth=10) and a batch size of 1008... (A sketch of the score clipping and regularization is given below the table.)
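
The NCE computation quoted in the Pseudocode row can be summarized with a short sketch. This is a minimal PyTorch illustration, not the authors' released code: tensor names and shapes are assumptions, and for simplicity the softmax denominator also includes the other consequents of the same antecedent, which may differ slightly from the paper's exact construction.

```python
import torch
import torch.nn.functional as F

def nce_loss(antecedents, consequents):
    """antecedents: (n_a, d); consequents: (n_a, n_c, d), row i drawn from the same image as antecedent i."""
    n_a, n_c, _ = consequents.shape
    # Dot-product scores between every antecedent and every consequent in the minibatch.
    scores = torch.einsum('ad,bcd->abc', antecedents, consequents)   # (n_a, n_a, n_c)
    # The softmax for antecedent i runs over all n_a * n_c consequents in the batch,
    # so consequents belonging to other antecedents act as the negative samples.
    log_probs = F.log_softmax(scores.reshape(n_a, n_a * n_c), dim=1)
    log_probs = log_probs.reshape(n_a, n_a, n_c)
    # Positive pairs: antecedent i with its own consequents, i.e. entries [i, i, :].
    idx = torch.arange(n_a)
    positives = log_probs[idx, idx, :]                               # (n_a, n_c)
    return -positives.mean()
```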
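
The evaluation protocol referenced in the Dataset Splits row (linear classifiers trained on frozen features, following Kolesnikov et al. [2019]) roughly corresponds to the sketch below. The `encoder`, data loader, feature dimension, and hyperparameters are placeholders rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

def linear_eval(encoder, train_loader, num_classes, feat_dim, epochs=10, lr=1e-3, device='cuda'):
    encoder.eval()                                   # keep the pretrained encoder frozen
    clf = nn.Linear(feat_dim, num_classes).to(device)
    opt = torch.optim.Adam(clf.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            with torch.no_grad():                    # no gradients flow into the encoder
                feats = encoder(images)
            loss = loss_fn(clf(feats), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return clf
```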
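
The Experiment Setup row quotes an NCE regularization weight λ = 4e-2 and a clipping constant c = 20. Below is a hedged sketch of one plausible form of this regularization (tanh-based soft clipping plus a squared-score penalty); the exact functional form is an assumption, not a verbatim reproduction of the paper.

```python
import torch

LAMBDA = 4e-2   # NCE regularization weight reported in the paper
CLIP_C = 20.0   # clipping constant reported in the paper

def regularized_scores(raw_scores):
    # Softly squash raw critic scores into roughly (-c, c); assumed form, not confirmed by the paper.
    clipped = CLIP_C * torch.tanh(raw_scores / CLIP_C)
    # Penalize large raw scores to keep the NCE objective numerically stable.
    penalty = LAMBDA * (raw_scores ** 2).mean()
    return clipped, penalty
```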