Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere

Authors: Tongzhou Wang, Phillip Isola

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on standard vision and language datasets confirm the strong agreement between both metrics and downstream task performance. Directly optimizing for these two metrics leads to representations with comparable or better performance at downstream tasks than contrastive learning. In this section, we empirically verify the hypothesis that alignment and uniformity are desired properties for representations. We conduct extensive experiments with convolutional neural network (CNN) and recurrent neural network (RNN) based encoders on four popular representation learning benchmarks with distinct types of downstream tasks: STL-10, NYU-DEPTH-V2, IMAGENET-100, BOOKCORPUS.
Researcher Affiliation | Academia | Tongzhou Wang (1), Phillip Isola (1). (1) MIT Computer Science & Artificial Intelligence Lab (CSAIL). Correspondence to: Tongzhou Wang <tongzhou@mit.edu>.
Pseudocode | Yes | Due to their simple forms, these two losses can be implemented in PyTorch (Paszke et al., 2019) with less than 10 lines of code, as shown in Figure 5. Figure 5: PyTorch implementation of L_align and L_uniform. (A minimal sketch consistent with these definitions appears after this table.)
Open Source Code | Yes | Code: github.com/SsnL/align_uniform.
Open Datasets | Yes | STL-10 (Coates et al., 2011) classification..., NYU-DEPTH-V2 (Nathan Silberman & Fergus, 2012) depth prediction..., IMAGENET-100 (100 randomly selected classes from IMAGENET) classification..., BOOKCORPUS (Zhu et al., 2015) RNN sentence encoder outputs...
Dataset Splits | Yes | Figure 3 summarizes the resulting distributions of validation set features. For each encoder, we measure the downstream task performance and the L_align, L_uniform metrics on the validation set. STL-10: the best result is picked by linear classifier accuracy on encoder outputs from a 5-fold cross validation on the training set. (A linear-evaluation sketch appears after this table.)
Hardware Specification | No | The paper does not provide specific details on the hardware used for running experiments (e.g., GPU models, CPU types, memory, or cloud instance types).
Software Dependencies | No | The paper mentions PyTorch (Paszke et al., 2019) as the implementation framework but does not list specific software versions or other dependencies.
Experiment Setup | Yes | All three encoders share the same AlexNet-based architecture (Krizhevsky et al., 2012), modified to map input images to 2-dimensional vectors in S^1. Both predictive and contrastive learning use standard data augmentations to augment the dataset and sample positive pairs. ... We optimize a total of 306 STL-10 encoders, 64 NYU-DEPTH-V2 encoders, 45 IMAGENET-100 encoders, and 108 BOOKCORPUS encoders without supervision. The encoders are optimized w.r.t. weighted combinations of L_contrastive, L_align, and/or L_uniform, with varying (possibly zero) weights on the three losses; loss hyperparameters (τ for L_contrastive, α for L_align, and t for L_uniform); batch size (affecting the number of (negative) pairs for L_contrastive and L_uniform); embedding dimension; number of training epochs and learning rate; and initialization (from scratch vs. a pretrained encoder). (A combined-objective sketch appears after this table.)
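
For reference alongside the Pseudocode row, here is a minimal PyTorch sketch consistent with the paper's loss definitions: L_align is the mean alpha-powered distance between positive-pair embeddings, and L_uniform is the log of the average pairwise Gaussian potential within a batch. Function and variable names here are illustrative, not copied from the paper's Figure 5.

```python
import torch

def align_loss(x, y, alpha=2):
    # x, y: L2-normalized embeddings of positive pairs, shape [batch, dim].
    # Mean of ||f(x) - f(y)||_2^alpha over the positive pairs.
    return (x - y).norm(p=2, dim=1).pow(alpha).mean()

def uniform_loss(x, t=2):
    # x: L2-normalized embeddings, shape [batch, dim].
    # Log of the mean Gaussian potential exp(-t * ||u - v||^2) over all pairs.
    return torch.pdist(x, p=2).pow(2).mul(-t).exp().mean().log()
```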
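The Dataset Splits row mentions model selection by linear-classifier accuracy under 5-fold cross validation on the training set. A sketch of that evaluation protocol, assuming scikit-learn and a logistic-regression probe (the paper does not specify which linear classifier or library is used), could look like this:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def linear_eval_5fold(features: np.ndarray, labels: np.ndarray) -> float:
    # features: frozen encoder outputs for the training set, shape [N, dim]
    # labels: class labels, shape [N]
    # Returns mean 5-fold cross-validation accuracy of a linear probe.
    probe = LogisticRegression(max_iter=1000)
    return cross_val_score(probe, features, labels, cv=5, scoring="accuracy").mean()
```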
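The Experiment Setup row describes optimizing weighted combinations of L_contrastive, L_align, and L_uniform. The sketch below illustrates such a combined objective, reusing align_loss and uniform_loss from the first sketch; the in-batch negative sampling for the contrastive term and the averaging of the uniformity term over both branches are assumptions for illustration, not the paper's exact specification.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(x, y, tau=0.5):
    # x, y: L2-normalized positive-pair embeddings, shape [batch, dim].
    # The other samples' y embeddings in the batch serve as negatives.
    logits = x @ y.t() / tau
    labels = torch.arange(x.size(0), device=x.device)
    return F.cross_entropy(logits, labels)

def combined_loss(x, y, w_contrastive=0.0, w_align=1.0, w_uniform=1.0,
                  tau=0.5, alpha=2, t=2):
    # Weighted (possibly zero-weighted) combination of the three losses.
    # align_loss and uniform_loss are as defined in the first sketch above.
    loss = x.new_zeros(())
    if w_contrastive:
        loss = loss + w_contrastive * contrastive_loss(x, y, tau)
    if w_align:
        loss = loss + w_align * align_loss(x, y, alpha)
    if w_uniform:
        # Uniformity computed on each side of the pair and averaged.
        loss = loss + w_uniform * (uniform_loss(x, t) + uniform_loss(y, t)) / 2
    return loss
```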