Intriguing Properties of Contrastive Losses
Authors: Ting Chen, Calvin Luo, Lala Li
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments following SimCLR settings [13, 14], and use the linear evaluation protocol. Detailed experimental setup can be found in Appendix A.1. Figure 1 shows linear evaluation results of models trained with different losses on CIFAR-10 and ImageNet datasets. (A hedged sketch of the linear evaluation protocol appears after the table.) |
| Researcher Affiliation | Industry | Ting Chen (Google Research, iamtingchen@google.com); Calvin Luo (Google Research, calvinluo@google.com); Lala Li (Google Research, lala@google.com) |
| Pseudocode | Yes | Algorithm 1: Sliced Wasserstein Distance (SWD) loss. Input: activation vectors H ∈ R^(b×d), a prior distribution (e.g. Gaussian) sampler S. Draw prior vectors P ∈ R^(b×d) using S; generate a random orthogonal matrix W ∈ R^(d×d′); make projections H′ = HW, P′ = PW; initialize SWD loss ℓ = 0; for j ∈ {1, 2, …, d′} do ℓ = ℓ + ‖sort(H′_:,j) − sort(P′_:,j)‖²; end for; return ℓ/(dd′). (A hedged Python sketch of this algorithm appears after the table.) |
| Open Source Code | Yes | Code and visualization at https://contrastive-learning.github.io/intriguing. |
| Open Datasets | Yes | On CIFAR-10, we see little difference in terms of linear evaluation for variants of the generalized contrastive losses, especially when trained longer than 200 epochs. As for ImageNet, there are some discrepancies between different losses, but they disappear when a deeper 3-layer non-linear projection head is used. We place MNIST digits (28×28) on a shared canvas (112×112). … run inference on images (from ImageNet validation set and COCO [23]) |
| Dataset Splits | No | The paper mentions the 'linear evaluation protocol' and the 'ImageNet validation set' but does not provide specific numerical dataset splits (e.g., percentages or sample counts for train/validation/test) or explicit details on the splitting methodology, beyond implying the use of standard splits for datasets like ImageNet. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or processor types) used for running the experiments. It only mentions 'TPUs' in the acknowledgements in a general context, not as specific hardware for the experiments. |
| Software Dependencies | No | The paper does not explicitly list software dependencies with specific version numbers. While it references the LARS optimizer, no version is provided, nor are versions given for general deep learning frameworks such as PyTorch or TensorFlow, which would typically be used. |
| Experiment Setup | Yes | Table 2: Linear eval accuracy of ResNet-50 on ImageNet at epochs 100/200/400/800. 2-layer head: batch 512 gives 65.4/67.3/68.7/69.3; batch 1024 gives 65.6/67.6/68.8/69.8; batch 2048 gives 65.3/67.6/69.0/70.1. 3-layer head: batch 512 gives 66.6/68.4/70.0/71.0; batch 1024 gives 66.8/68.9/70.1/70.9; batch 2048 gives 66.8/69.1/70.4/71.3. 4-layer head: batch 512 gives 66.8/68.8/70.0/70.7; batch 1024 gives 67.0/69.0/70.4/70.9; batch 2048 gives 67.0/69.3/70.4/71.3. τ is a temperature scalar. With proper learning rate scaling across batch sizes (e.g. square root scaling with LARS optimizer [21]). (A hedged sketch of this scaling rule appears after the table.) |
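The "linear evaluation protocol" cited in the Research Type row is the standard way to score self-supervised representations: freeze the pretrained encoder and train only a linear classifier on its features. Below is a minimal PyTorch sketch of that idea, not the authors' code; `encoder`, `feat_dim`, the dataloader, and the SGD settings are all placeholder assumptions.

```python
import torch
import torch.nn as nn

def linear_eval(encoder, feat_dim, num_classes, train_loader, epochs=90):
    """Train a linear classifier on frozen features (hypothetical helper)."""
    encoder.eval()                            # freeze the pretrained encoder
    for p in encoder.parameters():
        p.requires_grad = False
    head = nn.Linear(feat_dim, num_classes)  # the only trainable module
    opt = torch.optim.SGD(head.parameters(), lr=0.1, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in train_loader:
            with torch.no_grad():
                feats = encoder(images)       # representations stay fixed
            loss = loss_fn(head(feats), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return head                               # report its top-1 accuracy
```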
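Algorithm 1 in the Pseudocode row translates directly into code. The following NumPy sketch is our reading of it, not the released implementation; the name `swd_loss`, the QR-based draw of the orthogonal matrix, and the `d_proj` argument (the d′ of the pseudocode) are assumptions.

```python
import numpy as np

def swd_loss(H, prior_sampler, d_proj, rng=None):
    """Sliced Wasserstein Distance between activations H and a prior (sketch).

    H            : (b, d) array of activation vectors.
    prior_sampler: callable (b, d) -> (b, d) array of prior samples,
                   e.g. a Gaussian sampler, as in the pseudocode.
    d_proj       : number of random projections d' (assumed <= d).
    """
    rng = np.random.default_rng() if rng is None else rng
    b, d = H.shape
    P = prior_sampler(b, d)                   # draw prior vectors P in R^(b x d)
    # Random matrix with orthonormal columns, W in R^(d x d'), via QR.
    W, _ = np.linalg.qr(rng.standard_normal((d, d_proj)))
    Hp, Pp = H @ W, P @ W                     # project activations and prior
    loss = 0.0
    for j in range(d_proj):
        # 1-D Wasserstein between the projections: sort both, compare pointwise.
        diff = np.sort(Hp[:, j]) - np.sort(Pp[:, j])
        loss += np.sum(diff ** 2)
    return loss / (d * d_proj)                # normalization as in Algorithm 1
```

For example, `swd_loss(np.random.randn(128, 64), lambda b, d: np.random.randn(b, d), d_proj=32)` compares a random batch of activations against a standard Gaussian prior.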
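The square-root learning-rate scaling mentioned in the Experiment Setup row has a one-line form. The sketch below fixes a reference batch size of 256 and a base rate purely for illustration; neither value is taken from the paper.

```python
import math

def sqrt_scaled_lr(base_lr, batch_size, base_batch=256):
    """Square-root LR scaling across batch sizes (illustrative defaults).

    Linear scaling would multiply base_lr by batch_size / base_batch;
    the square-root rule grows the rate more conservatively.
    """
    return base_lr * math.sqrt(batch_size / base_batch)

# e.g. sqrt_scaled_lr(0.1, 1024) -> 0.2, where linear scaling would give 0.4
```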