Intriguing Properties of Contrastive Losses
Authors: Ting Chen, Calvin Luo, Lala Li
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments following SimCLR settings [13, 14], and use the linear evaluation protocol. Detailed experimental setup can be found in Appendix A.1. Figure 1 shows linear evaluation results of models trained with different losses on CIFAR-10 and ImageNet datasets. (A hedged sketch of the linear evaluation protocol appears after the table.) |
| Researcher Affiliation | Industry | Ting Chen (Google Research, iamtingchen@google.com); Calvin Luo (Google Research, calvinluo@google.com); Lala Li (Google Research, lala@google.com) |
| Pseudocode | Yes | Algorithm 1: Sliced Wasserstein Distance (SWD) loss. Input: activation vectors H ∈ R^(b×d), a prior distribution (e.g. Gaussian) sampler S. Draw prior vectors P ∈ R^(b×d) using S; generate a random orthogonal matrix W ∈ R^(d×d′); make projections H′ = HW, P′ = PW; initialize SWD loss ℓ = 0; for j ∈ {1, 2, …, d′} do ℓ = ℓ + ‖sort(H′_:,j) − sort(P′_:,j)‖²; end for; return ℓ/(dd′). (A hedged Python sketch of this algorithm appears after the table.) |
| Open Source Code | Yes | Code and visualization at https://contrastive-learning.github.io/intriguing. |
| Open Datasets | Yes | On CIFAR-10, we see little difference in terms of linear evaluation for variants of the generalized contrastive losses, especially when trained longer than 200 epochs. As for ImageNet, there are some discrepancies between different losses, but they disappear when a deeper 3-layer non-linear projection head is used. We place MNIST digits (28×28) on a shared canvas (112×112). … run inference on images (from ImageNet validation set and COCO [23]) |
| Dataset Splits | No | The paper mentions the 'linear evaluation protocol' and the 'ImageNet validation set' but does not provide specific numerical dataset splits (e.g., percentages or sample counts for train/validation/test) or explicit details on the splitting methodology, beyond implying the use of standard splits for datasets like ImageNet. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or processor types) used for running the experiments. It only mentions 'TPUs' in the acknowledgements in a general context, not as specific hardware for the experiments. |
| Software Dependencies | No | The paper does not explicitly list software dependencies with specific version numbers. While it references the LARS optimizer, no version is provided, nor are versions given for general deep learning frameworks such as PyTorch or TensorFlow, which would typically be used. |
| Experiment Setup | Yes | Table 2: Linear eval accuracy of ResNet-50 on ImageNet at epochs 100/200/400/800. 2-layer head: batch 512 gives 65.4/67.3/68.7/69.3; batch 1024 gives 65.6/67.6/68.8/69.8; batch 2048 gives 65.3/67.6/69.0/70.1. 3-layer head: batch 512 gives 66.6/68.4/70.0/71.0; batch 1024 gives 66.8/68.9/70.1/70.9; batch 2048 gives 66.8/69.1/70.4/71.3. 4-layer head: batch 512 gives 66.8/68.8/70.0/70.7; batch 1024 gives 67.0/69.0/70.4/70.9; batch 2048 gives 67.0/69.3/70.4/71.3. τ is a temperature scalar. With proper learning rate scaling across batch sizes (e.g. square root scaling with LARS optimizer [21]). (A hedged sketch of this scaling rule appears after the table.) |
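The "linear evaluation protocol" cited in the Research Type row is the standard way to score self-supervised representations: freeze the pretrained encoder and train only a linear classifier on its features. Below is a minimal PyTorch sketch of that idea, not the authors' code; `encoder`, `feat_dim`, the dataloader, and the SGD settings are all placeholder assumptions.

```python
import torch
import torch.nn as nn

def linear_eval(encoder, feat_dim, num_classes, train_loader, epochs=90):
    """Train a linear classifier on frozen features (hypothetical helper)."""
    encoder.eval()                            # freeze the pretrained encoder
    for p in encoder.parameters():
        p.requires_grad = False
    head = nn.Linear(feat_dim, num_classes)  # the only trainable module
    opt = torch.optim.SGD(head.parameters(), lr=0.1, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in train_loader:
            with torch.no_grad():
                feats = encoder(images)       # representations stay fixed
            loss = loss_fn(head(feats), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return head                               # report its top-1 accuracy
```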
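Algorithm 1 in the Pseudocode row translates directly into code. The following NumPy sketch is our reading of it, not the released implementation; the name `swd_loss`, the QR-based draw of the orthogonal matrix, and the `d_proj` argument (the d′ of the pseudocode) are assumptions.

```python
import numpy as np

def swd_loss(H, prior_sampler, d_proj, rng=None):
    """Sliced Wasserstein Distance between activations H and a prior (sketch).

    H            : (b, d) array of activation vectors.
    prior_sampler: callable (b, d) -> (b, d) array of prior samples,
                   e.g. a Gaussian sampler, as in the pseudocode.
    d_proj       : number of random projections d' (assumed <= d).
    """
    rng = np.random.default_rng() if rng is None else rng
    b, d = H.shape
    P = prior_sampler(b, d)                   # draw prior vectors P in R^(b x d)
    # Random matrix with orthonormal columns, W in R^(d x d'), via QR.
    W, _ = np.linalg.qr(rng.standard_normal((d, d_proj)))
    Hp, Pp = H @ W, P @ W                     # project activations and prior
    loss = 0.0
    for j in range(d_proj):
        # 1-D Wasserstein between the projections: sort both, compare pointwise.
        diff = np.sort(Hp[:, j]) - np.sort(Pp[:, j])
        loss += np.sum(diff ** 2)
    return loss / (d * d_proj)                # normalization as in Algorithm 1
```

For example, `swd_loss(np.random.randn(128, 64), lambda b, d: np.random.randn(b, d), d_proj=32)` compares a random batch of activations against a standard Gaussian prior.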
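The square-root learning-rate scaling mentioned in the Experiment Setup row has a one-line form. The sketch below fixes a reference batch size of 256 and a base rate purely for illustration; neither value is taken from the paper.

```python
import math

def sqrt_scaled_lr(base_lr, batch_size, base_batch=256):
    """Square-root LR scaling across batch sizes (illustrative defaults).

    Linear scaling would multiply base_lr by batch_size / base_batch;
    the square-root rule grows the rate more conservatively.
    """
    return base_lr * math.sqrt(batch_size / base_batch)

# e.g. sqrt_scaled_lr(0.1, 1024) -> 0.2, where linear scaling would give 0.4
```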