Dissecting Supervised Contrastive Learning
Authors: Florian Graf, Christoph Hofer, Marc Niethammer, Roland Kwitt
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Experiments: In any practical setting, we do not have an ideal encoder (as in Section 3), but an encoder parameterized as a neural network, ϕθ. Hence, in Section 5.2, we first assess whether the regular simplex configurations actually arise (and to which extent), given a fixed iteration budget during optimization. Second, in Section 5.3, we study the optimization behavior of models under different loss functions in a series of random label experiments. As our choice of ϕθ, we select a ResNet-18 (He et al., 2016a) model, i.e., all layers up to the linear classifier. Experiments are conducted on CIFAR10/100, for which this choice yields 512-dim. representations (and K ≤ h+1 holds in all cases). |
| Researcher Affiliation | Academia | Florian Graf¹, Christoph D. Hofer¹, Marc Niethammer², Roland Kwitt¹. ¹Department of Computer Science, University of Salzburg, Austria; ²UNC Chapel Hill. Correspondence to: Florian Graf <florian.graf@sbg.ac.at>. |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | Source code to reproduce experiments is publicly available: https://github.com/plus-rkwitt/py_supcon_vs_ce |
| Open Datasets | Yes | Experiments are conducted on CIFAR10/100, for which this choice yields 512-dim. representations (and K ≤ h+1 holds in all cases). |
| Dataset Splits | No | The paper uses standard benchmark datasets (CIFAR10/100) but does not explicitly state the training, validation, and test splits (e.g., percentages or counts) or cite standard splits for reproducibility; it only implies that models are trained on these datasets. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments were provided in the paper. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions) used in the experiments. |
| Experiment Setup | Yes | Optimization is done via (mini-batch) stochastic gradient descent with L2 regularization (10⁻⁴) and momentum (0.9) for 100k iterations. The batch-size is fixed to 256 and the learning rate is annealed exponentially, starting from 0.1. When using data augmentation, we apply random cropping and random horizontal flipping, each with probability 1/2. |
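
The encoder and training details quoted in the table can be assembled into a short configuration sketch, shown below. This is a hedged illustration only, not the authors' code (their official implementation is in the repository linked above): the use of torchvision's standard ResNet-18 rather than a CIFAR-specific variant, the crop padding, the `RandomApply` wrapper used to apply cropping with probability 1/2, and the exponential decay factor `gamma` are all assumptions, and the loss term is left as a placeholder since the paper compares cross-entropy and supervised contrastive objectives.

```python
# Hedged sketch of the reported setup: a ResNet-18 encoder (all layers up to
# the linear classifier, 512-dim output), SGD with momentum 0.9 and L2
# regularization 1e-4, batch size 256, an exponentially annealed learning rate
# starting at 0.1, 100k iterations, and random crop / horizontal flip each with
# probability 1/2. Values not quoted in the table (padding, gamma, CIFAR stem)
# are illustrative assumptions.
import torch
import torchvision
import torchvision.transforms as T

# Data augmentation: cropping and flipping, each applied with probability 1/2.
train_transform = T.Compose([
    T.RandomApply([T.RandomCrop(32, padding=4)], p=0.5),  # padding=4 assumed
    T.RandomHorizontalFlip(p=0.5),
    T.ToTensor(),
])

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=train_transform)
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=256, shuffle=True, num_workers=4, drop_last=True)

# Encoder: ResNet-18 with the final linear classifier replaced by an identity,
# so it outputs 512-dim representations (h = 512, hence K <= h + 1 for both
# CIFAR10 and CIFAR100).
encoder = torchvision.models.resnet18(num_classes=10)
encoder.fc = torch.nn.Identity()

optimizer = torch.optim.SGD(
    encoder.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)

# "Annealed exponentially" is interpreted here as a per-iteration ExponentialLR
# decay; the decay factor is a placeholder, not a value reported in the table.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.99995)

num_iterations, it = 100_000, 0
while it < num_iterations:
    for images, labels in train_loader:
        features = encoder(images)  # (256, 512) representations
        # loss = ...  # cross-entropy or supervised contrastive loss goes here
        # loss.backward(); optimizer.step(); optimizer.zero_grad()
        scheduler.step()
        it += 1
        if it >= num_iterations:
            break
```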