Dissecting Supervised Contrastive Learning
Authors: Florian Graf, Christoph Hofer, Marc Niethammer, Roland Kwitt
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Experiments: In any practical setting, we do not have an ideal encoder (as in Section 3), but an encoder parameterized as a neural network, ϕθ. Hence, in Section 5.2, we first assess whether the regular simplex configurations actually arise (and to which extent), given a fixed iteration budget during optimization. Second, in Section 5.3, we study the optimization behavior of models under different loss functions in a series of random label experiments. As our choice of ϕθ, we select a ResNet-18 (He et al., 2016a) model, i.e., all layers up to the linear classifier. Experiments are conducted on CIFAR10/100, for which this choice yields 512-dim. representations (and K ≤ h+1 holds in all cases). |
| Researcher Affiliation | Academia | Florian Graf¹, Christoph D. Hofer¹, Marc Niethammer², Roland Kwitt¹. ¹Department of Computer Science, University of Salzburg, Austria; ²UNC Chapel Hill. Correspondence to: Florian Graf <florian.graf@sbg.ac.at>. |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | Source code to reproduce experiments is publicly available: https://github.com/plus-rkwitt/py_supcon_vs_ce |
| Open Datasets | Yes | Experiments are conducted on CIFAR10/100, for which this choice yields 512-dim. representations (and K ≤ h+1 holds in all cases). |
| Dataset Splits | No | The paper uses standard benchmark datasets (CIFAR10/100) but does not explicitly state the training, validation, and test splits (e.g., percentages or counts) or cite standard splits for reproducibility; it only implies that models are trained on these datasets. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments were provided in the paper. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions) used in the experiments. |
| Experiment Setup | Yes | Optimization is done via (mini-batch) stochastic gradient descent with L2 regularization (10⁻⁴) and momentum (0.9) for 100k iterations. The batch-size is fixed to 256 and the learning rate is annealed exponentially, starting from 0.1. When using data augmentation, we apply random cropping and random horizontal flipping, each with probability 1/2. |
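
The encoder and training details quoted in the table can be assembled into a short configuration sketch, shown below. This is a hedged illustration only, not the authors' code (their official implementation is in the repository linked above): the use of torchvision's standard ResNet-18 rather than a CIFAR-specific variant, the crop padding, the `RandomApply` wrapper used to apply cropping with probability 1/2, and the exponential decay factor `gamma` are all assumptions, and the loss term is left as a placeholder since the paper compares cross-entropy and supervised contrastive objectives.

```python
# Hedged sketch of the reported setup: a ResNet-18 encoder (all layers up to
# the linear classifier, 512-dim output), SGD with momentum 0.9 and L2
# regularization 1e-4, batch size 256, an exponentially annealed learning rate
# starting at 0.1, 100k iterations, and random crop / horizontal flip each with
# probability 1/2. Values not quoted in the table (padding, gamma, CIFAR stem)
# are illustrative assumptions.
import torch
import torchvision
import torchvision.transforms as T

# Data augmentation: cropping and flipping, each applied with probability 1/2.
train_transform = T.Compose([
    T.RandomApply([T.RandomCrop(32, padding=4)], p=0.5),  # padding=4 assumed
    T.RandomHorizontalFlip(p=0.5),
    T.ToTensor(),
])

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=train_transform)
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=256, shuffle=True, num_workers=4, drop_last=True)

# Encoder: ResNet-18 with the final linear classifier replaced by an identity,
# so it outputs 512-dim representations (h = 512, hence K <= h + 1 for both
# CIFAR10 and CIFAR100).
encoder = torchvision.models.resnet18(num_classes=10)
encoder.fc = torch.nn.Identity()

optimizer = torch.optim.SGD(
    encoder.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)

# "Annealed exponentially" is interpreted here as a per-iteration ExponentialLR
# decay; the decay factor is a placeholder, not a value reported in the table.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.99995)

num_iterations, it = 100_000, 0
while it < num_iterations:
    for images, labels in train_loader:
        features = encoder(images)  # (256, 512) representations
        # loss = ...  # cross-entropy or supervised contrastive loss goes here
        # loss.backward(); optimizer.step(); optimizer.zero_grad()
        scheduler.step()
        it += 1
        if it >= num_iterations:
            break
```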