On the Stepwise Nature of Self-Supervised Learning

Authors: James B Simon, Maksis Knutins, Liu Ziyin, Daniel Geisz, Abraham J Fetterman, Joshua Albrecht

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we empirically examine the training of Barlow Twins, SimCLR, and VICReg using ResNets with various initializations and hyperparameters and in all cases clearly observe the stepwise behavior predicted by our analytical model.
Researcher Affiliation | Collaboration | James B. Simon (1,2), Maksis Knutins (2), Liu Ziyin (3), Daniel Geisz (1), Abraham J. Fetterman (2), Joshua Albrecht (2); 1 UC Berkeley, 2 Generally Intelligent, 3 University of Tokyo. Correspondence to: James Simon <james.simon@berkeley.edu>.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code to reproduce results available at https://gitlab.com/generally-intelligent/ssl_dynamics.
Open Datasets | Yes | We sample n = 500 random images from CIFAR-10 (Krizhevsky, 2009) and, for each, take two random crops to size 20 × 20 × 3 to obtain n positive pairs (which thus have feature dimension m = 1200). (A pair-construction sketch appears after the table.)
Dataset Splits | No | The paper does not specify the exact percentages or sample counts for training, validation, and test splits, nor does it refer to a standard predefined split with proper citation.
Hardware Specification | No | The paper mentions running experiments on a 'single GPU' and 'single consumer GPU' but does not specify the exact model (e.g., NVIDIA A100, Tesla V100) or other hardware details such as CPU, memory, or machine type.
Software Dependencies | No | The paper mentions using 'functorch' but does not provide a specific version number for it or any other software libraries or dependencies used in the experiments.
Experiment Setup | Yes | We train a single-hidden-layer MLP for 7000 epochs over a fixed batch of 50 images from CIFAR-10 using full-batch SGD. Each image is subject to a random 20 × 20 crop and no other augmentations. The learning rate is η = 0.0001 and weights are scaled upon initialization by α = 0.0001. The hidden layer has width 2048 and the network output dimension is d = 10. We use the Barlow Twins loss, but do not apply batch norm to the embeddings when calculating the cross-correlation matrix. λ is set to 1. (A training-loop sketch appears after the table.)
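The Open Datasets row describes building positive pairs from two random 20 × 20 crops of each of n = 500 sampled CIFAR-10 images, giving flattened views of dimension m = 20 · 20 · 3 = 1200. Below is a minimal sketch of that construction, assuming PyTorch and torchvision; the variable names are illustrative and not taken from the authors' released code.

```python
import torch
from torchvision import datasets, transforms

# Two independent random 20x20 crops of the same image form a positive pair.
crop = transforms.Compose([
    transforms.RandomCrop(20),
    transforms.ToTensor(),
])

cifar = datasets.CIFAR10(root="./data", train=True, download=True)

n = 500                               # number of base images / positive pairs
idx = torch.randperm(len(cifar))[:n]  # sample n random images

x1, x2 = [], []
for i in idx.tolist():
    img, _ = cifar[i]                 # PIL image; the label is unused
    x1.append(crop(img).flatten())    # each crop has 20 * 20 * 3 = 1200 features
    x2.append(crop(img).flatten())

x1 = torch.stack(x1)                  # shape (n, m) = (500, 1200), first view
x2 = torch.stack(x2)                  # second view of the same images
```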
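The Experiment Setup row likewise translates into a short training loop. The sketch below follows the stated hyperparameters (hidden width 2048, d = 10, η = α = 1e-4, λ = 1, full-batch SGD for 7000 epochs) and computes the Barlow Twins cross-correlation matrix directly on the embeddings, i.e. without batch norm. Whether crops are re-sampled each epoch is not specified in the quoted text, so fixed views are used here; this is a reconstruction under those assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

d, width, m = 10, 2048, 1200      # output dim, hidden width, input features
lr, alpha, lam = 1e-4, 1e-4, 1.0  # learning rate, init scale, off-diagonal weight

# Single-hidden-layer MLP; all parameters scaled down by alpha at initialization
# (scaling biases too is a simplification of "weights are scaled by alpha").
net = nn.Sequential(nn.Linear(m, width), nn.ReLU(), nn.Linear(width, d))
with torch.no_grad():
    for p in net.parameters():
        p.mul_(alpha)

def barlow_twins_loss(z1, z2, lam=1.0):
    """Barlow Twins loss on raw embeddings (no batch norm before the
    cross-correlation matrix), matching the setup quoted above."""
    n = z1.shape[0]
    c = z1.T @ z2 / n                                       # d x d cross-correlation
    on_diag = (1.0 - torch.diagonal(c)).pow(2).sum()        # pull C_ii toward 1
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()  # penalize C_ij, i != j
    return on_diag + lam * off_diag

# Stand-in views; in practice these are two random-crop views of the fixed
# batch of 50 CIFAR-10 images, built as in the previous sketch.
x1 = torch.randn(50, m)
x2 = torch.randn(50, m)

opt = torch.optim.SGD(net.parameters(), lr=lr)

# Full-batch training over the fixed set of positive pairs.
for epoch in range(7000):
    loss = barlow_twins_loss(net(x1), net(x2), lam)
    opt.zero_grad()
    loss.backward()
    opt.step()
```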