On the Stepwise Nature of Self-Supervised Learning
Authors: James B Simon, Maksis Knutins, Liu Ziyin, Daniel Geisz, Abraham J Fetterman, Joshua Albrecht
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we empirically examine the training of Barlow Twins, SimCLR, and VICReg using ResNets with various initializations and hyperparameters and in all cases clearly observe the stepwise behavior predicted by our analytical model. |
| Researcher Affiliation | Collaboration | James B. Simon (UC Berkeley, Generally Intelligent), Maksis Knutins (Generally Intelligent), Liu Ziyin (University of Tokyo), Daniel Geisz (UC Berkeley), Abraham J. Fetterman (Generally Intelligent), Joshua Albrecht (Generally Intelligent). Correspondence to: James Simon <james.simon@berkeley.edu>. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code to reproduce results available at https://gitlab.com/generally-intelligent/ssl_dynamics. |
| Open Datasets | Yes | We sample n = 500 random images from CIFAR-10 (Krizhevsky, 2009) and, for each, take two random crops to size 20×20×3 to obtain n positive pairs (which thus have feature dimension m = 1200). A hedged sketch of this pair construction appears after the table. |
| Dataset Splits | No | The paper does not specify the exact percentages or sample counts for training, validation, and test splits, nor does it refer to a standard predefined split with proper citation. |
| Hardware Specification | No | The paper mentions running experiments on a 'single GPU' and 'single consumer GPU' but does not specify the exact model (e.g., NVIDIA A100, Tesla V100, etc.) or any other hardware details like CPU, memory, or specific machine types. |
| Software Dependencies | No | The paper mentions using 'functorch' but does not provide a specific version number for it or any other software libraries or dependencies used in the experiments. |
| Experiment Setup | Yes | We train a single-hidden-layer MLP for 7000 epochs over a fixed batch of 500 images from CIFAR-10 using full-batch SGD. Each image is subject to a random 20×20 crop and no other augmentations. The learning rate is η = 0.0001 and weights are scaled upon initialization by α = 0.0001. The hidden layer has width 2048 and the network output dimension is d = 10. We use the Barlow Twins loss, but do not apply batch norm to the embeddings when calculating the cross-correlation matrix. λ is set to 1. A hedged sketch of this training setup also appears after the table. |
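
The pair-construction step quoted in the "Open Datasets" row can be made concrete with a short sketch. This is not the authors' released code; it assumes standard torchvision CIFAR-10 loading and flattens each 20×20×3 crop into a 1200-dimensional vector.

```python
import numpy as np
import torch
import torchvision
import torchvision.transforms as T

# Sketch (not the authors' code): n = 500 CIFAR-10 images, two random
# 20x20x3 crops per image, flattened to feature dimension m = 1200.
n, crop = 500, 20
cifar = torchvision.datasets.CIFAR10(root="./data", train=True, download=True)
idx = np.random.choice(len(cifar), size=n, replace=False)
augment = T.Compose([T.RandomCrop(crop), T.ToTensor()])

views1, views2 = [], []
for i in idx:
    img, _ = cifar[i]                       # 32x32 PIL image
    views1.append(augment(img).flatten())   # first random crop  -> 1200-dim vector
    views2.append(augment(img).flatten())   # second random crop of the same image
x1, x2 = torch.stack(views1), torch.stack(views2)   # each of shape (500, 1200)
```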
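Likewise, a minimal sketch of the reported training setup, assuming a plain PyTorch MLP and the un-normalized Barlow Twins objective described in the "Experiment Setup" row (λ = 1, no batch norm on the embeddings). The `barlow_twins_loss` helper and variable names are illustrative, not taken from the paper's repository, and `x1`, `x2` are reused from the previous sketch.

```python
import torch
import torch.nn as nn

# Sketch (illustrative): single-hidden-layer MLP of width 2048, output dimension
# d = 10, weights scaled by alpha = 1e-4 at init, trained for 7000 epochs with
# full-batch SGD at lr = 1e-4 on the Barlow Twins loss with lambda = 1.
m, width, d = 1200, 2048, 10
alpha, lr, lam, epochs = 1e-4, 1e-4, 1.0, 7000

model = nn.Sequential(nn.Linear(m, width), nn.ReLU(), nn.Linear(width, d))
with torch.no_grad():
    for p in model.parameters():
        p.mul_(alpha)  # small-initialization scaling reported in the paper

def barlow_twins_loss(z1, z2, lam=1.0):
    # Cross-correlation of the two embedding batches; per the quoted setup,
    # no batch norm is applied to the embeddings before this step.
    c = (z1.T @ z2) / z1.shape[0]
    on_diag = (torch.diagonal(c) - 1.0).pow(2).sum()
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + lam * off_diag

opt = torch.optim.SGD(model.parameters(), lr=lr)
for _ in range(epochs):
    loss = barlow_twins_loss(model(x1), model(x2), lam)
    opt.zero_grad()
    loss.backward()
    opt.step()
```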