Exploring the Gap between Collapsed & Whitened Features in Self-Supervised Learning

Authors: Bobby He, Mete Ozay

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide theoretical & empirical evidence highlighting the factors in SSL, like projection layers & regularisation strength, that influence eigenvalue decay rate, & demonstrate that the degree of feature whitening affects generalisation, particularly in label-scarce regimes. We use our insights to motivate a novel method, Post-hoc Manipulation of the Principal Axes & Trace (PostMan-Pat), which efficiently post-processes a pretrained encoder to enforce an eigenvalue decay rate with power law exponent β, & find that PostMan-Pat delivers improved label efficiency and transferability across a range of SSL methods and encoder architectures. (A hedged sketch of such a spectrum-reshaping step is given after this table.)
Researcher Affiliation | Collaboration | 1University of Oxford, 2Samsung Research UK.
Pseudocode | Yes | Algorithm 1: PyTorch pseudocode for PostMan-Pat (PMP).
Open Source Code | No | The paper mentions using existing open-source codebases for baselines and pretrained models (e.g., SimCLR, Barlow Twins, and SwAV implementations and checkpoints from the VISSL library's model zoo), but it does not state that the code for its own proposed method, PostMan-Pat (PMP), is publicly available, nor does it provide a link to its implementation.
Open Datasets | Yes | In Figure 4, we compare various ResNet-18 trained with Barlow Twins on CIFAR-10...Our ImageNet-1K implementation was based off the official Barlow Twins (Zbontar et al., 2021) implementation...STL-10 analysis Figure 7 is akin to Figure 4, but trained with Barlow Twins on STL-10 dataset...Table 2. Transfer Learning: Comparison of top-1 test accuracies (%) for PMP and LP across SSL methods and transfer datasets...CIFAR100 (Krizhevsky, 2009), Stanford Cars (Krause et al., 2013) and Oxford 102 Flowers (Nilsback & Zisserman, 2008).
Dataset Splits | Yes | We use a validation split from the accessible training labels to tune hyperparameters for all evaluation schemes, c.f. Appendix C. ...For any given set of labelled data, we split the data into 4:1 splits for the 1% or 10% labelled-data setting, or 2:1 splits for the 0.3% labelled-data setting (as in the 0.3% setting we have only 3 labels per class). Splits were chosen uniformly at random so that each class had an equal number of examples in the larger split, which was then used for training. Top-1 accuracy on the smaller split was used for hyperparameter tuning. (A sketch of this class-balanced split appears after this table.)
Hardware Specification | No | The paper does not explicitly describe the specific hardware used (e.g., GPU models, CPU models, memory specifications) for running its experiments. It mentions model architectures like ResNet-18, ResNet-50, and ViT-B/16 but not the underlying hardware.
Software Dependencies | No | The paper mentions software like 'PyTorch', the 'Torchvision PyTorch library (Paszke et al., 2019)', and the 'VISSL library's (Goyal et al., 2021) model zoo', but it does not specify concrete version numbers for these software components, which would be required for a fully reproducible description.
Experiment Setup | Yes | In Appendix C, titled 'Experimental Details', the paper provides specific details regarding the training procedure and hyperparameters. For instance, in C.1: 'All networks were trained with SGD for 100 epochs with weight decay 0.0004, momentum 0.9 & a cosine annealed learning rate...Learning rate was 0.32 for SimCLR & 0.25 for Barlow Twins, with ρ = 0.01.' C.4 further details: 'In all linear evaluation schemes...we train the classifier for 100 epochs using SGD & momentum 0.9, with a cosine annealed learning rate starting at 0.1, with weight decay tuned in all cases'. (A sketch of this optimiser configuration appears below.)
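The abstract's description of PostMan-Pat suggests a simple post-hoc recipe: rotate frozen encoder features onto the principal axes of their covariance and rescale the spectrum to a trace-preserving power law with exponent β. The following is a minimal PyTorch sketch of that idea only; it is an assumption-laden re-creation (the function name `postprocess_power_law`, the trace normalisation, and the eps floor are illustrative), not the authors' Algorithm 1.

```python
# Hedged sketch of a PMP-style post-processing step (not the authors' Algorithm 1):
# given features from a frozen pretrained encoder, rotate them into the eigenbasis
# of their covariance and rescale the spectrum to a power law lambda_i ~ i^(-beta).
import torch

def postprocess_power_law(z: torch.Tensor, beta: float = 1.0, eps: float = 1e-6) -> torch.Tensor:
    """z: (N, D) features from a frozen encoder; returns features whose
    covariance eigenvalues follow lambda_i proportional to i^(-beta),
    with the total variance (trace) preserved."""
    z = z - z.mean(dim=0, keepdim=True)                 # centre the features
    cov = z.T @ z / (z.shape[0] - 1)                    # (D, D) empirical covariance
    evals, evecs = torch.linalg.eigh(cov)               # eigenvalues in ascending order
    evals, evecs = evals.flip(0), evecs.flip(1)         # sort descending
    z_rot = z @ evecs                                   # rotate onto principal axes
    target = torch.arange(1, z.shape[1] + 1, dtype=z.dtype, device=z.device) ** (-beta)
    target = target * evals.sum() / target.sum()        # rescale to preserve the trace
    scale = torch.sqrt(target / evals.clamp(min=eps))   # per-axis rescaling factors
    return z_rot * scale
```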
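The dataset-split protocol quoted in the Dataset Splits row (4:1 or 2:1 class-balanced random splits of the labelled subset) can be illustrated with a short PyTorch sketch. The helper name `balanced_split`, the fixed seed, and the per-class count handling are assumptions; the paper does not release its splitting code.

```python
# Hedged sketch of a class-balanced train/validation split, assuming integer labels.
import torch

def balanced_split(labels: torch.Tensor, train_frac: float = 0.8, seed: int = 0):
    """Split indices so the larger (training) part has an equal number of examples
    per class; train_frac=0.8 gives the 4:1 split, train_frac=2/3 the 2:1 split."""
    g = torch.Generator().manual_seed(seed)
    classes = labels.unique()
    per_class_train = min(int((labels == c).sum()) for c in classes)
    per_class_train = int(train_frac * per_class_train)
    train_idx, val_idx = [], []
    for c in classes:
        idx = (labels == c).nonzero(as_tuple=True)[0]
        idx = idx[torch.randperm(len(idx), generator=g)]   # shuffle within class
        train_idx.append(idx[:per_class_train])             # equal count per class
        val_idx.append(idx[per_class_train:])                # remainder for tuning
    return torch.cat(train_idx), torch.cat(val_idx)
```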
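To make the quoted training configuration concrete, here is a minimal PyTorch sketch of the reported optimiser setup (SGD, momentum 0.9, weight decay 0.0004, cosine-annealed learning rate over 100 epochs, base learning rate 0.32 for SimCLR and 0.25 for Barlow Twins). Batch size, any warmup, and per-step versus per-epoch scheduling are not specified in the quoted text, so those choices here are assumptions.

```python
# Hedged sketch of the reported pretraining optimiser setup from Appendix C.1.
import torch

def make_optimizer(model: torch.nn.Module, base_lr: float = 0.25, epochs: int = 100):
    """base_lr = 0.32 for SimCLR, 0.25 for Barlow Twins per the quoted appendix."""
    opt = torch.optim.SGD(model.parameters(), lr=base_lr,
                          momentum=0.9, weight_decay=4e-4)
    # Assumes the cosine schedule is stepped once per epoch (not specified in the quote).
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)
    return opt, sched
```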