Understanding self-supervised learning dynamics without contrastive pairs

Authors: Yuandong Tian, Xinlei Chen, Surya Ganguli

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On ImageNet, it performs comparably with more complex two-layer non-linear predictors that employ BatchNorm and outperforms a linear predictor by 2.5% in 300-epoch training (and 5% in 60-epoch). DirectPred is motivated by our theoretical study of the nonlinear learning dynamics of non-contrastive SSL in simple linear networks. Our simple theory recapitulates the results of real-world ablation studies in both STL-10 and ImageNet.
Researcher Affiliation | Collaboration | Yuandong Tian (1), Xinlei Chen (1), Surya Ganguli (1, 2); (1) Facebook AI Research, (2) Stanford University. Correspondence to: Yuandong Tian <yuandong@fb.com>.
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | Yes | Code is released at https://github.com/facebookresearch/luckmatters/tree/master/ssl.
Open Datasets | Yes | Top-1 accuracy in STL-10 (Coates et al., 2011) downstream classification task. ... On the standard ImageNet benchmark (300 epochs)... CIFAR-10 (Krizhevsky et al., 2009).
Dataset Splits | No | The paper uses standard benchmark datasets (STL-10, CIFAR-10, ImageNet) but does not provide explicit numerical details (percentages or counts) for train/validation/test splits, nor does it specify a cross-validation strategy in a reproducible manner. It refers to the linear evaluation protocol without detailing the split.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models or machine configurations.
Software Dependencies | No | The paper mentions software components such as 'SGD as the optimizer', 'ResNet-18', and 'LARS optimizer', but it does not provide specific version numbers for any of these software dependencies.
Experiment Setup | Yes | Unless explicitly stated, in all our experiments, we use ResNet-18 (He et al., 2016) as the backbone network for CIFAR-10/STL-10 experiments and SGD as the optimizer with learning rate α = 0.03, momentum 0.9, weight decay η = 0.0004 and EMA parameter γ_a = 0.996. Each setting is repeated 5 times. ... The second setting follows BYOL more closely, where we use a symmetrized loss, 4096 batch size and LARS optimizer (You et al., 2017), and train for 300 epochs.
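
The Experiment Setup row above gives concrete hyperparameters for the CIFAR-10/STL-10 runs (ResNet-18 backbone, SGD with learning rate 0.03, momentum 0.9, weight decay 0.0004, EMA parameter 0.996). The minimal PyTorch sketch below shows one way those values could map onto an optimizer and a target-network EMA update; only the numeric values come from the excerpt, while the module names, the projection-head stand-in, and the update helper are illustrative assumptions.

```python
# Hedged sketch of the quoted CIFAR-10/STL-10 setup. Only the numeric
# hyperparameters come from the paper excerpt; the module names and
# scaffolding are illustrative assumptions.
import copy
import torch
import torchvision

online_net = torchvision.models.resnet18(num_classes=128)  # backbone + projection-head stand-in
target_net = copy.deepcopy(online_net)                      # EMA ("momentum") target network
for p in target_net.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.SGD(
    online_net.parameters(),
    lr=0.03,              # learning rate alpha from the excerpt
    momentum=0.9,
    weight_decay=0.0004,  # weight decay eta from the excerpt
)

gamma_a = 0.996  # EMA parameter from the excerpt

@torch.no_grad()
def ema_update(online, target, gamma=gamma_a):
    # target <- gamma * target + (1 - gamma) * online, applied per parameter
    for p_o, p_t in zip(online.parameters(), target.parameters()):
        p_t.mul_(gamma).add_(p_o, alpha=1 - gamma)
```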
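
The Research Type row quotes the paper's DirectPred predictor, which, as I understand it, replaces a learned predictor with one set directly from statistics of the online network's features. As a rough, hedged illustration of that idea (not the paper's verbatim recipe), the sketch below builds the predictor weight matrix from an eigendecomposition of an exponential-moving-average feature correlation matrix; the square-root eigenvalue mapping and the `rho`/`eps` values are assumptions for illustration.

```python
# Hedged sketch of a DirectPred-style predictor update. The sqrt-eigenvalue
# mapping and the rho/eps defaults are assumptions, not the paper's exact recipe.
import torch

class DirectPredictor:
    def __init__(self, dim, rho=0.3, eps=0.1):
        self.F = torch.zeros(dim, dim)   # EMA estimate of the feature correlation matrix
        self.Wp = torch.eye(dim)         # predictor weights, set directly (not learned)
        self.rho = rho                   # EMA coefficient for F
        self.eps = eps                   # small floor added to the spectral weights

    @torch.no_grad()
    def update(self, f):
        # f: (batch, dim) features from the online network for the current batch
        corr = f.t() @ f / f.shape[0]
        self.F = self.rho * self.F + (1 - self.rho) * corr
        # Eigendecompose the symmetric correlation estimate and rescale its spectrum
        s, U = torch.linalg.eigh(self.F)
        s = s.clamp(min=0)
        p = torch.sqrt(s / s.max().clamp(min=1e-12)) + self.eps  # assumed mapping
        self.Wp = U @ torch.diag(p) @ U.t()

    def __call__(self, f):
        return f @ self.Wp.t()
```

Per the excerpt, this kind of directly-set predictor outperformed a linear learned predictor and was comparable to a two-layer non-linear predictor with BatchNorm on ImageNet.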
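
The Dataset Splits row notes that the paper reports downstream accuracy under the linear evaluation protocol without restating split details. For readers unfamiliar with that protocol, here is a generic sketch of the common convention (frozen encoder, linear classifier trained on the standard train split, accuracy reported on the standard test split); it describes usual practice, not the paper's exact pipeline, and every name and hyperparameter in it is illustrative.

```python
# Generic linear-evaluation sketch (illustrating the common protocol, not the
# paper's exact pipeline): freeze the pretrained encoder and train a linear
# classifier on top of its features.
import torch
import torch.nn as nn

def linear_eval(encoder, train_loader, test_loader, feat_dim, num_classes, epochs=100):
    encoder.eval()
    for p in encoder.parameters():
        p.requires_grad_(False)
    clf = nn.Linear(feat_dim, num_classes)
    opt = torch.optim.SGD(clf.parameters(), lr=0.1, momentum=0.9)
    for _ in range(epochs):
        for x, y in train_loader:
            with torch.no_grad():
                z = encoder(x)                      # frozen features
            loss = nn.functional.cross_entropy(clf(z), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    correct = total = 0
    with torch.no_grad():
        for x, y in test_loader:
            pred = clf(encoder(x)).argmax(dim=1)
            correct += (pred == y).sum().item()
            total += y.numel()
    return correct / total                          # top-1 accuracy on the test split
```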