Understanding self-supervised learning dynamics without contrastive pairs
Authors: Yuandong Tian, Xinlei Chen, Surya Ganguli
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On ImageNet, it performs comparably with more complex two-layer non-linear predictors that employ BatchNorm and outperforms a linear predictor by 2.5% in 300-epoch training (and 5% in 60-epoch). DirectPred is motivated by our theoretical study of the nonlinear learning dynamics of non-contrastive SSL in simple linear networks. Our simple theory recapitulates the results of real-world ablation studies in both STL-10 and ImageNet. |
| Researcher Affiliation | Collaboration | Yuandong Tian¹, Xinlei Chen¹, Surya Ganguli¹,² (¹Facebook AI Research, ²Stanford University). Correspondence to: Yuandong Tian <yuandong@fb.com>. |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is released at https://github.com/facebookresearch/luckmatters/tree/master/ssl |
| Open Datasets | Yes | Top-1 accuracy in STL-10 (Coates et al., 2011) downstream classification task. ... On the standard ImageNet benchmark (300 epochs)... CIFAR-10 (Krizhevsky et al., 2009). |
| Dataset Splits | No | The paper uses standard benchmark datasets (STL-10, CIFAR-10, and ImageNet) but does not provide explicit numerical details (percentages or counts) for train/validation/test splits, nor does it specify a cross-validation strategy in a reproducible manner. It refers to a 'linear evaluation protocol' but does not detail the split. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU/CPU models or specific machine configurations. |
| Software Dependencies | No | The paper mentions software components like 'SGD as the optimizer', 'ResNet-18', and 'LARS optimizer', but it does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | Unless explicitly stated, in all our experiments, we use ResNet-18 (He et al., 2016) as the backbone network for CIFAR10/STL-10 experiments and SGD as the optimizer with learning rate α = 0.03, momentum 0.9, weight decay η = 0.0004 and EMA parameter γa = 0.996. Each setting is repeated 5 times. ... The second setting follows BYOL more closely, where we use a symmetrized loss, 4096 batch size and LARS optimizer (You et al., 2017), and train for 300 epochs. |
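
The quoted experiment setup translates directly into an optimizer and EMA configuration. Below is a minimal sketch, assuming PyTorch and torchvision; the encoder instantiation and the `ema_update` helper are illustrative placeholders, not the authors' released code, but the hyperparameters (lr 0.03, momentum 0.9, weight decay 0.0004, EMA γ = 0.996, ResNet-18 backbone) follow the table row above.

```python
# Sketch of the reported CIFAR-10/STL-10 training configuration (assumed PyTorch).
import torch
import torchvision

# ResNet-18 backbone as the online encoder; the 128-d output head is an assumption.
online_encoder = torchvision.models.resnet18(num_classes=128)

# Target encoder: an exponential-moving-average (EMA) copy of the online encoder,
# frozen with respect to gradient updates (BYOL-style).
target_encoder = torchvision.models.resnet18(num_classes=128)
target_encoder.load_state_dict(online_encoder.state_dict())
for p in target_encoder.parameters():
    p.requires_grad = False

# SGD with the hyperparameters quoted in the Experiment Setup row.
optimizer = torch.optim.SGD(
    online_encoder.parameters(),
    lr=0.03,
    momentum=0.9,
    weight_decay=0.0004,
)

@torch.no_grad()
def ema_update(online, target, gamma=0.996):
    """Update target weights as gamma * target + (1 - gamma) * online."""
    for p_online, p_target in zip(online.parameters(), target.parameters()):
        p_target.mul_(gamma).add_(p_online, alpha=1.0 - gamma)
```

The second (ImageNet) setting in the quote would instead use a symmetrized loss, a 4096 batch size, and the LARS optimizer for 300 epochs; those pieces are not shown here.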