Understanding self-supervised learning dynamics without contrastive pairs

Authors: Yuandong Tian, Xinlei Chen, Surya Ganguli

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On ImageNet, it performs comparably with more complex two-layer non-linear predictors that employ BatchNorm and outperforms a linear predictor by 2.5% in 300-epoch training (and 5% in 60-epoch). DirectPred is motivated by our theoretical study of the nonlinear learning dynamics of non-contrastive SSL in simple linear networks. Our simple theory recapitulates the results of real-world ablation studies in both STL-10 and ImageNet.
Researcher Affiliation | Collaboration | Yuandong Tian (1), Xinlei Chen (1), Surya Ganguli (1, 2); (1) Facebook AI Research, (2) Stanford University. Correspondence to: Yuandong Tian <yuandong@fb.com>.
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | Yes | Code is released at https://github.com/facebookresearch/luckmatters/tree/master/ssl.
Open Datasets | Yes | Top-1 accuracy in STL-10 (Coates et al., 2011) downstream classification task. ... On the standard ImageNet benchmark (300 epochs)... CIFAR-10 (Krizhevsky et al., 2009).
Dataset Splits | No | The paper uses standard benchmark datasets (STL-10, CIFAR-10, ImageNet) but does not provide explicit numerical details (percentages or counts) for train/validation/test splits, nor does it specify a cross-validation strategy in a reproducible manner. It refers to the linear evaluation protocol without detailing the split.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models or machine configurations.
Software Dependencies | No | The paper mentions software components such as 'SGD as the optimizer', 'ResNet-18', and 'LARS optimizer', but it does not provide specific version numbers for any of these software dependencies.
Experiment Setup | Yes | Unless explicitly stated, in all our experiments, we use ResNet-18 (He et al., 2016) as the backbone network for CIFAR-10/STL-10 experiments and SGD as the optimizer with learning rate α = 0.03, momentum 0.9, weight decay η = 0.0004 and EMA parameter γ_a = 0.996. Each setting is repeated 5 times. ... The second setting follows BYOL more closely, where we use a symmetrized loss, 4096 batch size and LARS optimizer (You et al., 2017), and train for 300 epochs.
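
The Experiment Setup row above gives concrete hyperparameters for the CIFAR-10/STL-10 runs (ResNet-18 backbone, SGD with learning rate 0.03, momentum 0.9, weight decay 0.0004, EMA parameter 0.996). The minimal PyTorch sketch below shows one way those values could map onto an optimizer and a target-network EMA update; only the numeric values come from the excerpt, while the module names, the projection-head stand-in, and the update helper are illustrative assumptions.

```python
# Hedged sketch of the quoted CIFAR-10/STL-10 setup. Only the numeric
# hyperparameters come from the paper excerpt; the module names and
# scaffolding are illustrative assumptions.
import copy
import torch
import torchvision

online_net = torchvision.models.resnet18(num_classes=128)  # backbone + projection-head stand-in
target_net = copy.deepcopy(online_net)                      # EMA ("momentum") target network
for p in target_net.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.SGD(
    online_net.parameters(),
    lr=0.03,              # learning rate alpha from the excerpt
    momentum=0.9,
    weight_decay=0.0004,  # weight decay eta from the excerpt
)

gamma_a = 0.996  # EMA parameter from the excerpt

@torch.no_grad()
def ema_update(online, target, gamma=gamma_a):
    # target <- gamma * target + (1 - gamma) * online, applied per parameter
    for p_o, p_t in zip(online.parameters(), target.parameters()):
        p_t.mul_(gamma).add_(p_o, alpha=1 - gamma)
```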
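
The Research Type row quotes the paper's DirectPred predictor, which, as I understand it, replaces a learned predictor with one set directly from statistics of the online network's features. As a rough, hedged illustration of that idea (not the paper's verbatim recipe), the sketch below builds the predictor weight matrix from an eigendecomposition of an exponential-moving-average feature correlation matrix; the square-root eigenvalue mapping and the `rho`/`eps` values are assumptions for illustration.

```python
# Hedged sketch of a DirectPred-style predictor update. The sqrt-eigenvalue
# mapping and the rho/eps defaults are assumptions, not the paper's exact recipe.
import torch

class DirectPredictor:
    def __init__(self, dim, rho=0.3, eps=0.1):
        self.F = torch.zeros(dim, dim)   # EMA estimate of the feature correlation matrix
        self.Wp = torch.eye(dim)         # predictor weights, set directly (not learned)
        self.rho = rho                   # EMA coefficient for F
        self.eps = eps                   # small floor added to the spectral weights

    @torch.no_grad()
    def update(self, f):
        # f: (batch, dim) features from the online network for the current batch
        corr = f.t() @ f / f.shape[0]
        self.F = self.rho * self.F + (1 - self.rho) * corr
        # Eigendecompose the symmetric correlation estimate and rescale its spectrum
        s, U = torch.linalg.eigh(self.F)
        s = s.clamp(min=0)
        p = torch.sqrt(s / s.max().clamp(min=1e-12)) + self.eps  # assumed mapping
        self.Wp = U @ torch.diag(p) @ U.t()

    def __call__(self, f):
        return f @ self.Wp.t()
```

Per the excerpt, this kind of directly-set predictor outperformed a linear learned predictor and was comparable to a two-layer non-linear predictor with BatchNorm on ImageNet.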
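
The Dataset Splits row notes that the paper reports downstream accuracy under the linear evaluation protocol without restating split details. For readers unfamiliar with that protocol, here is a generic sketch of the common convention (frozen encoder, linear classifier trained on the standard train split, accuracy reported on the standard test split); it describes usual practice, not the paper's exact pipeline, and every name and hyperparameter in it is illustrative.

```python
# Generic linear-evaluation sketch (illustrating the common protocol, not the
# paper's exact pipeline): freeze the pretrained encoder and train a linear
# classifier on top of its features.
import torch
import torch.nn as nn

def linear_eval(encoder, train_loader, test_loader, feat_dim, num_classes, epochs=100):
    encoder.eval()
    for p in encoder.parameters():
        p.requires_grad_(False)
    clf = nn.Linear(feat_dim, num_classes)
    opt = torch.optim.SGD(clf.parameters(), lr=0.1, momentum=0.9)
    for _ in range(epochs):
        for x, y in train_loader:
            with torch.no_grad():
                z = encoder(x)                      # frozen features
            loss = nn.functional.cross_entropy(clf(z), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    correct = total = 0
    with torch.no_grad():
        for x, y in test_loader:
            pred = clf(encoder(x)).argmax(dim=1)
            correct += (pred == y).sum().item()
            total += y.numel()
    return correct / total                          # top-1 accuracy on the test split
```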