What shapes the loss landscape of self-supervised learning?

Authors: Liu Ziyin, Ekdeep Singh Lubana, Masahito Ueda, Hidenori Tanaka

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In Appendix Section A, we also present numerical simulations that directly validate the predictions of the theory." and "Figure 2: Landscape of Resnet18 (upper) and vision transformers (lower) on CIFAR10 with SimCLR qualitatively agrees with our linear theory."
Researcher Affiliation | Collaboration | (1) Department of Physics, The University of Tokyo, Tokyo, Japan; (2) Physics & Informatics Laboratories, NTT Research, Inc., Sunnyvale, CA, USA; (3) Center for Brain Science, Harvard University, Cambridge, USA; (4) EECS Department, University of Michigan, Ann Arbor, USA; (5) Institute for Physics of Intelligence, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo; (6) RIKEN Center for Emergent Matter Science (CEMS), Wako, Saitama, Japan
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper describes its models and training procedures (e.g., "We use ResNet-12 models as the backbone for all experiments", "SimCLR augmentations are followed") but provides no statement or link indicating that the source code for its contributions is open-source or publicly available.
Open Datasets | Yes | "We train a Resnet18 on CIFAR10 with the SimCLR loss..." and "For our experiments measuring the influence of imbalanced datasets on SSL training, we use CIFAR-10..."
Dataset Splits | No | "For our experiments measuring the influence of imbalanced datasets on SSL training, we use CIFAR-10 by sampling 20000 samples out of the 50000 training samples. The sampling process is described by a Dirichlet distribution and is often used to analyze effects of heterogeneity and data imbalance in Federated Learning problems (Hsu et al., 2019)." This only describes how the imbalanced training subset was formed, not a train/validation/test split for reproducibility. (A sketch of such Dirichlet-based sampling appears after the table.)
Hardware Specification | No | The paper does not provide specific hardware details, such as the GPU or CPU models used for running its experiments.
Software Dependencies | No | The paper mentions using "SimCLR augmentations" and "ResNet-12 models" but does not provide software dependencies with version numbers (e.g., PyTorch, TensorFlow, or CUDA versions) needed to replicate the experiments.
Experiment Setup | Yes | "All training involves a standardly used cosine decay learning rate schedule, starting at 0.03 and decaying to 0.001. When a projector module is used, it involves a two-layer MLP with hidden dimension of 512 and BatchNorm layer in between. We use SGD for optimization and perform the standardly used linear evaluation protocol for measuring the quality of the final representation. For training the linear layer, we use an initial learning rate of 10 and decay it to 0.01 with a cosine schedule." (A sketch of this optimization setup appears after the table.)
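The Dataset Splits row above quotes a Dirichlet-based subsampling of CIFAR-10 (20000 of the 50000 training images) to induce class imbalance, in the spirit of Hsu et al. (2019). The paper excerpt does not give the concentration parameter or the exact routine, so the following is a minimal sketch under assumptions: the alpha value, the random seed, and the helper name dirichlet_imbalanced_indices are all illustrative, and per-class counts are approximate because of integer truncation.

import numpy as np

def dirichlet_imbalanced_indices(labels, n_total=20000, n_classes=10,
                                 alpha=0.5, seed=0):
    # Draw class proportions from Dirichlet(alpha) and subsample each class
    # of CIFAR-10 accordingly; returns indices into the training set.
    rng = np.random.default_rng(seed)
    proportions = rng.dirichlet(alpha * np.ones(n_classes))  # class mixture
    per_class = (proportions * n_total).astype(int)          # samples per class
    chosen = []
    for c, n_c in enumerate(per_class):
        class_idx = np.where(labels == c)[0]
        n_c = min(n_c, len(class_idx))  # cannot exceed the samples available
        chosen.append(rng.choice(class_idx, size=n_c, replace=False))
    return np.concatenate(chosen)

# Usage (assuming train_labels holds the 50000 CIFAR-10 training labels):
# subset_idx = dirichlet_imbalanced_indices(np.asarray(train_labels))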
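The Experiment Setup row quotes the optimization details: a two-layer MLP projector with hidden dimension 512 and a BatchNorm layer in between, SGD with a cosine learning-rate decay from 0.03 to 0.001, and linear evaluation with the learning rate decayed from 10 to 0.01. Below is a minimal PyTorch sketch of that setup; the projector output dimension, ReLU activation, momentum, and epoch count are assumptions not stated in the excerpt.

import torch
import torch.nn as nn

feat_dim = 512     # ResNet-18 feature dimension
proj_hidden = 512  # hidden dimension stated in the paper
proj_out = 128     # assumption: output dimension is not stated in the excerpt
epochs = 100       # assumption: training length is not stated in the excerpt

# Two-layer MLP projector with a BatchNorm layer in between (ReLU assumed).
projector = nn.Sequential(
    nn.Linear(feat_dim, proj_hidden),
    nn.BatchNorm1d(proj_hidden),
    nn.ReLU(inplace=True),
    nn.Linear(proj_hidden, proj_out),
)

# SSL training: SGD with a cosine learning-rate decay from 0.03 to 0.001.
# (In practice the backbone parameters are optimized together with the projector.)
ssl_opt = torch.optim.SGD(projector.parameters(), lr=0.03, momentum=0.9)  # momentum assumed
ssl_sched = torch.optim.lr_scheduler.CosineAnnealingLR(ssl_opt, T_max=epochs, eta_min=0.001)

# Linear evaluation: a linear classifier trained on frozen features,
# with the learning rate decayed from 10 to 0.01 on a cosine schedule.
linear_head = nn.Linear(feat_dim, 10)  # 10 CIFAR-10 classes
eval_opt = torch.optim.SGD(linear_head.parameters(), lr=10.0)
eval_sched = torch.optim.lr_scheduler.CosineAnnealingLR(eval_opt, T_max=epochs, eta_min=0.01)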