Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Representation Learning Dynamics of Self-Supervised Models

Authors: Pascal Esser, Satyaki Mukherjee, Debarghya Ghoshdastidar

TMLR 2024

Reproducibility assessment (variable, classification result, and LLM response for each item):

Research Type: Experimental
LLM Response: We numerically illustrate the validity of our theoretical findings, and discuss how the presented results provide a framework for further theoretical analysis of contrastive and non-contrastive SSL. We numerically show, on the MNIST dataset, that our derived SSL learning dynamics can be solved significantly faster than training nonlinear networks, and yet provide comparable accuracy on downstream tasks. For evaluation we use the following experimental setup: We train a network with contrastive loss as defined in (1) using gradient descent with learning rate 0.01 for 100 epochs and hidden layer size from 10 to 1000. We consider the following three activation functions: (1) sigmoid, (2) ReLU (ϕ(x) = max{x, 0}) and (3) tanh. The results are shown in Figure 2, where the plot shows the average over 10 initializations.

Researcher Affiliation: Academia
LLM Response: Pascal M. Esser (EMAIL), Technical University of Munich, Germany; Satyaki Mukherjee (EMAIL), National University of Singapore, Singapore; Debarghya Ghoshdastidar (EMAIL), Technical University of Munich, Germany

Pseudocode: No
LLM Response: The paper contains mathematical equations and differential equations to describe the learning dynamics, but no structured pseudocode or algorithm blocks are provided.

Open Source Code: No
LLM Response: No explicit statement regarding the release of source code or a link to a code repository is provided in the paper.

Open Datasets: Yes
LLM Response: For this illustration we now consider two classes with 200 datapoints each from the MNIST dataset (Deng, 2012).

Dataset Splits: No
LLM Response: The paper mentions using "two classes with 200 datapoints each from the MNIST dataset", but does not specify any training, validation, or test splits, nor does it refer to standard splits for this subset of the data.

Hardware Specification: No
LLM Response: The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. It only describes the experimental setup in terms of training parameters and datasets.

Software Dependencies: No
LLM Response: The paper describes the experimental setup and methodology but does not list any specific software libraries, frameworks, or their version numbers that were used for implementation.

Experiment Setup: Yes
LLM Response: For evaluation we use the following experimental setup: We train a network with contrastive loss as defined in (1) using gradient descent with learning rate 0.01 for 100 epochs and hidden layer size from 10 to 1000. We consider the following three activation functions: (1) sigmoid, (2) ReLU (ϕ(x) = max{x, 0}) and (3) tanh.
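The reported setup (contrastive loss, gradient descent with learning rate 0.01 for 100 epochs, hidden widths 10 to 1000, sigmoid/ReLU/tanh activations) can be sketched as below. This is a minimal illustration, not the authors' code: the paper's loss (its equation (1)) is not reproduced on this page, so a generic cosine-similarity contrastive objective stands in for it; synthetic data stands in for the two-class MNIST subset; finite-difference gradients replace backpropagation for brevity; and names such as `train_contrastive` are invented for this sketch.

```python
import numpy as np

# Activation functions from the reported setup.
ACTIVATIONS = {
    "sigmoid": lambda x: 1.0 / (1.0 + np.exp(-x)),
    "relu": lambda x: np.maximum(x, 0.0),
    "tanh": np.tanh,
}

def embed(params, X, act):
    """Two-layer network f(X) = phi(X W1) W2."""
    W1, W2 = params
    return ACTIVATIONS[act](X @ W1) @ W2

def cosine(A, B):
    """Row-wise cosine similarity, stabilised against zero norms."""
    num = np.sum(A * B, axis=1)
    den = np.linalg.norm(A, axis=1) * np.linalg.norm(B, axis=1) + 1e-8
    return num / den

def contrastive_loss(params, X, X_pos, X_neg, act):
    # Stand-in objective (assumption): pull each point toward its
    # positive view, push it from a negative; the paper's eq. (1) may differ.
    Z = embed(params, X, act)
    Zp = embed(params, X_pos, act)
    Zn = embed(params, X_neg, act)
    return float(np.mean(cosine(Z, Zn)) - np.mean(cosine(Z, Zp)))

def numerical_grad(loss_fn, params, eps=1e-5):
    """Central finite differences; exact backprop omitted for brevity."""
    grads = []
    for W in params:
        g = np.zeros_like(W)
        it = np.nditer(W, flags=["multi_index"])
        for _ in it:
            i = it.multi_index
            old = W[i]
            W[i] = old + eps
            lp = loss_fn()
            W[i] = old - eps
            lm = loss_fn()
            W[i] = old  # restore before moving on
            g[i] = (lp - lm) / (2 * eps)
        grads.append(g)
    return grads

def train_contrastive(X, X_pos, X_neg, hidden=10, out_dim=2,
                      act="tanh", lr=0.01, epochs=100, seed=0):
    """Plain gradient descent with the reported lr (0.01) and epoch count (100)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    params = [rng.normal(scale=0.5, size=(d, hidden)),
              rng.normal(scale=0.5, size=(hidden, out_dim))]
    loss_fn = lambda: contrastive_loss(params, X, X_pos, X_neg, act)
    history = [loss_fn()]
    for _ in range(epochs):
        for W, g in zip(params, numerical_grad(loss_fn, params)):
            W -= lr * g
        history.append(loss_fn())
    return params, history

# Synthetic stand-in for the paper's 2-class MNIST subset.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))
X_pos = X + 0.05 * rng.normal(size=X.shape)   # positive views: small perturbation
X_neg = X[rng.permutation(len(X))]            # negatives: shuffled rows
params, history = train_contrastive(X, X_pos, X_neg)
```

The paper averages results over 10 initializations; that would correspond here to looping `train_contrastive` over `seed=0..9`, and the hidden-width sweep to varying `hidden` from 10 to 1000.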