$\alpha$-ReQ: Assessing Representation Quality in Self-Supervised Learning by measuring eigenspectrum decay

Authors: Kumar K Agrawal, Arnab Kumar Mondal, Arna Ghosh, Blake Richards

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We analytically and empirically (using multiple datasets, e.g., CIFAR, STL10, MIT67, ImageNet) demonstrate that the decay coefficient serves as a measure of representation quality for tasks that are solvable with a linear readout, that is, there exist well-defined intervals for α where models exhibit excellent downstream generalization. Furthermore, our experiments suggest that key design parameters in SSL algorithms, such as Barlow Twins [1], implicitly modulate the decay coefficient of the eigenspectrum (α). (A sketch of estimating α from the PCA eigenspectrum follows the table.)
Researcher Affiliation | Academia | Kumar Krishna Agrawal (UC Berkeley); Arnab Kumar Mondal (Mila & McGill University, Montréal, QC, Canada); Arna Ghosh (Mila & McGill University, Montréal, QC, Canada); Blake A. Richards (Mila, Montreal Neurological Institute & McGill University, Montréal, QC, Canada; Learning in Machines and Brains Program, CIFAR, Toronto, ON, Canada)
Pseudocode | Yes | Algorithm 1: Model selection using α. (A hedged sketch of such a selection rule also follows the table.)
Open Source Code | Yes | We publicly release our results and code: https://github.com/kumarkrishna/fastssl
Open Datasets | Yes | We analytically and empirically (using multiple datasets, e.g., CIFAR, STL10, MIT67, ImageNet) demonstrate that the decay coefficient serves as a measure of representation quality for tasks that are solvable with a linear readout, that is, there exist well-defined intervals for α where models exhibit excellent downstream generalization. We provide the results for CIFAR10 [27] and STL10 [28] in Fig. 5, demonstrating that α is a strong indicator of in-distribution generalization performance across a wide range of hyperparameter settings of $\mathcal{L}_{BT}$.
Dataset Splits | Yes | Compared to training a linear probe, which requires multiple epochs of forward and backward passes through the training dataset's features to achieve reasonable estimates of downstream accuracy, computing α requires a single PCA step on the validation dataset's features.
Hardware Specification | Yes | Table 1: We report the compute time for CIFAR10, STL10, and ImageNet. While CIFAR10 and STL10 are trained for 200 epochs for downstream classification, ImageNet is trained for 100 epochs. (Tested on 1 A100.)
Software Dependencies | No | The paper mentions PyTorch and timm but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | Table 1: We report the compute time for CIFAR10, STL10, and ImageNet. While CIFAR10 and STL10 are trained for 200 epochs for downstream classification, ImageNet is trained for 100 epochs.
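
The decay coefficient α quoted above is the exponent of a power law $\lambda_i \propto i^{-\alpha}$ fit to the ranked eigenvalues of the feature covariance (the PCA eigenspectrum). Below is a minimal sketch of one way to estimate it, assuming a least-squares fit in log-log space over the full spectrum; the paper's exact fitting range and procedure live in the released fastssl code and may differ.

```python
import numpy as np

def estimate_alpha(features: np.ndarray) -> float:
    """Estimate the eigenspectrum decay coefficient alpha.

    Fits a power law lambda_i ~ i^(-alpha) to the PCA eigenvalues of a
    (n_samples x n_dims) feature matrix by least squares in log-log space.
    """
    # Center the features and obtain covariance eigenvalues via SVD.
    X = features - features.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(X, compute_uv=False)
    eigenvalues = singular_values ** 2 / (X.shape[0] - 1)

    # Fit log(lambda_i) = -alpha * log(i) + c over the ranked spectrum;
    # the small epsilon guards against log(0) on trailing eigenvalues.
    ranks = np.arange(1, eigenvalues.size + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(eigenvalues + 1e-12), deg=1)
    return -slope


# Toy usage: columns scaled so covariance eigenvalues decay roughly as i^(-1).
rng = np.random.default_rng(0)
feats = rng.standard_normal((2048, 512)) * np.arange(1, 513) ** -0.5
print(f"estimated alpha: {estimate_alpha(feats):.2f}")  # roughly 1.0
```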
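
Algorithm 1 in the paper uses α for model selection; its exact decision rule is not reproduced here. The sketch below is one plausible instantiation, assuming candidate checkpoints are ranked by how close their estimated α is to a target value inside the paper's reported well-generalizing interval. The `target_alpha` default of 1.0, the `select_model` name, and the dictionary interface are illustrative assumptions, not the authors' API.

```python
def select_model(features_by_checkpoint: dict, target_alpha: float = 1.0):
    """Label-free model selection: rank checkpoints by |alpha - target|.

    `features_by_checkpoint` maps a checkpoint name to its feature matrix
    extracted on a validation set. Reuses estimate_alpha from the sketch
    above; target_alpha = 1.0 is an illustrative assumption.
    """
    alphas = {name: estimate_alpha(feats)
              for name, feats in features_by_checkpoint.items()}
    best = min(alphas, key=lambda name: abs(alphas[name] - target_alpha))
    return best, alphas
```

Consistent with the Dataset Splits row above, this costs a single SVD per checkpoint on validation features, rather than a full linear-probe training run per candidate.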