$\alpha$-ReQ : Assessing Representation Quality in Self-Supervised Learning by measuring eigenspectrum decay
Authors: Kumar K Agrawal, Arnab Kumar Mondal, Arna Ghosh, Blake Richards
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We analytically and empirically (using multiple datasets, e.g., CIFAR, STL10, MIT67, ImageNet) demonstrate that the decay coefficient serves as a measure of representation quality for tasks that are solvable with a linear readout, that is, there exist well-defined intervals for α where models exhibit excellent downstream generalization. Furthermore, our experiments suggest that key design parameters in SSL algorithms, such as Barlow Twins [1], implicitly modulate the decay coefficient of the eigenspectrum (α). |
| Researcher Affiliation | Academia | Kumar Krishna Agrawal, UC Berkeley; Arnab Kumar Mondal, Mila & McGill University, Montréal, QC, Canada; Arna Ghosh, Mila & McGill University, Montréal, QC, Canada; Blake A. Richards, Mila, Montreal Neurological Institute & McGill University, Montréal, QC, Canada, and Learning in Machines and Brains Program, CIFAR, Toronto, ON, Canada |
| Pseudocode | Yes | Algorithm 1 Model selection using α |
| Open Source Code | Yes | We publicly release our results and code https://github.com/kumarkrishna/fastssl |
| Open Datasets | Yes | We analytically and empirically (using multiple datasets, e.g., CIFAR, STL10, MIT67, ImageNet) demonstrate that the decay coefficient serves as a measure of representation quality for tasks that are solvable with a linear readout, that is, there exist well-defined intervals for α where models exhibit excellent downstream generalization. We provide the results for CIFAR10 [27] and STL10 [28] in Fig. 5, demonstrating that α is a strong indicator of in-distribution generalization performance across a wide range of hyperparameters of the Barlow Twins loss $L_{BT}$. |
| Dataset Splits | Yes | Compared to training a linear probe, which requires multiple epochs of forward & backward passes through the training dataset’s features to achieve reasonable estimates of downstream accuracy, computing α requires a single PCA step on the validation dataset’s features. |
| Hardware Specification | Yes | Table 1: We report the compute time for CIFAR10, STL10, and ImageNet. While CIFAR10 and STL10 are trained for 200 epochs for downstream classification, ImageNet is trained for 100 epochs. (tested on 1 A100 GPU) |
| Software Dependencies | No | The paper mentions PyTorch and timm but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | Table 1: We report the compute time for CIFAR10, STL10, and ImageNet. While CIFAR10 and STL10 are trained for 200 epochs for downstream classification, ImageNet is trained for 100 epochs. |
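The Dataset Splits row quotes the paper's claim that computing α needs only a single PCA step on the validation set's features, in contrast to the multiple training epochs a linear probe requires. As a minimal sketch of that computation, not the authors' released code from the fastssl repository: the function name `estimate_alpha` and the eigenvalue `fit_range` below are assumptions, and α is recovered as the slope of the eigenspectrum on a log-log scale, following the power-law form $\lambda_i \propto i^{-\alpha}$.

```python
import numpy as np

def estimate_alpha(features: np.ndarray, fit_range=(10, 1000)) -> float:
    """Estimate the eigenspectrum decay coefficient alpha.

    features: (n_samples, n_dims) validation-set representations.
    fit_range: eigenvalue-rank window used for the power-law fit
               (an assumption; the paper's exact window may differ).
    """
    # Single PCA step: eigenvalues of the feature covariance matrix.
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (len(centered) - 1)
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # sort descending
    eigvals = eigvals[eigvals > 0]           # keep the positive spectrum

    # A power law lambda_i ~ i^{-alpha} is a line of slope -alpha in log-log space.
    lo, hi = fit_range
    hi = min(hi, len(eigvals))
    ranks = np.arange(lo, hi) + 1            # 1-based eigenvalue ranks
    slope, _ = np.polyfit(np.log(ranks), np.log(eigvals[lo:hi]), deg=1)
    return -slope
```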
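The Pseudocode row cites "Algorithm 1: Model selection using α". A minimal sketch of that idea, reusing `estimate_alpha` from the snippet above and assuming a target α of 1.0 (the regime the paper associates with strong linear-readout generalization); the helper name `select_model` and the dictionary interface are hypothetical, not the paper's stated procedure.

```python
def select_model(candidate_features: dict, target_alpha: float = 1.0):
    """Rank candidate SSL models by eigenspectrum decay.

    candidate_features: maps a model identifier to its (n, d) array of
    validation-set features; returns the model whose alpha is closest
    to target_alpha, along with all estimated alphas.
    """
    alphas = {name: estimate_alpha(f) for name, f in candidate_features.items()}
    best = min(alphas, key=lambda name: abs(alphas[name] - target_alpha))
    return best, alphas
```

This mirrors the paper's motivation for α as a cheap model-selection signal: each candidate costs one PCA over validation features, rather than the many forward/backward epochs needed to train a linear probe per candidate.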