Measuring Self-Supervised Representation Quality for Downstream Classification Using Discriminative Features
Authors: Neha Kalibhat, Kanika Narang, Hamed Firooz, Maziar Sanjabi, Soheil Feizi
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Self-supervised learning (SSL) has shown impressive results in downstream classification tasks. However, there is limited work in understanding their failure modes and interpreting their learned representations. In this paper, we study the representation space of state-of-the-art self-supervised models including SimCLR, SwAV, MoCo, BYOL, DINO, SimSiam, VICReg and Barlow Twins. ... We empirically observe that Q-Score can be used as a zero-shot predictor in distinguishing between correct and incorrect classifications for any SSL model, achieving AUPRC of 91.45 on ImageNet-100 and 78.78 AUPRC on ImageNet-1K. |
| Researcher Affiliation | Collaboration | 1University of Maryland, College Park 2Meta AI |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper builds on implementations from 'solo-learn (da Costa et al. 2022)' but does not provide a repository link or release its own source code for the described methodology. |
| Open Datasets | Yes | We use a ResNet-50 encoder for our ImageNet-1K experiments and ResNet-18 encoder for all other datasets. We discover discriminative features for each pre-trained model using the train set of each dataset. ... We include more results on CIFAR-10 (Krizhevsky, Nair, and Hinton 2009a), STL-10 (Coates, Lee, and Ng 2011) and CIFAR-100 (Krizhevsky, Nair, and Hinton 2009b) in the Appendix. |
| Dataset Splits | No | The paper mentions using train sets from datasets like ImageNet and standard evaluation protocols but does not explicitly provide specific percentages, sample counts, or detailed methodology for training/validation/test dataset splits. |
| Hardware Specification | Yes | We use a maximum of 4 NVIDIA RTX A4000 GPUs (16GB memory) for all our experiments. |
| Software Dependencies | No | The paper mentions using 'solo-learn' and the 'LARS optimizer' but does not provide specific version numbers for any software dependencies like programming languages, libraries, or frameworks. |
| Experiment Setup | Yes | We further-train each pre-trained model with and without Q-Score regularization (controlled by λ1 and λ2) using a low learning rate of 10⁻⁵ for 50 epochs. We find that λ1 = λ2 = 10⁻⁴ generally works well for fine-tuning. |
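The fine-tuning objective described in the setup row can be sketched as a base SSL loss plus two regularization terms weighted by λ1 and λ2. This is a minimal, hypothetical illustration only: the paper's actual Q-Score computation is not reproduced in this report, so `reg_term_1` and `reg_term_2` below are placeholder stand-ins; only the hyperparameter values (λ1 = λ2 = 10⁻⁴) come from the paper.

```python
import numpy as np

# Reported hyperparameters; the regularizer definitions below are
# placeholders, NOT the paper's Q-Score formulation.
LAMBDA1 = 1e-4
LAMBDA2 = 1e-4

def reg_term_1(z):
    # Placeholder: penalize low per-dimension feature variance
    # across the batch (illustrative assumption).
    return float(np.mean(1.0 / (z.var(axis=0) + 1e-8)))

def reg_term_2(z):
    # Placeholder: penalize dense (non-sparse) activations
    # (illustrative assumption).
    return float(np.mean(np.abs(z)))

def total_loss(base_loss, z, lam1=LAMBDA1, lam2=LAMBDA2):
    """Base SSL objective plus two weighted regularization terms."""
    return base_loss + lam1 * reg_term_1(z) + lam2 * reg_term_2(z)

rng = np.random.default_rng(0)
z = rng.normal(size=(32, 128))  # a batch of 32 representations
loss = total_loss(base_loss=1.0, z=z)
```

With λ values this small, the regularizers only gently perturb the base objective, which matches the paper's observation that a low learning rate (10⁻⁵) over 50 epochs suffices for this fine-tuning.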