Self-supervised Learning is More Robust to Dataset Imbalance

Authors: Hong Liu, Jeff Z. HaoChen, Adrien Gaidon, Tengyu Ma

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | First, we find out via extensive experiments that off-the-shelf self-supervised representations are already more robust to class imbalance than supervised representations. The performance gap between balanced and imbalanced pre-training with SSL is significantly smaller than the gap with supervised learning, across sample sizes, for both in-domain and, especially, out-of-domain evaluation.
Researcher Affiliation | Collaboration | Hong Liu (Stanford University, hliu99@stanford.edu); Jeff Z. HaoChen (Stanford University, jhaochen@stanford.edu); Adrien Gaidon (Toyota Research Institute, adrien.gaidon@tri.global); Tengyu Ma (Stanford University, tengyuma@stanford.edu)
Pseudocode | Yes | Algorithm 1: Reweighted Sharpness-Aware Minimization (rwSAM). (A hedged sketch of a reweighted-SAM update follows the table.)
Open Source Code | Yes | Code is available at https://github.com/Liuhong99/Imbalanced-SSL.
Open Datasets | Yes | We pre-train the representations on variants of ImageNet (Russakovsky et al., 2015) or CIFAR-10 (Krizhevsky & Hinton, 2009) with a wide range of numbers of examples and ratios of imbalance. (A sketch of one way to construct such a long-tailed subset follows the table.)
Dataset Splits | Yes | For ID evaluation, we use the original CIFAR-10 or ImageNet training set for the training phase of the linear probe and use the original validation set for the final evaluation.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models (e.g., NVIDIA A100), CPU models, or memory specifications used for running the experiments.
Software Dependencies | No | The paper mentions various software components and models, such as ResNet-18, ResNet-50, MoCo v2, SimSiam, SimCLR, Grad-CAM, and RandAugment, but it does not specify version numbers for these software dependencies.
Experiment Setup | Yes | For self-supervised learning, the initial learning rate on the standard ImageNet-LT is set to 0.025 with batch size 256. We train the model for 300 epochs on the standard ImageNet-LT and adopt cosine learning rate decay following (He et al., 2020; Chen & He, 2021). We set the initial learning rate to 30 when training the linear head with batch size 4096 and train for 100 epochs in total. (A minimal sketch of these schedules follows the table.)
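
The Pseudocode row refers to the paper's Algorithm 1, Reweighted Sharpness-Aware Minimization (rwSAM); the paper itself should be consulted for the exact procedure. The following is only a minimal PyTorch-style sketch of how a SAM update with per-example reweighting could look, assuming per-example weights `w` (e.g., larger for examples from rare classes) and a loss computed with `reduction='none'`. The function name `rw_sam_step` and the value of `rho` are illustrative, not the authors' implementation.

```python
# Hypothetical sketch of one reweighted-SAM update step (not a verbatim
# transcription of the paper's Algorithm 1).
# Assumes `loss_fn` returns per-example losses and `w` holds per-example weights.
import torch

def rw_sam_step(model, loss_fn, x, y, w, optimizer, rho=0.05):
    # 1) Weighted loss and gradient at the current parameters.
    optimizer.zero_grad()
    (w * loss_fn(model(x), y)).mean().backward()

    # 2) Ascent step: perturb parameters along the normalized gradient (SAM).
    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm() for p in params])) + 1e-12
    eps = []
    with torch.no_grad():
        for p in params:
            e = rho * p.grad / grad_norm
            p.add_(e)
            eps.append(e)

    # 3) Weighted loss and gradient at the perturbed point.
    optimizer.zero_grad()
    (w * loss_fn(model(x), y)).mean().backward()

    # 4) Undo the perturbation and update with the sharpness-aware gradient.
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)
    optimizer.step()
```

For example, `loss_fn = torch.nn.CrossEntropyLoss(reduction='none')` would fit a supervised baseline; in the self-supervised setting the per-example loss would come from the contrastive or Siamese objective instead.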
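
The Open Datasets row states that pre-training uses variants of ImageNet or CIFAR-10 with varying sample sizes and imbalance ratios. As an illustration only, the sketch below shows one common way to build an exponentially long-tailed subset of a labeled dataset (the construction typically used for CIFAR-10-LT / ImageNet-LT-style benchmarks); the paper's exact sampling protocol may differ, and the function names and default numbers are assumptions.

```python
# Hedged sketch: building an exponentially long-tailed subset of a labeled dataset.
# `imbalance_ratio` = (size of largest class) / (size of smallest class).
import numpy as np

def long_tailed_counts(n_classes=10, n_max=5000, imbalance_ratio=100):
    # Per-class counts decay geometrically from n_max down to n_max / imbalance_ratio.
    mu = (1.0 / imbalance_ratio) ** (1.0 / (n_classes - 1))
    return [max(1, int(n_max * mu ** c)) for c in range(n_classes)]

def subsample_long_tailed(labels, counts, seed=0):
    # Keep `counts[c]` randomly chosen examples of each class c.
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    keep = []
    for c, n_c in enumerate(counts):
        idx = np.flatnonzero(labels == c)
        keep.extend(rng.choice(idx, size=min(n_c, len(idx)), replace=False))
    return np.sort(np.array(keep))

# Example: a CIFAR-10 subset with imbalance ratio 100 (5000 down to 50 examples per class).
# counts = long_tailed_counts(n_classes=10, n_max=5000, imbalance_ratio=100)
# indices = subsample_long_tailed(train_labels, counts)
```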
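
The Experiment Setup row quotes concrete hyperparameters (SSL pre-training: learning rate 0.025, batch size 256, 300 epochs, cosine decay; linear probe: learning rate 30, batch size 4096, 100 epochs). The sketch below only wires those quoted numbers into a standard PyTorch SGD plus cosine-annealing schedule; the encoder, linear head, momentum, and weight decay values are placeholders and assumptions, not the authors' code.

```python
# Minimal sketch wiring the quoted hyperparameters into PyTorch schedules.
# The tiny modules below are placeholders for the ResNet encoder and linear probe.
import torch

def sgd_with_cosine(params, base_lr, epochs, momentum=0.9, weight_decay=1e-4):
    # Momentum and weight decay are assumed values, not taken from the paper.
    opt = torch.optim.SGD(params, lr=base_lr, momentum=momentum, weight_decay=weight_decay)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)
    return opt, sched

encoder = torch.nn.Linear(8, 8)        # placeholder for the ResNet encoder
linear_head = torch.nn.Linear(8, 10)   # placeholder for the linear probe head

# Self-supervised pre-training on ImageNet-LT: lr 0.025, batch size 256, 300 epochs.
pre_opt, pre_sched = sgd_with_cosine(encoder.parameters(), base_lr=0.025, epochs=300)

# Linear probe on frozen features: lr 30, batch size 4096, 100 epochs.
probe_opt, probe_sched = sgd_with_cosine(linear_head.parameters(), base_lr=30.0, epochs=100)

# Per epoch: run the training loop at the stated batch size, then step the scheduler,
# e.g.  for epoch in range(300): ...; pre_sched.step()
```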