Preventing Dimensional Collapse in Self-Supervised Learning via Orthogonality Regularization

Authors: Junlin He, Jinxiao Du, Wei Ma

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical investigations demonstrate that OR significantly enhances the performance of SSL methods across diverse benchmarks, yielding consistent gains with both CNN- and Transformer-based architectures.
Researcher Affiliation | Academia | Junlin He, The Hong Kong Polytechnic University, Hong Kong SAR, China (junlinspeed.he@connect.polyu.hk); Jinxiao Du, The Hong Kong Polytechnic University, Hong Kong SAR, China (jinxiao.du@connect.polyu.hk); Wei Ma, The Hong Kong Polytechnic University, Hong Kong SAR, China (wei.w.ma@polyu.edu.hk)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code will be released at https://github.com/Umaruchain/OR_in_SSL.git.
Open Datasets | Yes | We pretrain SSL methods on CIFAR-10, CIFAR-100, IMAGENET-100, and IMAGENET-1k, and evaluate transfer learning on datasets including CIFAR-100, CIFAR-10 (Krizhevsky et al., 2009), Food-101 (Bossard et al., 2014), Flowers-102 (Xia et al., 2017), DTD (Sharan et al., 2014), and GTSRB (Haloi, 2015).
Dataset Splits | Yes | The splits of the training and test sets follow torchvision (Marcel & Rodriguez, 2010). For OR, γ of SRIP is tuned from {1e-3, 1e-4, 1e-5} and γ of SO is tuned from {1e-5, 1e-6, 1e-7} on a validation set. (Minimal sketches of the torchvision splits and of the SO/SRIP penalties appear after this table.)
Hardware Specification | Yes | Our experiments were all completed on 4 NVIDIA 3090 GPUs.
Software Dependencies | No | The paper mentions the 'solo-learn' and 'Lightly SSL' frameworks as well as 'detectron2', but does not specify their versions or other software dependencies needed for exact replication.
Experiment Setup | Yes | For OR, γ of SRIP is tuned from {1e-3, 1e-4, 1e-5} and γ of SO is tuned from {1e-5, 1e-6, 1e-7} on a validation set. When training the linear classifier, we use 100 epochs, a weight decay of 0.0005, a learning rate of 0.1 (divided by a factor of 10 at epochs 60 and 100), a batch size of 256, and SGD with Nesterov momentum as the optimizer (on IMAGENET-1k, we use a batch size of 128 and a learning rate of 0.2). (A sketch of this optimizer configuration appears below the table.)
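
The dataset-splits entry states only that the train/test partitions follow torchvision. A minimal sketch of what that implies for CIFAR-10, with a placeholder root path and transform rather than the paper's augmentation pipeline:

```python
# Minimal sketch: the standard torchvision train/test split for CIFAR-10.
# The root path and ToTensor transform are placeholders, not the paper's pipeline.
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()
train_set = datasets.CIFAR10(root="./data", train=True, download=True, transform=to_tensor)
test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=to_tensor)
print(len(train_set), len(test_set))  # 50000 / 10000 under the standard split
```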
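The excerpts above name the two orthogonality penalties (SO and SRIP) and their weights γ but do not show how they are computed. The sketch below follows the usual formulations of these penalties (soft orthogonality as ||W Wᵀ − I||_F², SRIP as a power-iteration estimate of the spectral norm of W Wᵀ − I); the function names, the choice of regularizing all conv/linear weights, and the use of γ as a simple multiplier are assumptions, not the paper's exact implementation.

```python
# Hedged sketch of the SO and SRIP orthogonality penalties mentioned in the
# tuning grids above. Names and layer selection are illustrative assumptions.
import torch


def so_penalty(weight: torch.Tensor) -> torch.Tensor:
    """Soft orthogonality: squared Frobenius norm of (W W^T - I)."""
    w = weight.flatten(1)                       # conv kernels -> (out, in*k*k)
    gram = w @ w.t()
    eye = torch.eye(gram.shape[0], device=w.device)
    return ((gram - eye) ** 2).sum()


def srip_penalty(weight: torch.Tensor, n_iters: int = 2) -> torch.Tensor:
    """SRIP: spectral norm of (W W^T - I), estimated by power iteration."""
    w = weight.flatten(1)
    gram = w @ w.t()
    residual = gram - torch.eye(gram.shape[0], device=w.device)
    v = torch.randn(residual.shape[1], 1, device=w.device)
    for _ in range(n_iters):                    # a few power-iteration steps
        v = residual @ v
        v = v / (v.norm() + 1e-12)
    return (residual @ v).norm()


def or_loss(model: torch.nn.Module, gamma: float, mode: str = "so") -> torch.Tensor:
    """Sum the chosen penalty over all conv/linear weights, scaled by gamma."""
    penalty = so_penalty if mode == "so" else srip_penalty
    total = sum(penalty(m.weight)
                for m in model.modules()
                if isinstance(m, (torch.nn.Conv2d, torch.nn.Linear)))
    return gamma * total
```

Under this reading, γ from the grids above (e.g. 1e-5 for SO) would simply scale `or_loss` before it is added to the SSL objective.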
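The experiment-setup entry specifies the linear-probe optimizer except for the momentum value. The sketch below assumes momentum 0.9 and an illustrative 2048-dimensional frozen feature, and expresses the "divide by 10 at epochs 60 and 100" schedule with MultiStepLR.

```python
# Sketch of the linear-probe optimization described above: 100 epochs, SGD with
# Nesterov momentum, lr 0.1, weight decay 5e-4, batch size 256, lr divided by 10
# at epochs 60 and 100. Momentum 0.9 and the 2048-d feature are assumptions.
import torch

linear_head = torch.nn.Linear(2048, 100)        # e.g. ResNet-50 features -> 100 classes

optimizer = torch.optim.SGD(
    linear_head.parameters(),
    lr=0.1,                                     # 0.2 with batch size 128 on IMAGENET-1k
    momentum=0.9,                               # assumed value, not stated in the excerpt
    weight_decay=5e-4,
    nesterov=True,
)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[60, 100], gamma=0.1  # divide lr by 10 at epochs 60 and 100
)

for epoch in range(100):
    # ... one pass over the frozen-feature training set with batch size 256 ...
    scheduler.step()
```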