Preventing Dimensional Collapse in Self-Supervised Learning via Orthogonality Regularization
Authors: Junlin He, Jinxiao Du, Wei Ma
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical investigations demonstrate that OR significantly enhances the performance of SSL methods across diverse benchmarks, yielding consistent gains with both CNNs and Transformer-based architectures. |
| Researcher Affiliation | Academia | Junlin He, The Hong Kong Polytechnic University, Hong Kong SAR, China (junlinspeed.he@connect.polyu.hk); Jinxiao Du, The Hong Kong Polytechnic University, Hong Kong SAR, China (jinxiao.du@connect.polyu.hk); Wei Ma, The Hong Kong Polytechnic University, Hong Kong SAR, China (wei.w.ma@polyu.edu.hk) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code will be released at https://github.com/Umaruchain/OR_in_SSL.git. |
| Open Datasets | Yes | We pretrain SSL methods on CIFAR-10, CIFAR-100, IMAGENET-100 and IMAGENET-1k and evaluate transfer learning scenarios on datasets including CIFAR-100, CIFAR-10 (Krizhevsky et al. 2009), Food-101 (Bossard et al. 2014), Flowers-102 (Xia et al. 2017), DTD (Sharan et al. 2014), and GTSRB (Haloi 2015). |
| Dataset Splits | Yes | The splits of the training and test sets follow torchvision (Marcel & Rodriguez, 2010). For OR, γ of SRIP is tuned from {1e-3, 1e-4, 1e-5} and γ of SO is tuned from {1e-5, 1e-6, 1e-7} on a validation set (a sketch of the SO and SRIP penalties appears after this table). |
| Hardware Specification | Yes | Our experiments were all completed on 4 NVIDIA RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions using 'Solo-learn' and 'Lightly SSL' frameworks, and 'detectron2', but does not specify their version numbers or other crucial software dependencies with version details needed for exact replication. |
| Experiment Setup | Yes | For OR, γ of SRIP is tuned from {1e-3, 1e-4, 1e-5} and γ of SO is tuned from {1e-5, 1e-6, 1e-7} on a validation set. When training the linear classifier, we use 100 epochs, weight decay 0.0005, learning rate 0.1 (divided by a factor of 10 at epochs 60 and 100), batch size 256, and SGD with Nesterov momentum as the optimizer (on IMAGENET-1k, we use batch size 128 and learning rate 0.2). A sketch of this linear-evaluation setup appears after the table. |
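The γ values quoted above weight the two orthogonality regularizers named in the paper, SO (soft orthogonality) and SRIP (spectral restricted isometry property). The snippet below is a minimal PyTorch sketch of the commonly used formulations of these penalties, not the authors' released code (that lives at the GitHub link above); the helper names, the choice of Gram matrix orientation, and the decision to apply the penalty to every weight matrix are assumptions made for illustration.

```python
import torch


def soft_orthogonality(weight: torch.Tensor) -> torch.Tensor:
    """SO penalty: squared Frobenius norm of (W Wᵀ − I) for a reshaped weight.

    Convolutional kernels are flattened to 2-D first. Whether rows or columns
    are orthogonalized depends on the matrix shape; rows are assumed here.
    """
    w = weight.reshape(weight.shape[0], -1)
    gram = w @ w.t()
    eye = torch.eye(gram.shape[0], device=w.device, dtype=w.dtype)
    return torch.norm(gram - eye, p="fro") ** 2


def srip(weight: torch.Tensor, n_iter: int = 3) -> torch.Tensor:
    """SRIP penalty: spectral norm of (W Wᵀ − I), estimated by power iteration."""
    w = weight.reshape(weight.shape[0], -1)
    residual = w @ w.t() - torch.eye(w.shape[0], device=w.device, dtype=w.dtype)
    v = torch.randn(residual.shape[1], 1, device=w.device, dtype=w.dtype)
    for _ in range(n_iter):
        v = residual @ v
        v = v / (v.norm() + 1e-12)
    return (residual @ v).norm()


def or_loss(model: torch.nn.Module, gamma: float, kind: str = "SO") -> torch.Tensor:
    """Sum the chosen penalty over all 2-D+ weight tensors and scale by γ.

    γ would be tuned from {1e-3, 1e-4, 1e-5} for SRIP or {1e-5, 1e-6, 1e-7}
    for SO, as reported in the table above.
    """
    penalty = soft_orthogonality if kind == "SO" else srip
    total = sum(penalty(p) for name, p in model.named_parameters()
                if p.dim() >= 2 and "weight" in name)
    return gamma * total
```

In use, `or_loss(model, gamma)` would simply be added to the SSL objective for each training step; the grid of γ values above suggests the penalty is kept small relative to the main loss.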
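The linear-evaluation settings in the last row translate directly into an optimizer and scheduler configuration. The sketch below is a hedged PyTorch rendering of that setup: the epoch count, learning rate, weight decay, batch-size note, milestones, and the use of SGD with Nesterov momentum come from the quoted text, while the momentum value, feature dimension, and the frozen-backbone data pipeline are assumptions.

```python
import torch
from torch import nn, optim

# Hypothetical linear head trained on frozen backbone features.
feature_dim, num_classes = 2048, 1000
linear_head = nn.Linear(feature_dim, num_classes)

# Quoted settings: 100 epochs, lr 0.1, weight decay 0.0005, batch size 256,
# SGD with Nesterov momentum, lr divided by 10 at the stated milestones
# (batch size 128 and lr 0.2 are the reported IMAGENET-1k values).
optimizer = optim.SGD(
    linear_head.parameters(),
    lr=0.1,
    momentum=0.9,          # momentum value assumed; not stated in the quote
    nesterov=True,
    weight_decay=5e-4,
)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[60, 100], gamma=0.1)

for epoch in range(100):
    # ... one pass over frozen-feature batches with cross-entropy on linear_head ...
    scheduler.step()
```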