Preventing Dimensional Collapse in Self-Supervised Learning via Orthogonality Regularization

Authors: Junlin He, Jinxiao Du, Wei Ma

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical investigations demonstrate that OR significantly enhances the performance of SSL methods across diverse benchmarks, yielding consistent gains with both CNN- and Transformer-based architectures.
Researcher Affiliation | Academia | Junlin He, The Hong Kong Polytechnic University, Hong Kong SAR, China (junlinspeed.he@connect.polyu.hk); Jinxiao Du, The Hong Kong Polytechnic University, Hong Kong SAR, China (jinxiao.du@connect.polyu.hk); Wei Ma, The Hong Kong Polytechnic University, Hong Kong SAR, China (wei.w.ma@polyu.edu.hk)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code will be released at https://github.com/Umaruchain/OR_in_SSL.git.
Open Datasets | Yes | We pretrain SSL methods on CIFAR-10, CIFAR-100, IMAGENET-100, and IMAGENET-1k, and evaluate transfer learning on datasets including CIFAR-100, CIFAR-10 (Krizhevsky et al., 2009), Food-101 (Bossard et al., 2014), Flowers-102 (Xia et al., 2017), DTD (Sharan et al., 2014), and GTSRB (Haloi, 2015).
Dataset Splits | Yes | The splits of the training and test sets follow torchvision (Marcel & Rodriguez, 2010). For OR, γ of SRIP is tuned from {1e-3, 1e-4, 1e-5} and γ of SO is tuned from {1e-5, 1e-6, 1e-7} on a validation set. (Minimal sketches of the torchvision splits and of the SO/SRIP penalties appear after this table.)
Hardware Specification | Yes | Our experiments were all completed on 4 NVIDIA 3090 GPUs.
Software Dependencies | No | The paper mentions the 'solo-learn' and 'Lightly SSL' frameworks as well as 'detectron2', but does not specify their versions or other software dependencies needed for exact replication.
Experiment Setup | Yes | For OR, γ of SRIP is tuned from {1e-3, 1e-4, 1e-5} and γ of SO is tuned from {1e-5, 1e-6, 1e-7} on a validation set. When training the linear classifier, we use 100 epochs, a weight decay of 0.0005, a learning rate of 0.1 (divided by a factor of 10 at epochs 60 and 100), a batch size of 256, and SGD with Nesterov momentum as the optimizer (on IMAGENET-1k, we use a batch size of 128 and a learning rate of 0.2). (A sketch of this optimizer configuration appears below the table.)
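
The dataset-splits entry states only that the train/test partitions follow torchvision. A minimal sketch of what that implies for CIFAR-10, with a placeholder root path and transform rather than the paper's augmentation pipeline:

```python
# Minimal sketch: the standard torchvision train/test split for CIFAR-10.
# The root path and ToTensor transform are placeholders, not the paper's pipeline.
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()
train_set = datasets.CIFAR10(root="./data", train=True, download=True, transform=to_tensor)
test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=to_tensor)
print(len(train_set), len(test_set))  # 50000 / 10000 under the standard split
```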
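The excerpts above name the two orthogonality penalties (SO and SRIP) and their weights γ but do not show how they are computed. The sketch below follows the usual formulations of these penalties (soft orthogonality as ||W Wᵀ − I||_F², SRIP as a power-iteration estimate of the spectral norm of W Wᵀ − I); the function names, the choice of regularizing all conv/linear weights, and the use of γ as a simple multiplier are assumptions, not the paper's exact implementation.

```python
# Hedged sketch of the SO and SRIP orthogonality penalties mentioned in the
# tuning grids above. Names and layer selection are illustrative assumptions.
import torch


def so_penalty(weight: torch.Tensor) -> torch.Tensor:
    """Soft orthogonality: squared Frobenius norm of (W W^T - I)."""
    w = weight.flatten(1)                       # conv kernels -> (out, in*k*k)
    gram = w @ w.t()
    eye = torch.eye(gram.shape[0], device=w.device)
    return ((gram - eye) ** 2).sum()


def srip_penalty(weight: torch.Tensor, n_iters: int = 2) -> torch.Tensor:
    """SRIP: spectral norm of (W W^T - I), estimated by power iteration."""
    w = weight.flatten(1)
    gram = w @ w.t()
    residual = gram - torch.eye(gram.shape[0], device=w.device)
    v = torch.randn(residual.shape[1], 1, device=w.device)
    for _ in range(n_iters):                    # a few power-iteration steps
        v = residual @ v
        v = v / (v.norm() + 1e-12)
    return (residual @ v).norm()


def or_loss(model: torch.nn.Module, gamma: float, mode: str = "so") -> torch.Tensor:
    """Sum the chosen penalty over all conv/linear weights, scaled by gamma."""
    penalty = so_penalty if mode == "so" else srip_penalty
    total = sum(penalty(m.weight)
                for m in model.modules()
                if isinstance(m, (torch.nn.Conv2d, torch.nn.Linear)))
    return gamma * total
```

Under this reading, γ from the grids above (e.g. 1e-5 for SO) would simply scale `or_loss` before it is added to the SSL objective.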
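The experiment-setup entry specifies the linear-probe optimizer except for the momentum value. The sketch below assumes momentum 0.9 and an illustrative 2048-dimensional frozen feature, and expresses the "divide by 10 at epochs 60 and 100" schedule with MultiStepLR.

```python
# Sketch of the linear-probe optimization described above: 100 epochs, SGD with
# Nesterov momentum, lr 0.1, weight decay 5e-4, batch size 256, lr divided by 10
# at epochs 60 and 100. Momentum 0.9 and the 2048-d feature are assumptions.
import torch

linear_head = torch.nn.Linear(2048, 100)        # e.g. ResNet-50 features -> 100 classes

optimizer = torch.optim.SGD(
    linear_head.parameters(),
    lr=0.1,                                     # 0.2 with batch size 128 on IMAGENET-1k
    momentum=0.9,                               # assumed value, not stated in the excerpt
    weight_decay=5e-4,
    nesterov=True,
)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[60, 100], gamma=0.1  # divide lr by 10 at epochs 60 and 100
)

for epoch in range(100):
    # ... one pass over the frozen-feature training set with batch size 256 ...
    scheduler.step()
```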