On the Effectiveness of Supervision in Asymmetric Non-Contrastive Learning
Authors: Jeongheon Oh, Kibok Lee
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our analysis reveals that providing supervision to ANCL reduces intraclass variance, and the contribution of supervision should be adjusted to achieve the best performance. Experiments demonstrate the superiority of supervised ANCL across various datasets and tasks. |
| Researcher Affiliation | Academia | Jeongheon Oh, Kibok Lee; Department of Statistics and Data Science, Yonsei University. |
| Pseudocode | No | The paper describes mathematical formulations and processes but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at: https://github.com/JH-Oh-23/Sup-ANCL. |
| Open Datasets | Yes | We pretrain models on ImageNet-100 (Deng et al., 2009; Tian et al., 2020) for 200 epochs with a batch size of 128. For data augmentation, we apply random crop, random horizontal flip, color jitter, random grayscale, and Gaussian blur, following Chen et al. (2020a). For transfer learning, we evaluate the top-1 accuracy across 11 downstream datasets: CIFAR10/CIFAR100 (Krizhevsky & Hinton, 2009), DTD (Cimpoi et al., 2014), Food (Bossard et al., 2014), MIT67 (Quattoni & Torralba, 2009), SUN397 (Xiao et al., 2010), Caltech101 (Fei-Fei et al., 2004), CUB200 (Welinder et al., 2010), Dogs (Khosla et al., 2011; Deng et al., 2009), Flowers (Nilsback & Zisserman, 2008), and Pets (Parkhi et al., 2012). See the augmentation sketch after the table. |
| Dataset Splits | Yes | For evaluation, we follow the linear probing protocol for transfer learning in prior works (Kornblith et al., 2019; Lee et al., 2021a). Specifically, we divide the entire training dataset into a train set and a validation set to tune the regularization parameter by minimizing the L2-regularized cross-entropy loss using L-BFGS (Liu & Nocedal, 1989). Train and validation set splits are shown in Table D.1. See the linear-probing sketch after the table. |
| Hardware Specification | No | The paper mentions '8 V100 GPUs' in the context of other methods' training costs in the Impact Statement, but it does not specify the hardware used for their own experiments. |
| Software Dependencies | No | The paper mentions optimizers like 'SGD' and 'L-BFGS', and implies the use of deep learning frameworks, but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We pretrain models on ImageNet-100 (Deng et al., 2009; Tian et al., 2020) for 200 epochs with a batch size of 128. We utilize the SGD optimizer with a momentum of 0.9 and a weight decay of 1e-4. A cosine learning rate schedule (Loshchilov & Hutter, 2017) is applied to the encoder and projector. We maintain a constant learning rate without decay for the predictor, following prior work (Chen & He, 2021). Other method-specific details are provided below: SimCLR... The learning rate is set to 0.1 and the temperature parameter for the contrastive loss is 0.1. See the optimizer sketch after the table. |
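
The pretraining augmentations quoted under Open Datasets follow Chen et al. (2020a) (SimCLR). The snippet below is a minimal torchvision sketch of such a pipeline; the crop scale, jitter strengths, blur kernel, application probabilities, and input resolution are assumptions based on common SimCLR-style implementations, not values confirmed by the paper.

```python
# Sketch of a SimCLR-style augmentation pipeline (parameter values are assumed).
from torchvision import transforms

IMG_SIZE = 224  # assumed input resolution for ImageNet-100 pretraining

pretrain_augmentation = transforms.Compose([
    transforms.RandomResizedCrop(IMG_SIZE, scale=(0.2, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomApply([transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),
    transforms.RandomGrayscale(p=0.2),
    transforms.RandomApply(
        [transforms.GaussianBlur(kernel_size=23, sigma=(0.1, 2.0))], p=0.5),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```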
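The linear-probing protocol quoted under Dataset Splits fits an L2-regularized classifier on frozen features with L-BFGS and tunes the regularization strength on a held-out validation split. Below is a minimal sketch using scikit-learn's LogisticRegression; the candidate grid for C and the function name `linear_probe` are illustrative assumptions, not the authors' code.

```python
# Sketch of linear probing with L-BFGS and validation-based tuning of the
# L2 regularization strength (the grid of C values is assumed for illustration).
import numpy as np
from sklearn.linear_model import LogisticRegression

def linear_probe(train_feats, train_labels, val_feats, val_labels,
                 c_grid=np.logspace(-6, 5, 45)):
    best_c, best_acc = None, -1.0
    for c in c_grid:
        clf = LogisticRegression(C=c, solver="lbfgs", max_iter=1000)
        clf.fit(train_feats, train_labels)      # L2-regularized cross-entropy via L-BFGS
        acc = clf.score(val_feats, val_labels)  # top-1 accuracy on the validation split
        if acc > best_acc:
            best_c, best_acc = c, acc
    return best_c, best_acc
```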
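The optimizer configuration quoted under Experiment Setup (SGD with momentum 0.9 and weight decay 1e-4, a cosine schedule for the encoder and projector, and a constant learning rate for the predictor) can be realized with per-parameter-group learning rates. The sketch below follows the common SimSiam-style pattern; the module names `encoder`/`projector`/`predictor` and the `build_optimizer`/`adjust_learning_rate` helpers are placeholders, not the authors' exact implementation.

```python
# Sketch: SGD with two parameter groups; only the encoder/projector group
# follows the cosine schedule, while the predictor keeps a fixed learning rate.
import math
import torch

def build_optimizer(encoder, projector, predictor, base_lr=0.1):
    param_groups = [
        {"params": list(encoder.parameters()) + list(projector.parameters()),
         "fix_lr": False},
        {"params": list(predictor.parameters()), "fix_lr": True},
    ]
    return torch.optim.SGD(param_groups, lr=base_lr,
                           momentum=0.9, weight_decay=1e-4)

def adjust_learning_rate(optimizer, base_lr, epoch, total_epochs=200):
    # Cosine decay (Loshchilov & Hutter, 2017) for the non-fixed groups.
    cosine_lr = base_lr * 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))
    for group in optimizer.param_groups:
        group["lr"] = base_lr if group["fix_lr"] else cosine_lr
```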