Implicit Contrastive Representation Learning with Guided Stop-gradient
Authors: Byeongchan Lee, Sehyun Lee
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply our method to benchmark algorithms SimSiam and BYOL and show that our method stabilizes training and boosts performance. We also show that the algorithms with our method work well with small batch sizes and do not collapse even when there is no predictor. The code is available in the supplementary material. In this section, we compare SimSiam and BYOL with GSG to the original SimSiam and BYOL on various datasets and tasks. Table 1 shows that applying GSG consistently increases the performance. |
| Researcher Affiliation | Collaboration | Byeongchan Lee Gauss Labs Seoul, Korea byeongchan.lee@gausslabs.ai Sehyun Lee KAIST Daejeon, Korea sehyun.lee@kaist.ac.kr |
| Pseudocode | Yes | For a better understanding, refer to Figure 3 and Appendix A. For simplicity, we present the overview and pseudocode for SimSiam with GSG, but they are analogous to BYOL with GSG. (A hedged SimSiam training-step sketch appears after this table.) |
| Open Source Code | Yes | The code is available in the supplementary material. |
| Open Datasets | Yes | We use ImageNet [Deng et al., 2009] and CIFAR-10 [Krizhevsky et al., 2009] as benchmark datasets. For datasets, we adopt widely used benchmark datasets in transfer learning such as CIFAR-10, Aircraft [Maji et al., 2013], Caltech [Fei-Fei et al., 2004], Cars [Krause et al., 2013], DTD [Cimpoi et al., 2014], Flowers [Nilsback and Zisserman, 2008], Food [Bossard et al., 2014], Pets [Parkhi et al., 2012], SUN397 [Xiao et al., 2010], and VOC2007 [Everingham et al., 2010]. |
| Dataset Splits | Yes | We freeze the trained backbone, attach a linear classifier to the backbone, fit the classifier on the training set in a supervised manner for 90 epochs, and test the classifier on the test set. We train for 30 epochs and test on the validation set. (See the linear-evaluation sketch after this table.) |
| Hardware Specification | Yes | We implement the algorithms with PyTorch [Paszke et al., 2019] and run all the experiments on 8 NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions 'PyTorch [Paszke et al., 2019]' but does not provide a specific version number for PyTorch or any other software dependency. |
| Experiment Setup | Yes | For ImageNet, we use the ResNet-50 backbone [He et al., 2016], a three-layered MLP projector, and a two-layered MLP predictor. We use a batch size of 512 and train the network for 100 epochs. We use the SGD optimizer with momentum of 0.9, learning rate of 0.1, and weight decay rate of 0.0001. We use a cosine decay schedule [Chen et al., 2020a, Loshchilov and Hutter, 2016] for the learning rate. For CIFAR-10, we use a CIFAR variant of the ResNet-18 backbone, a two-layered MLP projector, and a two-layered MLP predictor. We use a batch size of 512 and train the network for 200 epochs. We use the SGD optimizer with momentum of 0.9, learning rate of 0.06, and weight decay rate of 0.0005. For ImageNet, we use a batch size of 4096 and the LARS optimizer [You et al., 2017]. For CIFAR-10, we use a batch size of 256 and the SGD optimizer with momentum of 0.9, learning rate of 30, and a cosine decay schedule. (A sketch of the ImageNet optimizer and schedule appears after this table.) |
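The Pseudocode row points to the paper's Appendix A for SimSiam with GSG. As context only, the sketch below is a minimal PyTorch-style SimSiam training step (Chen and He, 2021), with comments marking where a guided stop-gradient rule would intervene; the guidance rule itself is the paper's contribution and is not reproduced here. The function and module names (`simsiam_step`, `backbone`, `projector`, `predictor`) are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of one SimSiam training step; NOT the paper's GSG pseudocode.
# GSG changes where the stop-gradient is applied (see the paper's Appendix A);
# the standard symmetric placement is shown here.
import torch.nn.functional as F

def negative_cosine(p, z):
    # SimSiam's D(p, z): negative cosine similarity with the gradient blocked on z.
    return -F.cosine_similarity(p, z.detach(), dim=-1).mean()

def simsiam_step(backbone, projector, predictor, x1, x2):
    # x1, x2: two augmented views of the same image batch (assumed tensors).
    z1 = projector(backbone(x1))
    z2 = projector(backbone(x2))
    p1, p2 = predictor(z1), predictor(z2)
    # Standard symmetrized loss; GSG would instead guide which branch
    # receives the stop-gradient for each pair.
    return 0.5 * negative_cosine(p1, z2) + 0.5 * negative_cosine(p2, z1)
```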
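The Dataset Splits row describes the linear-evaluation protocol: freeze the pretrained backbone, attach a linear classifier, fit it on the training set for 90 epochs, and test on the test set. A minimal sketch under those assumptions follows; the feature dimension (2048 for ResNet-50), optimizer settings, and data loader are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

def linear_eval(backbone, train_loader, feat_dim=2048, num_classes=1000, epochs=90):
    # Freeze the pretrained backbone so only the linear head is trained.
    backbone.eval()
    for p in backbone.parameters():
        p.requires_grad = False

    classifier = nn.Linear(feat_dim, num_classes)
    optimizer = torch.optim.SGD(classifier.parameters(), lr=0.1, momentum=0.9)  # lr is illustrative
    criterion = nn.CrossEntropyLoss()

    for _ in range(epochs):  # 90 epochs per the quoted protocol
        for images, labels in train_loader:
            with torch.no_grad():
                feats = backbone(images)  # frozen features
            loss = criterion(classifier(feats), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return classifier
```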
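The ImageNet pretraining configuration quoted in the Experiment Setup row (SGD, momentum 0.9, learning rate 0.1, weight decay 0.0001, cosine decay over 100 epochs) maps onto PyTorch roughly as below; `model` is a placeholder for the full backbone-projector-predictor network, and stepping the schedule once per epoch is an assumption.

```python
import torch

def build_optimizer(model, epochs=100):
    # Hyperparameters follow the quoted ImageNet setup.
    optimizer = torch.optim.SGD(
        model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
    # Cosine decay of the learning rate, stepped once per epoch (assumption).
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    return optimizer, scheduler
```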