Implicit Contrastive Representation Learning with Guided Stop-gradient

Authors: Byeongchan Lee, Sehyun Lee

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply our method to benchmark algorithms SimSiam and BYOL and show that our method stabilizes training and boosts performance. We also show that the algorithms with our method work well with small batch sizes and do not collapse even when there is no predictor. The code is available in the supplementary material. In this section, we compare SimSiam and BYOL with GSG to the original SimSiam and BYOL on various datasets and tasks. Table 1 shows that applying GSG consistently increases the performance.
Researcher Affiliation | Collaboration | Byeongchan Lee, Gauss Labs, Seoul, Korea (byeongchan.lee@gausslabs.ai); Sehyun Lee, KAIST, Daejeon, Korea (sehyun.lee@kaist.ac.kr)
Pseudocode | Yes | For a better understanding, refer to Figure 3 and Appendix A. For simplicity, we present the overview and pseudocode for SimSiam with GSG, but they are analogous to BYOL with GSG. (A minimal sketch of the SimSiam structure that GSG modifies appears after this table.)
Open Source Code | Yes | The code is available in the supplementary material.
Open Datasets | Yes | We use ImageNet [Deng et al., 2009] and CIFAR-10 [Krizhevsky et al., 2009] as benchmark datasets. For datasets, we adopt widely used benchmark datasets in transfer learning such as CIFAR-10, Aircraft [Maji et al., 2013], Caltech [Fei-Fei et al., 2004], Cars [Krause et al., 2013], DTD [Cimpoi et al., 2014], Flowers [Nilsback and Zisserman, 2008], Food [Bossard et al., 2014], Pets [Parkhi et al., 2012], SUN397 [Xiao et al., 2010], and VOC2007 [Everingham et al., 2010].
Dataset Splits | Yes | We freeze the trained backbone, attach a linear classifier to the backbone, fit the classifier on the training set in a supervised manner for 90 epochs, and test the classifier on the test set. We train for 30 epochs and test on the validation set. (A linear-evaluation sketch appears after this table.)
Hardware Specification | Yes | We implement the algorithms with PyTorch [Paszke et al., 2019] and run all the experiments on 8 NVIDIA A100 GPUs.
Software Dependencies | No | The paper mentions 'PyTorch [Paszke et al., 2019]' but does not provide a specific version number for PyTorch or any other software dependency.
Experiment Setup | Yes | For ImageNet, we use the ResNet-50 backbone [He et al., 2016], a three-layered MLP projector, and a two-layered MLP predictor. We use a batch size of 512 and train the network for 100 epochs. We use the SGD optimizer with momentum of 0.9, learning rate of 0.1, and weight decay rate of 0.0001. We use a cosine decay schedule [Chen et al., 2020a, Loshchilov and Hutter, 2016] for the learning rate. For CIFAR-10, we use a CIFAR variant of the ResNet-18 backbone, a two-layered MLP projector, and a two-layered MLP predictor. We use a batch size of 512 and train the network for 200 epochs. We use the SGD optimizer with momentum of 0.9, learning rate of 0.06, and weight decay rate of 0.0005. For ImageNet, we use a batch size of 4096 and the LARS optimizer [You et al., 2017]. For CIFAR-10, we use a batch size of 256 and the SGD optimizer with momentum of 0.9, learning rate of 30, and a cosine decay schedule. (An optimizer and schedule sketch based on these reported values appears after this table.)
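The Pseudocode row points to the paper's Figure 3 and Appendix A for SimSiam with GSG. As a hedged illustration only, the sketch below shows the standard SimSiam objective that GSG builds on: a symmetric negative cosine similarity with stop-gradient on the target branch. The module names (`backbone`, `projector`, `predictor`) are placeholders, and the GSG-specific rule that guides where the stop-gradient is applied follows the paper and is not reproduced here.

```python
# Minimal sketch of the SimSiam-style objective that GSG modifies.
# Hypothetical module names; the guided (asymmetric) placement of the
# stop-gradient described in the paper is NOT implemented here.
import torch
import torch.nn.functional as F

def negative_cosine(p, z):
    # D(p, z) in SimSiam: the target branch z receives the stop-gradient.
    return -F.cosine_similarity(p, z.detach(), dim=-1).mean()

def simsiam_loss(backbone, projector, predictor, x1, x2):
    # Two augmented views of the same image pass through a shared encoder.
    z1, z2 = projector(backbone(x1)), projector(backbone(x2))
    p1, p2 = predictor(z1), predictor(z2)
    # Symmetric loss; GSG replaces this symmetric stop-gradient placement
    # with a guided one (see the paper's Figure 3 and Appendix A).
    return 0.5 * (negative_cosine(p1, z2) + negative_cosine(p2, z1))
```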
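The Dataset Splits row describes a standard linear-evaluation protocol: freeze the pretrained backbone, attach a linear classifier, and fit it on the training set with supervised cross-entropy. The sketch below illustrates that setup; the feature dimension, class count, and optimizer settings are illustrative assumptions, not values quoted from the paper (only the 90-epoch training budget is).

```python
# Sketch of the linear-evaluation protocol from the Dataset Splits row.
# Feature dimension, class count, and optimizer settings are assumptions.
import torch
import torch.nn as nn

def build_linear_eval(backbone, feat_dim=2048, num_classes=1000):
    for p in backbone.parameters():
        p.requires_grad = False          # backbone stays frozen
    backbone.eval()
    classifier = nn.Linear(feat_dim, num_classes)
    optimizer = torch.optim.SGD(classifier.parameters(), lr=0.1, momentum=0.9)
    return classifier, optimizer

def linear_eval_step(backbone, classifier, optimizer, images, labels):
    with torch.no_grad():                # features come from the frozen backbone
        features = backbone(images)
    loss = nn.functional.cross_entropy(classifier(features), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```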
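The Experiment Setup row quotes the ImageNet pretraining hyperparameters (SGD with momentum 0.9, learning rate 0.1, weight decay 0.0001, cosine decay, 100 epochs). A possible PyTorch configuration matching those reported values is sketched below; `model` is a placeholder for the full network, and stepping the scheduler once per epoch is an assumption.

```python
# Sketch of the ImageNet pretraining configuration quoted in the
# Experiment Setup row (SGD, momentum 0.9, lr 0.1, weight decay 1e-4,
# cosine decay over 100 epochs). Per-epoch scheduler stepping is assumed.
import torch

def build_optimizer(model, epochs=100, lr=0.1):
    optimizer = torch.optim.SGD(
        model.parameters(), lr=lr, momentum=0.9, weight_decay=1e-4
    )
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    return optimizer, scheduler
```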