The Close Relationship Between Contrastive Learning and Meta-Learning

Authors: Renkun Ni, Manli Shu, Hossein Souri, Micah Goldblum, Tom Goldstein

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct self-supervised training on both the CIFAR-10 and ImageNet datasets (Krizhevsky et al., 2009; Deng et al., 2009). Following Chen et al. (2020a), we evaluate pre-trained representations in a linear evaluation setting, where feature extractors are frozen, and a classification head is stacked on top and tuned. In addition, we test the performance of the pre-trained feature extractors on downstream tasks such as transfer learning and semi-supervised learning with 1% and 10% of labeled data. Evaluation and dataset details can be found in Appendix A.2. (An illustrative sketch of the linear-evaluation protocol follows the table.)
Researcher Affiliation | Academia | Renkun Ni, University of Maryland, rn9zm@cs.umd.edu; Manli Shu, University of Maryland, manlis@cs.umd.edu; Hossein Souri, Johns Hopkins University, hsouri1@jhu.edu; Micah Goldblum, University of Maryland, goldblum@umd.edu; Tom Goldstein, University of Maryland, tomg@cs.umd.edu
Pseudocode | Yes | Algorithm 1: Meta-Learning Framework for Self-Supervised Learning (an illustrative episodic-loss sketch follows the table)
Open Source Code | Yes | Our PyTorch implementation can be found on: https://github.com/RenkunNi/MetaContrastive
Open Datasets | Yes | We conduct self-supervised training on both the CIFAR-10 and ImageNet datasets (Krizhevsky et al., 2009; Deng et al., 2009).
Dataset Splits | Yes | For each dataset, we use the backbone (ResNet-50) pre-trained on ImageNet as an initialization for the feature extractor of the downstream classification model. In contrast to linear evaluation, we fine-tune the entire model on the given dataset for 20,000 iterations with the best hyperparameter setting selected on its validation split. Details of our hyperparameter selection are included in Appendix A.2. All models are pre-trained on ImageNet for 100 epochs.
Hardware Specification | No | We train the model on CIFAR-10 with the LARS optimizer (You et al., 2019) and batch size 1024 for 1000 epochs (with 4 GPUs). On ImageNet, we use the same optimizer and batch size of 256, and we train for 100 epochs (with 8 GPUs). This states GPU counts but does not specify particular hardware models.
Software Dependencies | No | Our PyTorch implementation can be found on: https://github.com/RenkunNi/MetaContrastive. This only mentions the framework, not specific versions.
Experiment Setup | Yes | We use a ResNet-18 backbone for all experiments on CIFAR-10 and ResNet-50 for those on ImageNet. We train the model on CIFAR-10 with the LARS optimizer (You et al., 2019) and batch size 1024 for 1000 epochs (with 4 GPUs). On ImageNet, we use the same optimizer and batch size of 256, and we train for 100 epochs (with 8 GPUs). For ImageNet pre-training, we follow the hyperparameter setting in Chen et al. (2020a), including baseline data augmentation methods, dimension of the latent space, and learning rate decay schedule. For CIFAR-10 pre-training, we use the same CIFAR-10 specific hyperparameters as SimCLR again. For BYOL, we use the same learning rate schedule as meta-learners and start with learning rate 4. In addition, both the projector and predictor in BYOL are two-layer MLPs with hidden dimension 2048 and output dimension 256. More details can be found in Appendix A.1. (An illustrative sketch of the projector/predictor MLPs follows the table.)
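
The linear-evaluation protocol quoted in the Research Type row (frozen feature extractor, trainable classification head) can be summarized by the following minimal PyTorch sketch. The ResNet-18 backbone and CIFAR-10 class count come from the quotes above, while the SGD hyperparameters and the use of a plain torchvision model are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch of linear evaluation: freeze the pre-trained backbone,
# train only a linear classification head on labeled data.
import torch
import torch.nn as nn
import torchvision

def build_linear_eval_model(num_classes: int = 10) -> nn.Module:
    # Assumption: a torchvision ResNet-18 stands in for the CIFAR-10 backbone.
    backbone = torchvision.models.resnet18()
    feat_dim = backbone.fc.in_features          # 512 for ResNet-18
    backbone.fc = nn.Identity()                 # expose pooled features instead of logits
    for p in backbone.parameters():             # freeze the feature extractor
        p.requires_grad = False
    head = nn.Linear(feat_dim, num_classes)     # only this head is tuned
    return nn.Sequential(backbone, head)

model = build_linear_eval_model()
# Only the head's parameters are optimized; optimizer settings are illustrative.
optimizer = torch.optim.SGD(model[1].parameters(), lr=0.1, momentum=0.9)
```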
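The Pseudocode row refers to the paper's Algorithm 1 (a meta-learning framework for self-supervised learning), which is not reproduced here. The snippet below is only an illustrative episodic loss in the spirit of metric-based meta-learning, where two augmented views of a batch play the roles of support and query sets; the function name, temperature value, and exact loss form are assumptions, not the paper's listing.

```python
# Illustrative episodic ("meta-learning style") loss for self-supervised learning.
import torch
import torch.nn.functional as F

def episode_loss(encoder, support_views, query_views, temperature=0.5):
    # Embed both augmented views of the same batch of images.
    z_s = F.normalize(encoder(support_views), dim=1)   # (N, d) "support" embeddings
    z_q = F.normalize(encoder(query_views), dim=1)     # (N, d) "query" embeddings
    # Each query is classified against all support embeddings; the support
    # coming from the same underlying image is the correct "class".
    logits = z_q @ z_s.t() / temperature               # (N, N) cosine-similarity logits
    labels = torch.arange(z_q.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)
```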
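The Experiment Setup row states that the BYOL projector and predictor are two-layer MLPs with hidden dimension 2048 and output dimension 256. A minimal sketch of such modules is shown below; the 2048-dimensional input (ResNet-50 features) and the BatchNorm/ReLU placement follow the common BYOL design and are assumptions rather than details quoted from the paper.

```python
# Sketch of the two-layer projector/predictor MLPs (hidden 2048, output 256).
import torch.nn as nn

def byol_mlp(in_dim: int = 2048, hidden_dim: int = 2048, out_dim: int = 256) -> nn.Module:
    # Two-layer MLP: Linear -> BatchNorm -> ReLU -> Linear.
    return nn.Sequential(
        nn.Linear(in_dim, hidden_dim),
        nn.BatchNorm1d(hidden_dim),
        nn.ReLU(inplace=True),
        nn.Linear(hidden_dim, out_dim),
    )

projector = byol_mlp()             # maps backbone features (assumed 2048-d) to 256-d projections
predictor = byol_mlp(in_dim=256)   # maps online projections to predicted target projections
```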