Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning
Authors: Beliz Gunel, Jingfei Du, Alexis Conneau, Veselin Stoyanov
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Combined with cross-entropy, our proposed SCL loss obtains significant improvements over a strong RoBERTa-Large baseline on multiple datasets of the GLUE benchmark in few-shot learning settings... and We empirically demonstrate that the new objective has desirable properties across several different settings. and In Table 2, we report our few-shot learning results on SST-2, QNLI, and MNLI from the GLUE benchmark with 20, 100, 1000 labeled training examples. |
| Researcher Affiliation | Collaboration | Beliz Gunel, Jingfei Du, Alexis Conneau, Ves Stoyanov; Stanford University, Facebook AI and Work done during Facebook AI research internship, correspondence to bgunel@stanford.edu. |
| Pseudocode | No | The paper defines mathematical equations for loss functions but does not include any explicitly labeled “Pseudocode” or “Algorithm” blocks. |
| Open Source Code | No | We use fairseq Ott et al. (2019) library and the open-source RoBERTa-Large model for all of our experiments. This refers to tools used by the authors, not the release of their own method's code. There is no explicit statement or link indicating the source code for their supervised contrastive learning objective is available. |
| Open Datasets | Yes | We use datasets from the GLUE natural language understanding benchmark (Wang et al., 2019) for evaluation. |
| Dataset Splits | Yes | In our few-shot learning experiments, we sample half of the original validation set of the GLUE benchmark and use it as our test set, and sample 500 examples for our validation set from the original GLUE validation set, both taking the label distribution of the original validation set into account. and For full dataset experiments, such as the ones shown in Table 5, Table 6, Table 8, and Table 9, we sample a validation set from the original training set of the GLUE benchmark based on the size of the original validation set of GLUE, and report our test results on the original validation set of GLUE. |
| Hardware Specification | No | The paper does not specify the exact hardware used for experiments (e.g., GPU models, CPU types, or memory). It only notes that fairseq and RoBERTa-Large were used, without detailing the underlying compute. |
| Software Dependencies | No | We use fairseq Ott et al. (2019) library and the open-source RoBERTa-Large model for all of our experiments. This mentions software by name but lacks specific version numbers for fairseq or other dependencies like Python, PyTorch/TensorFlow, or CUDA. |
| Experiment Setup | Yes | During all the fine-tuning runs, we use Adam optimizer with a learning rate of 1e-5, batch size of 16 (unless specified otherwise), and dropout rate of 0.1. For each experiment that includes the SCL term, we conduct a grid-based hyperparameter sweep for λ ∈ {0.1, 0.3, 0.5, 0.7, 0.9, 1.0} and τ ∈ {0.1, 0.3, 0.5, 0.7}. (Hedged sketches of the combined objective and this setup follow the table.) |
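
Since the paper presents the loss only as equations and releases no code, the following is a minimal PyTorch sketch of how the combined objective quoted above could look: a cross-entropy term mixed with a supervised contrastive term over the batch's sentence embeddings, weighted by λ and scaled by temperature τ. The function names, the use of ℓ2-normalized [CLS]-style embeddings, and the default λ and τ values are assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def supervised_contrastive_loss(embeddings: torch.Tensor,
                                labels: torch.Tensor,
                                tau: float = 0.3) -> torch.Tensor:
    """SCL term (sketch): for each anchor, average the temperature-scaled
    log-probability of its same-label batch members over all other examples,
    then sum over anchors."""
    z = F.normalize(embeddings, dim=-1)            # l2-normalized sentence embeddings
    sim = torch.matmul(z, z.T) / tau               # pairwise similarities / temperature
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, -1e9)         # exclude k == i from the softmax
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    positive_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    num_positives = positive_mask.sum(dim=1).clamp(min=1)
    loss_per_anchor = -(log_prob * positive_mask.float()).sum(dim=1) / num_positives
    return loss_per_anchor.sum()                   # sum over anchors, as in the paper's equation


def combined_loss(logits: torch.Tensor,
                  embeddings: torch.Tensor,
                  labels: torch.Tensor,
                  lam: float = 0.9,
                  tau: float = 0.3) -> torch.Tensor:
    """Overall objective (sketch): (1 - lambda) * cross-entropy + lambda * SCL."""
    ce = F.cross_entropy(logits, labels)
    scl = supervised_contrastive_loss(embeddings, labels, tau)
    return (1.0 - lam) * ce + lam * scl
```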
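
A single fine-tuning step wired up with the quoted settings (Adam, learning rate 1e-5, batch size 16, dropout 0.1) might then look as below, reusing `combined_loss` from the sketch above. The toy encoder, hidden size, and random batch are placeholders standing in for RoBERTa-Large and fairseq, not the authors' setup.

```python
import torch
import torch.nn as nn

# Toy stand-in for RoBERTa-Large: an encoder producing a [CLS]-like embedding
# plus a binary classification head. Dimensions and data are illustrative only.
encoder = nn.Sequential(nn.Linear(768, 768), nn.Tanh(), nn.Dropout(p=0.1))
classifier = nn.Linear(768, 2)
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(classifier.parameters()), lr=1e-5)

features = torch.randn(16, 768)           # batch size 16, as quoted above
labels = torch.randint(0, 2, (16,))

embeddings = encoder(features)            # [CLS]-style sentence embeddings
logits = classifier(embeddings)
loss = combined_loss(logits, embeddings, labels, lam=0.9, tau=0.3)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In experiments, λ and τ would be chosen by the grid sweep quoted in the setup row (λ ∈ {0.1, 0.3, 0.5, 0.7, 0.9, 1.0}, τ ∈ {0.1, 0.3, 0.5, 0.7}) on the validation split described above.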