Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning
Authors: Beliz Gunel, Jingfei Du, Alexis Conneau, Veselin Stoyanov
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Combined with cross-entropy, our proposed SCL loss obtains significant improvements over a strong RoBERTa-Large baseline on multiple datasets of the GLUE benchmark in few-shot learning settings... and We empirically demonstrate that the new objective has desirable properties across several different settings. and In Table 2, we report our few-shot learning results on SST-2, QNLI, and MNLI from the GLUE benchmark with 20, 100, 1000 labeled training examples. |
| Researcher Affiliation | Collaboration | Beliz Gunel, Jingfei Du, Alexis Conneau, Ves Stoyanov; Stanford University, Facebook AI and Work done during Facebook AI research internship, correspondence to bgunel@stanford.edu. |
| Pseudocode | No | The paper defines mathematical equations for loss functions but does not include any explicitly labeled “Pseudocode” or “Algorithm” blocks. |
| Open Source Code | No | We use fairseq Ott et al. (2019) library and the open-source RoBERTa-Large model for all of our experiments. This refers to tools used by the authors, not the release of their own method's code. There is no explicit statement or link indicating the source code for their supervised contrastive learning objective is available. |
| Open Datasets | Yes | We use datasets from the GLUE natural language understanding benchmark (Wang et al., 2019) for evaluation. |
| Dataset Splits | Yes | In our few-shot learning experiments, we sample half of the original validation set of the GLUE benchmark and use it as our test set, and sample 500 examples for our validation set from the original GLUE validation set, both taking the label distribution of the original validation set into account. and For full dataset experiments, such as the ones shown in Table 5, Table 6, Table 8, and Table 9, we sample a validation set from the original training set of the GLUE benchmark based on the size of the original validation set of GLUE, and report our test results on the original validation set of GLUE. |
| Hardware Specification | No | The paper does not specify the exact hardware used for experiments (e.g., GPU models, CPU types, or memory). It only notes that fairseq and RoBERTa-Large were used, without detailing the underlying compute. |
| Software Dependencies | No | We use fairseq Ott et al. (2019) library and the open-source RoBERTa-Large model for all of our experiments. This mentions software by name but lacks specific version numbers for fairseq or other dependencies like Python, PyTorch/TensorFlow, or CUDA. |
| Experiment Setup | Yes | During all the fine-tuning runs, we use Adam optimizer with a learning rate of 1e-5, batch size of 16 (unless specified otherwise), and dropout rate of 0.1. For each experiment that includes the SCL term, we conduct a grid-based hyperparameter sweep for λ ∈ {0.1, 0.3, 0.5, 0.7, 0.9, 1.0} and τ ∈ {0.1, 0.3, 0.5, 0.7}. (Hedged sketches of the combined objective and this setup follow the table.) |
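
Since the paper presents the loss only as equations and releases no code, the following is a minimal PyTorch sketch of how the combined objective quoted above could look: a cross-entropy term mixed with a supervised contrastive term over the batch's sentence embeddings, weighted by λ and scaled by temperature τ. The function names, the use of ℓ2-normalized [CLS]-style embeddings, and the default λ and τ values are assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def supervised_contrastive_loss(embeddings: torch.Tensor,
                                labels: torch.Tensor,
                                tau: float = 0.3) -> torch.Tensor:
    """SCL term (sketch): for each anchor, average the temperature-scaled
    log-probability of its same-label batch members over all other examples,
    then sum over anchors."""
    z = F.normalize(embeddings, dim=-1)            # l2-normalized sentence embeddings
    sim = torch.matmul(z, z.T) / tau               # pairwise similarities / temperature
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, -1e9)         # exclude k == i from the softmax
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    positive_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    num_positives = positive_mask.sum(dim=1).clamp(min=1)
    loss_per_anchor = -(log_prob * positive_mask.float()).sum(dim=1) / num_positives
    return loss_per_anchor.sum()                   # sum over anchors, as in the paper's equation


def combined_loss(logits: torch.Tensor,
                  embeddings: torch.Tensor,
                  labels: torch.Tensor,
                  lam: float = 0.9,
                  tau: float = 0.3) -> torch.Tensor:
    """Overall objective (sketch): (1 - lambda) * cross-entropy + lambda * SCL."""
    ce = F.cross_entropy(logits, labels)
    scl = supervised_contrastive_loss(embeddings, labels, tau)
    return (1.0 - lam) * ce + lam * scl
```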
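
A single fine-tuning step wired up with the quoted settings (Adam, learning rate 1e-5, batch size 16, dropout 0.1) might then look as below, reusing `combined_loss` from the sketch above. The toy encoder, hidden size, and random batch are placeholders standing in for RoBERTa-Large and fairseq, not the authors' setup.

```python
import torch
import torch.nn as nn

# Toy stand-in for RoBERTa-Large: an encoder producing a [CLS]-like embedding
# plus a binary classification head. Dimensions and data are illustrative only.
encoder = nn.Sequential(nn.Linear(768, 768), nn.Tanh(), nn.Dropout(p=0.1))
classifier = nn.Linear(768, 2)
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(classifier.parameters()), lr=1e-5)

features = torch.randn(16, 768)           # batch size 16, as quoted above
labels = torch.randint(0, 2, (16,))

embeddings = encoder(features)            # [CLS]-style sentence embeddings
logits = classifier(embeddings)
loss = combined_loss(logits, embeddings, labels, lam=0.9, tau=0.3)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In experiments, λ and τ would be chosen by the grid sweep quoted in the setup row (λ ∈ {0.1, 0.3, 0.5, 0.7, 0.9, 1.0}, τ ∈ {0.1, 0.3, 0.5, 0.7}) on the validation split described above.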