On the Transferability of Pre-trained Language Models: A Study from Artificial Datasets
Authors: Cheng-Han Chiang, Hung-yi Lee
AAAI 2022, pp. 10518-10525
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | By fine-tuning the pre-trained models on the GLUE benchmark, we can learn how beneficial it is to transfer the knowledge from the model trained on the dataset possessing that specific trait. Our experiments show that the explicit dependencies in the sequences of the pre-training data are critical to the downstream performance. Our results also reveal that models achieve better downstream performance when pre-trained on a dataset with a longer range of implicit dependencies. |
| Researcher Affiliation | Academia | Cheng-Han Chiang, Hung-yi Lee; National Taiwan University, Taiwan; dcml0714@gmail.com, hungyilee@ntu.edu.tw |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement or link regarding the availability of its source code. |
| Open Datasets | Yes | We adopt the GLUE (Wang et al. 2019; Socher et al. 2013; Dolan and Brockett 2005; Cer et al. 2017; Williams, Nangia, and Bowman 2018; Rajpurkar et al. 2016) benchmarks to evaluate the models pre-trained on different L1s. We pre-train a RoBERTa-medium using a subset of English Wikipedia. We pre-train a RoBERTa-medium using Kannada from the OSCAR dataset (Suárez, Romary, and Sagot 2020). The sentences used for computing the distribution of j here are from SQuAD (Rajpurkar et al. 2016). We also use the Quora Question Pairs (QQP) dataset (Iyer, Dandekar, and Csernai 2017). A hedged dataset-loading sketch follows the table. |
| Dataset Splits | No | The paper mentions using "the evaluation set" and "original GLUE training set" but does not specify explicit percentages or sample counts for training, validation, or test splits. While GLUE has standard splits, the paper does not explicitly state them for reproduction. |
| Hardware Specification | Yes | The whole process, from stage 1 to stage 3, takes three days on a single V100 GPU. |
| Software Dependencies | No | The paper mentions using RoBERTa (Liu et al. 2019) and Byte Pair Encoding (BPE) but does not specify version numbers for any software libraries, programming languages, or other dependencies. |
| Experiment Setup | Yes | We use a specific set of hyperparameters and three different random seeds to fine-tune the model for each task. We report the average and standard deviation over different seeds of the results on the evaluation set. A minimal seed-averaging sketch follows the table. |
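
Since the paper releases no code, the following is only a hedged illustration of the setup described in the Open Datasets row: instantiating a small RoBERTa-style model and loading a public GLUE task with the Hugging Face `transformers` and `datasets` libraries. The "medium" layer and hidden sizes below are assumptions, not values stated in the excerpt above.

```python
# Minimal sketch, assuming a Hugging Face-based pipeline (not the authors' code).
from datasets import load_dataset
from transformers import RobertaConfig, RobertaForMaskedLM

# Assumed "medium" configuration; the paper excerpt does not specify these sizes.
config = RobertaConfig(
    vocab_size=30_522,            # assumption: BPE vocabulary size
    hidden_size=512,              # assumption
    num_hidden_layers=8,          # assumption
    num_attention_heads=8,        # assumption
    intermediate_size=2048,       # assumption
    max_position_embeddings=514,  # RoBERTa-style offset for padding index
)
model = RobertaForMaskedLM(config)  # model to be pre-trained from scratch

# GLUE tasks used for downstream evaluation are public; e.g. Quora Question Pairs.
qqp = load_dataset("glue", "qqp")
print(model.num_parameters(), qqp)
```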
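
The Experiment Setup row reports the average and standard deviation over three fine-tuning seeds. Below is a self-contained sketch of that aggregation step only; the scores passed in are made-up placeholders, not results from the paper.

```python
# Minimal sketch of averaging per-seed evaluation scores, as described in the table.
import statistics

def summarize_over_seeds(task_name: str, scores: list[float]) -> str:
    """Aggregate per-seed evaluation scores into a mean ± standard deviation string."""
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)
    return f"{task_name}: {mean:.2f} ± {std:.2f} over {len(scores)} seeds"

# Illustrative placeholder numbers; real values come from the fine-tuning runs.
print(summarize_over_seeds("QQP", [88.1, 87.6, 88.4]))
```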