SAS: Self-Augmentation Strategy for Language Model Pre-training
Authors: Yifei Xu, Jingqiao Zhang, Ru He, Liangzhu Ge, Chao Yang, Cheng Yang, Ying Nian Wu
AAAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that SAS outperforms ELECTRA and other state-of-the-art models in the GLUE tasks with similar or less computation cost. |
| Researcher Affiliation | Collaboration | Yifei Xu (1*), Jingqiao Zhang (2*), Ru He (2*), Liangzhu Ge (2*), Chao Yang (2), Cheng Yang (2), Ying Nian Wu (1). Affiliations: 1 University of California, Los Angeles; 2 Alibaba Group. |
| Pseudocode | No | The paper describes the SAS framework and its workflow but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and pretrained model are available publicly on GitHub: https://github.com/alibaba/self-augmentation-strategy. |
| Open Datasets | Yes | We use the same pretraining data as BERT, ELECTRA-Small and ELECTRA-Base, which consists of 3.3 billion tokens from the Wikipedia and Books Corpus datasets. (An illustrative data-loading sketch follows the table.) |
| Dataset Splits | Yes | All GLUE scores are based on the Dev dataset. |
| Hardware Specification | Yes | With 1 V100 GPU, the pre-training of SAS_DA-Small takes 37.5h; both SAS-Small and SAS_c-Small take about 24h; and ELECTRA-Small takes about 35h. The pre-training takes 7.7 days on 8 V100 GPUs. |
| Software Dependencies | Yes | Our implementation is based on the Huggingface Transformers 4.3 framework (Wolf et al. 2020). |
| Experiment Setup | Yes | For the ELECTRA-Small model, as well as all other small models, we use batch size 512 and 0.25M pre-training steps, instead of batch size 128 and 1M steps as in Clark et al. (2020b), and double the learning rate accordingly. (An illustrative hyperparameter sketch follows the table.) |
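
The pretraining corpora and GLUE Dev split cited in the table are all publicly distributed. As a minimal, hedged sketch (not the authors' data pipeline), they could be pulled with the Huggingface `datasets` library; the Wikipedia snapshot name and the choice of SST-2 as the GLUE example are assumptions, not details quoted above.

```python
# Minimal sketch, not the paper's data pipeline: fetch the corpora named in
# the table with the Huggingface `datasets` library.
from datasets import load_dataset

# English Wikipedia; the "20200501.en" snapshot is an assumption, the paper
# does not name a specific dump in the quoted text.
wiki = load_dataset("wikipedia", "20200501.en", split="train")

# BooksCorpus as distributed on the Huggingface hub.
books = load_dataset("bookcorpus", split="train")

# GLUE scores in the table are reported on the Dev (validation) split;
# SST-2 is shown here only as an example task.
sst2_dev = load_dataset("glue", "sst2", split="validation")

print(len(wiki), len(books), len(sst2_dev))
```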
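
For the small-model setup quoted in the Experiment Setup row (batch size 512, 0.25M steps, doubled learning rate) on the Huggingface Transformers 4.3 framework, the hyperparameters could be expressed roughly as below. This is a sketch under stated assumptions, not the authors' training script: the learning-rate value, warmup, weight decay, and gradient-accumulation split are assumptions, and the actual entry point lives in the GitHub repository linked above.

```python
# Illustrative sketch only: mapping the reported small-model hyperparameters
# onto transformers.TrainingArguments. Values marked "assumed" are not
# stated in the table.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="sas-small-pretrain",   # hypothetical output path
    per_device_train_batch_size=64,    # 64 x 8 accumulation steps = 512 effective batch (1 V100)
    gradient_accumulation_steps=8,
    max_steps=250_000,                 # 0.25M pre-training steps
    learning_rate=1e-3,                # assumed: 2x the ELECTRA-Small default of 5e-4
    warmup_steps=10_000,               # assumed
    weight_decay=0.01,                 # assumed
    logging_steps=1_000,
    save_steps=25_000,
)
```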