Understand and Modularize Generator Optimization in ELECTRA-style Pretraining

Authors: Chengyu Dong, Liyuan Liu, Hao Cheng, Jingbo Shang, Jianfeng Gao, Xiaodong Liu

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments with the standard BERT-base and BERT-large (Devlin et al., 2019) pre-training setting on the GLUE (Wang et al., 2018) benchmark, and our simple technique consistently outperforms the original ELECTRA design and alternative pre-training specifications that are more recently proposed. (Section 1, Introduction; Section 6, Experiments)
Researcher Affiliation | Collaboration | Chengyu Dong (1,2), Liyuan Liu (3), Hao Cheng (3), Jingbo Shang (1), Jianfeng Gao (3), Xiaodong Liu (3). (1) University of California, San Diego; (2) work was done during an internship at Microsoft; (3) Microsoft Research.
Pseudocode | No | The paper describes methods and processes but does not include structured pseudocode or algorithm blocks (e.g., Algorithm 1, Pseudocode).
Open Source Code | Yes | Our source code is publicly available (footnote 2: https://github.com/namisan/Decoupled Optim). (Section 1, Introduction)
Open Datasets | Yes | Specifically, we employ Wikipedia and Book Corpus (Zhu et al., 2015) (16 GB of texts, 256M samples) for pretraining with sequence length as 512. (Section 6.1, Pretraining Setup)
Dataset Splits | Yes | We conduct the evaluation on downstream tasks following the setup in previous works (Meng et al., 2021; Bajaj et al., 2022). Specifically, we evaluate on GLUE (Wang et al., 2018) language understanding benchmark with a single-task, single-model finetuning setting following previous works. (Section 6.1, Downstream evaluation setup; see the GLUE split sketch after this table)
Hardware Specification | Yes | We conduct pretraining on NVIDIA Tesla V100 with 32GB memory and fine-tuning on NVIDIA Tesla P100 with 16GB memory. (Appendix C, A Roadmap to Hyperparameter Tuning)
Software Dependencies | No | The paper mentions FAIRSEQ but does not specify its version or other software dependencies with version numbers.
Experiment Setup | Yes | Table 2. Hyperparameter settings used in pretraining: Max Steps 125K, Optimizer Adam, Peak Learning Rate (Generator) 2 × 10^-4, Peak Learning Rate (Discriminator) 1.5 × 10^-3, Batch Size 2048, Warm-Up Steps 10K, Sequence Length 512. (A config sketch follows this table.)
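
The Table 2 values quoted above can be gathered into a single configuration object for readability. This is a minimal sketch, not the authors' FAIRSEQ configuration: the class and field names are assumptions introduced here, and only the numeric values come from the quoted table.

```python
# Hedged sketch: Table 2 pretraining hyperparameters as a plain config object.
# Class and field names are illustrative, not taken from the authors' code;
# only the values mirror the quoted table.
from dataclasses import dataclass

@dataclass
class PretrainingConfig:
    max_steps: int = 125_000               # Max Steps: 125K
    optimizer: str = "adam"                # Optimizer: Adam
    peak_lr_generator: float = 2e-4        # Peak Learning Rate (Generator): 2 x 10^-4
    peak_lr_discriminator: float = 1.5e-3  # Peak Learning Rate (Discriminator): 1.5 x 10^-3
    batch_size: int = 2048                 # Batch Size: 2048
    warmup_steps: int = 10_000             # Warm-Up Steps: 10K
    sequence_length: int = 512             # Sequence Length: 512

print(PretrainingConfig())
```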
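
For the Dataset Splits row, the single-task GLUE setup can be illustrated with a short sketch. It is not the authors' evaluation code; it assumes the Hugging Face datasets library and only shows how the standard per-task train/validation splits are obtained for single-task, single-model finetuning.

```python
# Hedged sketch of obtaining GLUE per-task splits for single-task finetuning.
# Assumes the Hugging Face `datasets` library; this is not the authors' code.
from datasets import load_dataset

def load_glue_task(task: str):
    """Return the standard train and validation splits for one GLUE task."""
    data = load_dataset("glue", task)
    # MNLI provides matched/mismatched validation sets; other tasks a single one.
    val_key = "validation_matched" if task == "mnli" else "validation"
    return data["train"], data[val_key]

train_set, dev_set = load_glue_task("rte")
print(len(train_set), len(dev_set))
```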