Fast-ELECTRA for Efficient Pre-training

Authors: Chengyu Dong, Liyuan Liu, Hao Cheng, Jingbo Shang, Jianfeng Gao, Xiaodong Liu

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments to test the effectiveness, efficiency, and robustness of our method.
Researcher Affiliation | Collaboration | Chengyu Dong^1, Liyuan Liu^2, Hao Cheng^2, Jingbo Shang^1, Jianfeng Gao^2, Xiaodong Liu^2; 1: University of California, San Diego; 2: Microsoft Research
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. (An illustrative replaced-token-detection training-step sketch follows the table.)
Open Source Code | No | The paper states that their method is 'implemented within the same codebase, which is built on top of FAIRSEQ', but it does not provide a link to, or an explicit statement about releasing, their specific source code.
Open Datasets | Yes | We employ Wikipedia and Book Corpus (Zhu et al., 2015) (16 GB of texts, 256M samples) for pre-training... We evaluate on GLUE (Wang et al., 2018) language understanding benchmark. (A dataset-loading sketch follows the table.)
Dataset Splits | Yes | Table 1: Results on GLUE development set. (A GLUE development-split loading sketch follows the table.)
Hardware Specification | Yes | We conduct pre-training on NVIDIA Tesla V100 with 32GB memory and fine-tuning on NVIDIA Tesla P100 with 16GB memory. ... including a node with 4 GeForce RTX 3090 GPUs (24GB memory each, w/o NVLink), and a node with 8 Tesla V100 GPUs (32GB memory each, w/o NVLink). (A GPU-memory check sketch follows the table.)
Software Dependencies | No | The paper mentions that the method is built on top of 'FAIRSEQ', but it does not specify a version number for this or any other software dependency.
Experiment Setup | Yes | We conduct pre-training for 125K updates with a batch size of 2048. ... Detailed hyperparameter settings can be found in Appendix A. (Table 5 in Appendix A provides detailed hyperparameters for pre-training, including optimizer, learning rates, batch size, etc. A config sketch collecting the stated settings follows the table.)
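
The Pseudocode row above notes that the paper includes no algorithm block. For orientation only, here is a minimal sketch of a generic ELECTRA-style replaced-token-detection (RTD) training step with a frozen generator, the setting Fast-ELECTRA targets. The function name `rtd_step`, the module interfaces, and the fixed `temperature` argument are illustrative assumptions, not the authors' implementation; the constant temperature is only a stand-in for whatever smoothing schedule the paper applies to the generator's output distribution.

```python
# Hedged sketch of one ELECTRA-style replaced-token-detection (RTD) step.
# The generator is kept frozen (no gradient), mirroring the idea of reusing
# an existing language model instead of jointly training an auxiliary one.
import torch
import torch.nn.functional as F

def rtd_step(generator, discriminator, input_ids, mask_positions, temperature=1.0):
    """One RTD update: sample replacement tokens from a frozen generator,
    then train the discriminator to tell original tokens from replaced ones.

    Assumptions: generator(input_ids) -> (batch, seq, vocab) logits,
    discriminator(ids) -> (batch, seq, 1) logits, mask_positions is a
    boolean (batch, seq) tensor marking positions to corrupt."""
    with torch.no_grad():  # generator is frozen
        gen_logits = generator(input_ids)
        probs = F.softmax(gen_logits / temperature, dim=-1)
        sampled = torch.distributions.Categorical(probs=probs).sample()

    corrupted = input_ids.clone()
    corrupted[mask_positions] = sampled[mask_positions]

    labels = (corrupted != input_ids).float()          # 1 = replaced, 0 = original
    disc_logits = discriminator(corrupted).squeeze(-1)  # (batch, seq)
    loss = F.binary_cross_entropy_with_logits(disc_logits, labels)
    return loss
```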
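
For the Open Datasets row, a hedged loading sketch using the Hugging Face `datasets` library. The Wikipedia dump date `20220301.en` and the `bookcorpus` config are stand-ins; the paper does not specify its exact dump, preprocessing, or tokenization.

```python
# Hedged sketch: loading public corpora comparable to the paper's pre-training
# data (Wikipedia + BookCorpus, roughly 16 GB of text). Dataset names and
# configs below are illustrative stand-ins, not the authors' pipeline.
from datasets import load_dataset

# Dump date is an assumption; newer `datasets` versions may require the
# "wikimedia/wikipedia" mirror instead of the legacy "wikipedia" script.
wiki = load_dataset("wikipedia", "20220301.en", split="train")
books = load_dataset("bookcorpus", split="train")  # Zhu et al., 2015

print(len(wiki), len(books))
```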
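
For the Dataset Splits row, a sketch of pulling the GLUE development (validation) splits that Table 1 reports on. The task list and split names follow the public Hugging Face `glue` configs and are not taken from the paper.

```python
# Hedged sketch: GLUE results in the paper are reported on the development
# set, so we load the "validation" split of each task for evaluation.
from datasets import load_dataset

for task in ["cola", "sst2", "mrpc", "stsb", "qqp", "qnli", "rte"]:
    dev = load_dataset("glue", task, split="validation")
    print(task, len(dev))

# MNLI has separate matched/mismatched development sets.
mnli_matched = load_dataset("glue", "mnli", split="validation_matched")
```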
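
For the Hardware Specification row, a small convenience check of local GPU memory against the cards reported in the paper (V100 32GB for pre-training, P100 16GB for fine-tuning, RTX 3090 24GB and V100 32GB nodes for speed tests). This is not part of any tooling released with the paper.

```python
# Hedged sketch: list available GPUs and their memory to compare with the
# hardware reported in the paper before attempting to reproduce runs.
import torch

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.0f} GB")
```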
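
For the Experiment Setup row (and the FAIRSEQ codebase noted under Open Source Code), a config sketch collecting only the settings stated in the quoted text. Everything else lives in the paper's Appendix A (Table 5) and is deliberately left as a placeholder rather than guessed.

```python
# Hedged sketch: pre-training settings explicitly stated in the paper, shaped
# as a dict one might feed to a FAIRSEQ-based training script. Only the two
# values below come from the text; remaining hyperparameters are in Table 5
# of Appendix A and are not reproduced here.
pretrain_config = {
    "max_updates": 125_000,  # "pre-training for 125K updates"
    "batch_size": 2048,      # sequences per update
    # optimizer, learning rate, warmup, sequence length, etc.:
    # see Table 5 in Appendix A of the paper.
}
```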