Fast-ELECTRA for Efficient Pre-training
Authors: Chengyu Dong, Liyuan Liu, Hao Cheng, Jingbo Shang, Jianfeng Gao, Xiaodong Liu
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments to test the effectiveness, efficiency, and robustness of our method. |
| Researcher Affiliation | Collaboration | Chengyu Dong¹, Liyuan Liu², Hao Cheng², Jingbo Shang¹, Jianfeng Gao², Xiaodong Liu²; ¹University of California, San Diego, ²Microsoft Research |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper states that their method is 'implemented within the same codebase, which is built on top of FAIRSEQ', but it does not provide a direct link or explicit statement about releasing their specific source code. |
| Open Datasets | Yes | We employ Wikipedia and Book Corpus (Zhu et al., 2015) (16 GB of texts, 256M samples) for pre-training... We evaluate on GLUE (Wang et al., 2018) language understanding benchmark |
| Dataset Splits | Yes | Table 1: Results on GLUE development set. |
| Hardware Specification | Yes | We conduct pre-training on NVIDIA Tesla V100 with 32GB memory and fine-tuning on NVIDIA Tesla P100 with 16GB memory. ... including a node with 4 GeForce RTX 3090 GPUs (24GB memory each, w/o NVLink), and a node with 8 Tesla V100 GPUs (32GB memory each, w/o NVLink). |
| Software Dependencies | No | The paper mentions that the method is built on top of 'FAIRSEQ', but it does not specify a version number for this or any other software dependency. |
| Experiment Setup | Yes | We conduct pre-training for 125K updates with a batch size of 2048. ... Detailed hyperparameter settings can be found in Appendix A. (Table 5 in Appendix A provides detailed hyperparameters for pre-training, including optimizer, learning rates, batch size, etc.) |
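For reference, the experiment setup reported above can be summarized as a configuration sketch. Only the values quoted from the paper (the pre-training corpus, 125K updates, batch size 2048, the GLUE benchmark, and the V100/P100 hardware) are grounded; the field names and structure below are illustrative, and the authoritative hyperparameters (optimizer, learning rates, etc.) are in Table 5 of the paper's Appendix A.

```python
# Illustrative summary of the reported pre-training setup.
# Field names are hypothetical; only the commented values come from the paper.
PRETRAINING_CONFIG = {
    "codebase": "FAIRSEQ",                              # built on top of FAIRSEQ (version unspecified)
    "pretraining_corpus": ["Wikipedia", "Book Corpus"], # 16 GB of texts, 256M samples
    "evaluation_benchmark": "GLUE",                     # results reported on the GLUE development set
    "max_updates": 125_000,                             # "pre-training for 125K updates"
    "batch_size": 2048,                                 # "with a batch size of 2048"
    "pretraining_hardware": "NVIDIA Tesla V100 (32GB)",
    "finetuning_hardware": "NVIDIA Tesla P100 (16GB)",
    # Optimizer, learning rates, and remaining hyperparameters: see Appendix A (Table 5) of the paper.
}
```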