Fast-ELECTRA for Efficient Pre-training
Authors: Chengyu Dong, Liyuan Liu, Hao Cheng, Jingbo Shang, Jianfeng Gao, Xiaodong Liu
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments to test the effectiveness, efficiency, and robustness of our method. |
| Researcher Affiliation | Collaboration | Chengyu Dong¹, Liyuan Liu², Hao Cheng², Jingbo Shang¹, Jianfeng Gao², Xiaodong Liu²; ¹University of California, San Diego, ²Microsoft Research |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper states that their method is 'implemented within the same codebase, which is built on top of FAIRSEQ', but it does not provide a direct link or explicit statement about releasing their specific source code. |
| Open Datasets | Yes | We employ Wikipedia and Book Corpus (Zhu et al., 2015) (16 GB of texts, 256M samples) for pre-training... We evaluate on GLUE (Wang et al., 2018) language understanding benchmark |
| Dataset Splits | Yes | Table 1: Results on GLUE development set. |
| Hardware Specification | Yes | We conduct pre-training on NVIDIA Tesla V100 with 32GB memory and fine-tuning on NVIDIA Tesla P100 with 16GB memory. ... including a node with 4 GeForce RTX 3090 GPUs (24GB memory each, w/o NVLink), and a node with 8 Tesla V100 GPUs (32GB memory each, w/o NVLink). |
| Software Dependencies | No | The paper mentions that the method is built on top of 'FAIRSEQ', but it does not specify a version number for this or any other software dependency. |
| Experiment Setup | Yes | We conduct pre-training for 125K updates with a batch size of 2048. ... Detailed hyperparameter settings can be found in Appendix A. (Table 5 in Appendix A provides detailed hyperparameters for pre-training, including optimizer, learning rates, batch size, etc.) |
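For reference, the experiment setup reported above can be summarized as a configuration sketch. Only the values quoted from the paper (the pre-training corpus, 125K updates, batch size 2048, the GLUE benchmark, and the V100/P100 hardware) are grounded; the field names and structure below are illustrative, and the authoritative hyperparameters (optimizer, learning rates, etc.) are in Table 5 of the paper's Appendix A.

```python
# Illustrative summary of the reported pre-training setup.
# Field names are hypothetical; only the commented values come from the paper.
PRETRAINING_CONFIG = {
    "codebase": "FAIRSEQ",                              # built on top of FAIRSEQ (version unspecified)
    "pretraining_corpus": ["Wikipedia", "Book Corpus"], # 16 GB of texts, 256M samples
    "evaluation_benchmark": "GLUE",                     # results reported on the GLUE development set
    "max_updates": 125_000,                             # "pre-training for 125K updates"
    "batch_size": 2048,                                 # "with a batch size of 2048"
    "pretraining_hardware": "NVIDIA Tesla V100 (32GB)",
    "finetuning_hardware": "NVIDIA Tesla P100 (16GB)",
    # Optimizer, learning rates, and remaining hyperparameters: see Appendix A (Table 5) of the paper.
}
```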