ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Authors: Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive empirical evidence shows that our proposed methods lead to models that scale much better compared to the original BERT. |
| Researcher Affiliation | Collaboration | Google Research; Toyota Technological Institute at Chicago |
| Pseudocode | No | The paper describes the model architecture and techniques in text and tables but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code and the pretrained models are available at https://github.com/google-research/ALBERT. |
| Open Datasets | Yes | To keep the comparison as meaningful as possible, we follow the BERT (Devlin et al., 2019) setup in using the BOOKCORPUS (Zhu et al., 2015) and English Wikipedia (Devlin et al., 2019) for pretraining baseline models. |
| Dataset Splits | Yes | To monitor the training progress, we create a development set based on the development sets from SQuAD and RACE using the same procedure as in Sec. 4.1. We report accuracies for both MLM and sentence classification tasks. |
| Hardware Specification | Yes | Training was done on Cloud TPU V3. The number of TPUs used for training ranged from 64 to 512, depending on model size. |
| Software Dependencies | No | The paper mentions tools such as SentencePiece and the LAMB optimizer, but does not provide version numbers for the software dependencies needed to replicate the experiments. |
| Experiment Setup | Yes | "All the model updates use a batch size of 4096 and a LAMB optimizer with learning rate 0.00176 (You et al., 2019). We train all models for 125,000 steps unless otherwise specified." (Section 4.1) and "Hyperparameters for downstream tasks are shown in Table 14." (Appendix A.4). A hedged configuration sketch follows the table. |
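
The pretraining optimization settings quoted above (LAMB optimizer, batch size 4096, learning rate 0.00176, 125,000 steps) can be captured in a minimal sketch. The snippet below is an illustration only: it uses optax's LAMB implementation as a stand-in for the authors' TensorFlow code at https://github.com/google-research/ALBERT, and the `init_optimizer_state` helper plus any warmup/decay behavior are hypothetical, not taken from the paper.

```python
# Minimal sketch of the reported ALBERT pretraining optimization settings.
# Assumption: optax's LAMB stands in for the authors' TensorFlow implementation;
# warmup/decay schedules beyond the quoted values are not specified here.
import optax

PRETRAIN_STEPS = 125_000   # "We train all models for 125,000 steps unless otherwise specified."
BATCH_SIZE = 4096          # batch size for all model updates (Section 4.1)
LEARNING_RATE = 0.00176    # LAMB learning rate (You et al., 2019)

# Constant-rate LAMB optimizer mirroring the quoted hyperparameters.
optimizer = optax.lamb(learning_rate=LEARNING_RATE)

def init_optimizer_state(params):
    """Hypothetical helper: build LAMB optimizer state for a parameter pytree."""
    return optimizer.init(params)
```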