ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Authors: Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive empirical evidence shows that our proposed methods lead to models that scale much better compared to the original BERT.
Researcher Affiliation | Collaboration | Google Research; Toyota Technological Institute at Chicago
Pseudocode | No | The paper describes the model architecture and techniques in text and tables but does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | The code and the pretrained models are available at https://github.com/google-research/ALBERT.
Open Datasets | Yes | To keep the comparison as meaningful as possible, we follow the BERT (Devlin et al., 2019) setup in using the BOOKCORPUS (Zhu et al., 2015) and English Wikipedia (Devlin et al., 2019) for pretraining baseline models.
Dataset Splits | Yes | To monitor the training progress, we create a development set based on the development sets from SQuAD and RACE using the same procedure as in Sec. 4.1. We report accuracies for both MLM and sentence classification tasks.
Hardware Specification | Yes | Training was done on Cloud TPU V3. The number of TPUs used for training ranged from 64 to 512, depending on model size.
Software Dependencies | No | The paper mentions tools such as SentencePiece and the LAMB optimizer, but does not provide specific version numbers for any software dependencies required to replicate the experiments.
Experiment Setup | Yes | All the model updates use a batch size of 4096 and a LAMB optimizer with learning rate 0.00176 (You et al., 2019). We train all models for 125,000 steps unless otherwise specified. (Section 4.1) "Hyperparameters for downstream tasks are shown in Table 14." (Appendix A.4) See the configuration sketch after this table.
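The Experiment Setup row quotes the concrete pretraining hyperparameters from Section 4.1 of the paper. The sketch below simply collects those quoted values into a plain Python dictionary for quick reference; the key names are illustrative assumptions and are not the configuration schema used by the ALBERT repository.

```python
# Minimal sketch: ALBERT pretraining hyperparameters quoted above (Section 4.1),
# gathered into a plain dictionary. Key names are illustrative, not the actual
# configuration format of https://github.com/google-research/ALBERT.
albert_pretraining_setup = {
    "train_batch_size": 4096,      # "All the model updates use a batch size of 4096"
    "optimizer": "LAMB",           # LAMB optimizer (You et al., 2019)
    "learning_rate": 0.00176,      # learning rate reported in Section 4.1
    "num_train_steps": 125_000,    # "We train all models for 125,000 steps unless otherwise specified"
    "hardware": "Cloud TPU V3",    # 64 to 512 TPUs, depending on model size
}

if __name__ == "__main__":
    for key, value in albert_pretraining_setup.items():
        print(f"{key}: {value}")
```

Downstream-task hyperparameters (Table 14, Appendix A.4 of the paper) are not reproduced in this sketch.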