Rethinking Embedding Coupling in Pre-trained Language Models

Authors: Hyung Won Chung, Thibault Fevry, Henry Tsai, Melvin Johnson, Sebastian Ruder

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 3 EXPERIMENTAL METHODOLOGY
Researcher Affiliation | Industry | Hyung Won Chung, Google Research, hwchung@google.com; Thibault Fevry, thibaultfevry@gmail.com; Henry Tsai, Google Research, henrytsai@google.com; Melvin Johnson, Google Research, melvinp@google.com; Sebastian Ruder, DeepMind, ruder@google.com
Pseudocode | No | The paper does not include any sections, figures, or blocks explicitly labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | We will release the pre-trained model checkpoint and the source code for RemBERT in order to promote reproducibility and share the pre-training cost with other researchers.
Open Datasets | Yes | For our experiments, we employ tasks from the XTREME benchmark (Hu et al., 2020) that require fine-tuning, including the XNLI (Conneau et al., 2018), NER (Pan et al., 2017), PAWS-X (Yang et al., 2019), XQuAD (Artetxe et al., 2020), MLQA (Lewis et al., 2020), and TyDiQA-GoldP (Clark et al., 2020a) datasets. We train variants of this model that differ in certain hyper-parameters but otherwise are trained under the same conditions to ensure a fair comparison. The model is trained on Wikipedia dumps in 104 languages following Devlin et al. (2019) using masked language modeling (MLM). (A sketch of standard MLM masking appears after the table.)
Dataset Splits | Yes | We average results across three fine-tuning runs and evaluate on the dev sets unless otherwise stated. We show statistics for them in Table 11. Table 11: Statistics for the datasets in XTREME, including the number of training, development, and test examples as well as the number of languages for each task.
Hardware Specification | Yes | For all pre-training except for the large-scale RemBERT, we trained using 64 Google Cloud TPUs. All fine-tuning experiments were run on 8 Cloud TPUs.
Software Dependencies | No | The paper mentions using 'the SentencePiece tokenizer (Kudo & Richardson, 2018)' but does not give version numbers for software libraries or dependencies (e.g., Python, PyTorch, TensorFlow) beyond the SentencePiece citation.
Experiment Setup | Yes | For all fine-tuning experiments other than RemBERT, we use a batch size of 32. We sweep over the learning rate values specified in Table 10. Table 10: Fine-tuning hyperparameters for all models except RemBERT. Table 14: Hyperparameters for RemBERT architecture and pre-training. Table 15: Hyperparameters for RemBERT fine-tuning. (A sketch of this sweep appears after the table.)
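The Open Datasets row quotes the paper's statement that pre-training follows Devlin et al. (2019) with masked language modeling on Wikipedia dumps in 104 languages. As a reading aid, here is a minimal Python sketch of standard BERT-style MLM input corruption; the 15% masking rate, the 80/10/10 replacement split, and the MASK_TOKEN_ID and VOCAB_SIZE constants are the usual BERT defaults and placeholders, not values confirmed by this report.

```python
# Minimal sketch of BERT-style masked language modeling (MLM) input corruption,
# following Devlin et al. (2019), which the quoted setup says it follows.
# The 15% / 80-10-10 rates are standard BERT defaults, assumed here.
import random

MASK_TOKEN_ID = 103      # assumed [MASK] id; depends on the actual vocabulary
VOCAB_SIZE = 250_000     # placeholder; not a value stated in this report

def mask_tokens(token_ids, mask_prob=0.15, seed=None):
    """Return (corrupted_ids, labels); labels are -100 at positions the loss ignores."""
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in token_ids:
        if rng.random() < mask_prob:
            labels.append(tok)
            roll = rng.random()
            if roll < 0.8:                        # 80%: replace with [MASK]
                corrupted.append(MASK_TOKEN_ID)
            elif roll < 0.9:                      # 10%: replace with a random token
                corrupted.append(rng.randrange(VOCAB_SIZE))
            else:                                 # 10%: keep the original token
                corrupted.append(tok)
        else:
            labels.append(-100)                   # sentinel for "not predicted"
            corrupted.append(tok)
    return corrupted, labels
```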
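The Experiment Setup row reports a fixed batch size of 32 and a sweep over the learning rates listed in the paper's Table 10, with results averaged over three runs on the dev sets (see Dataset Splits). The sketch below illustrates that sweep loop under stated assumptions: the learning-rate values are placeholders (Table 10 is not reproduced in this report), and finetune_and_evaluate is a hypothetical stand-in for the actual TPU fine-tuning job.

```python
# Minimal sketch of the fine-tuning sweep described above: fixed batch size 32,
# a grid of candidate learning rates (placeholders, NOT the paper's Table 10 values),
# and dev-set scores averaged over three runs per learning rate.
from statistics import mean

BATCH_SIZE = 32
CANDIDATE_LEARNING_RATES = [1e-5, 2e-5, 3e-5]   # placeholder grid, not Table 10
NUM_RUNS = 3                                     # results are averaged over 3 runs

def sweep(task, finetune_and_evaluate):
    """Return the best learning rate and its mean dev score for one XTREME task."""
    results = {}
    for lr in CANDIDATE_LEARNING_RATES:
        scores = [
            finetune_and_evaluate(task, batch_size=BATCH_SIZE,
                                  learning_rate=lr, seed=run)
            for run in range(NUM_RUNS)
        ]
        results[lr] = mean(scores)
    best_lr = max(results, key=results.get)
    return best_lr, results[best_lr]
```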