Big Bird: Transformers for Longer Sequences

Authors: Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "In this section our goal is to showcase benefits of modeling longer input sequence for NLP tasks, for which we select three representative tasks. We begin with basic masked language modeling (MLM; Devlin et al. 22) to check if better contextual representations can be learnt by utilizing longer contiguous sequences." |
| Researcher Affiliation | Industry | "Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed; Google Research; {manzilz, gurug, avinavadubey}@google.com" |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | "code available at http://goo.gle/bigbird-transformer" |
| Open Datasets | Yes | "Natural Questions [52]: For the given question, find a short span of answer (SA) from the given evidences as well highlight the paragraph from the given evidences containing information about the correct answer (LA). TriviaQA-wiki [41]: We need to provide an answer for the given question using provided Wikipedia evidence, however, the answer might not be present in the given evidence. On a smaller verified subset of question, the given evidence is guaranteed to contain the answer. Nevertheless, we model the answer as span selection problem in this case as well. We learn contextual representation of these token on the human reference genome (GRCh37) using MLM objective." |
| Dataset Splits | No | No specific dataset split information (exact percentages, sample counts, or explicit citations to predefined validation splits) was found in the paper. A 'development set' is mentioned, but its size and construction are not detailed. |
| Hardware Specification | No | No specific hardware details (exact GPU/CPU models, processor types with speeds, or detailed machine specifications) were provided for the experiments; the paper only mentions '16GB memory/chip'. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., library or framework names with versions such as PyTorch 1.9) were provided. |
| Experiment Setup | Yes | "Pretraining and MLM: We follow [22, 63] to create base and large versions of BIGBIRD and pretrain it using MLM objective. We note that we trained our models on a reasonable 16GB memory/chip with batch size of 32-64. For a fair comparison, we had to use some additional regularization for training BIGBIRD, details of which are provided in App. E.2 along with exact architecture description." |
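
The MLM objective quoted in the Research Type and Experiment Setup rows above is the standard BERT-style masking scheme. Below is a minimal Python sketch of that masking step, for illustration only; it is not the authors' code, and `MASK_ID` and `VOCAB_SIZE` are assumed BERT-style placeholder values.

```python
import random

MASK_ID = 103       # assumed [MASK] token id (BERT-style WordPiece vocab)
VOCAB_SIZE = 30522  # assumed vocabulary size; both values are illustrative

def mask_tokens(token_ids, mask_prob=0.15, seed=None):
    """BERT-style MLM masking: ~15% of positions become prediction targets;
    of those, 80% are replaced by [MASK], 10% by a random token, and 10%
    are left unchanged. Positions labeled -100 are ignored by the loss."""
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [-100] * len(inputs)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue
        labels[i] = tok            # this position contributes to the MLM loss
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK_ID    # 80%: replace with [MASK]
        elif roll < 0.9:
            inputs[i] = rng.randrange(VOCAB_SIZE)  # 10%: random token
        # remaining 10%: keep the original token
    return inputs, labels

# Example: masking a short id sequence
masked, targets = mask_tokens([2023, 2003, 1037, 2742, 6251], seed=0)
```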
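
The Open Datasets row above frames both Natural Questions short answers and TriviaQA-wiki as span selection. A common decoding step for such a head is to pick the highest-scoring (start, end) pair from per-token logits; the sketch below is a generic illustration under that assumption, and `best_span` and `max_span_len` are hypothetical names, not taken from the paper.

```python
def best_span(start_logits, end_logits, max_span_len=30):
    """Return the (start, end) index pair maximizing
    start_logits[s] + end_logits[e] subject to s <= e < s + max_span_len."""
    best_pair, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_logits):
        for e in range(s, min(s + max_span_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best_score:
                best_pair, best_score = (s, e), score
    return best_pair, best_score

# Example: toy logits over a 6-token context
span, score = best_span([0.1, 2.0, 0.3, 0.0, 0.2, 0.1],
                        [0.0, 0.1, 1.5, 0.4, 0.2, 0.0])
# span == (1, 2): tokens 1 through 2 form the predicted answer
```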