Ouroboros: On Accelerating Training of Transformer-Based Language Models

Authors: Qian Yang, Zhouyuan Huo, Wenlin Wang, Lawrence Carin

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on Transformer and Transformer-XL language models demonstrate that the proposed algorithm obtains a much faster speedup beyond data parallelism, with comparable or better accuracy.
Researcher Affiliation | Academia | 1 Duke University; 2 University of Pittsburgh
Pseudocode | Yes | Algorithm 1 Ouroboros + SGD
Open Source Code | Yes | Code to reproduce experiments is to be found at https://github.com/LaraQianYang/Ouroboros.
Open Datasets | Yes | (i) enwik8, containing 100M bytes of unprocessed Wikipedia text [33]; (ii) text8, containing 100M processed lower-case Wikipedia characters, with any character other than the 26 letters a through z and space removed [33]; and (iii) WikiText-103, the largest available word-level language modeling benchmark with long-term dependency [34].
Dataset Splits | No | The paper mentions using training and test datasets but does not provide specific details on validation splits (percentages or counts) or explicitly describe a validation set setup.
Hardware Specification | Yes | All experiments are performed on a machine with 4 TESLA V100 GPUs.
Software Dependencies | No | The paper mentions 'PyTorch' and 'Python3' as software used but does not specify their version numbers.
Experiment Setup | Yes | According to [16], we use the Adam optimizer, where β1 = 0.9, β2 = 0.999 and ε = 1e-8 [28]. For comparison, we use Ouroboros+Adam (see Appendix) in the experiments. The learning rate is set to 0.00025 and it decreases following a cosine learning rate schedule [35].
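To make the quoted experiment setup concrete, below is a minimal PyTorch sketch of standard Adam with the stated hyperparameters (β1 = 0.9, β2 = 0.999, ε = 1e-8, learning rate 0.00025) paired with a cosine learning-rate schedule. This is not the authors' code: the paper actually uses its Ouroboros+Adam variant (see its Appendix), and the model, step count, and dummy data here are placeholders rather than values from the paper.

```python
# Minimal sketch (not the authors' implementation): standard PyTorch Adam with
# the hyperparameters quoted in the Experiment Setup row, plus a cosine
# learning-rate schedule. `model`, `total_steps`, and the dummy batches are
# placeholders, not taken from the paper.
import torch

model = torch.nn.Linear(512, 512)   # stand-in for the Transformer(-XL) model
total_steps = 1000                  # assumption: schedule length is not given in this section

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=0.00025,             # learning rate from the quoted setup
    betas=(0.9, 0.999),     # beta1, beta2 from the quoted setup
    eps=1e-8,               # epsilon from the quoted setup
)

# Cosine schedule [35]: decays the learning rate from 0.00025 toward 0 over total_steps.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps)

for step in range(total_steps):
    x = torch.randn(32, 512)        # dummy batch in place of real language-model data
    loss = model(x).pow(2).mean()   # dummy loss in place of the LM training objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```

In the paper's setting the same optimizer and schedule settings would sit on top of the Ouroboros model-splitting scheme and data parallelism across the 4 V100 GPUs listed above; that machinery is omitted here for brevity.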