GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
Authors: Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Dehao Chen, Mia Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le, Yonghui Wu, Zhifeng Chen
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the advantages of GPipe by training large-scale neural networks on two different tasks with distinct network architectures: (i) Image Classification: We train a 557-million-parameter AmoebaNet model and attain a top-1 accuracy of 84.4% on ImageNet-2012; (ii) Multilingual Neural Machine Translation: We train a single 6-billion-parameter, 128-layer Transformer model on a corpus spanning over 100 languages and achieve better quality than all bilingual models. |
| Researcher Affiliation | Industry | {huangyp,ylc,ankurbpn,orhanf,miachen,dehao,hyouklee,jngiam,qvl,yonghui,zhifengc}@google.com |
| Pseudocode | No | The paper describes the algorithm using text and a diagram (Figure 2c), but it does not include explicitly labeled pseudocode or an algorithm block in the main text. It refers to supplementary material for examples, which may contain pseudocode, but this is not present in the main paper. (A hedged sketch of the batch-splitting pipeline schedule appears after this table.) |
| Open Source Code | No | The paper states 'This open-source library is implemented under the Lingvo [16] framework.' This indicates that GPipe is built within an open-source framework, but it does not explicitly state that the specific code for GPipe as described and evaluated in this paper is publicly released or provide a link to it. |
| Open Datasets | Yes | ImageNet-2012 dataset; CIFAR-10, CIFAR-100, Stanford Cars, Oxford Pets, Food-101, FGVC Aircraft, Birdsnap (all in Table 5); We use a corpus of parallel documents over 102 languages and English, containing a total of 25 billion training examples... [37]. |
| Dataset Splits | Yes | We train a 557-million-parameter AmoebaNet model and attain a top-1 accuracy of 84.4% on ImageNet-2012; 84.4% top-1 accuracy for the 550M parameter AmoebaNet model; top-1 validation accuracy of 84.4%. |
| Hardware Specification | Yes | We ran the experiments on Cloud TPUv2s with 8GB memory per accelerator. We next trained Transformer models using Cloud TPUv3s with 16GB memory per accelerator core. We ran our experiments on a single host with multiple NVIDIA P100 GPUs. |
| Software Dependencies | No | The paper states 'This open-source library is implemented under the Lingvo [16] framework.' While Lingvo is mentioned, no specific version numbers are provided for Lingvo or any other software dependencies. |
| Experiment Setup | Yes | We used a fixed input image size of 224×224 and mini-batch size of 128. We used a fixed vocabulary size of 32k, sequence length 1024 and batch size 32. Each Transformer layer has 2048 for model dimension, 8192 for feed-forward hidden dimension and 32 attention heads. The number of micro-batches was fixed at 32. Input images to the network during training were resized to 480×480, horizontally flipped randomly and augmented using cutout [24]. We clip the logit predictions (softmax pre-activations) whenever their magnitude exceeds a certain value. (These quoted values are collected into a configuration sketch after this table.) |
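
As the "Pseudocode" row notes, the paper conveys GPipe's batch-splitting pipeline through prose and Figure 2c rather than an algorithm block. The following is a minimal Python sketch of that schedule as we understand it from the paper, not the authors' code: a mini-batch is split into M micro-batches, and at clock tick t, pipeline stage k processes micro-batch t − k, so the idle "bubble" is confined to the first and last K − 1 ticks. All names (`split_minibatch`, `gpipe_forward_schedule`) are illustrative.

```python
from typing import List, Sequence


def split_minibatch(batch: Sequence, num_micro_batches: int) -> List[Sequence]:
    """Split a mini-batch into M roughly equal micro-batches (GPipe's first step)."""
    step = -(-len(batch) // num_micro_batches)  # ceiling division
    return [batch[i:i + step] for i in range(0, len(batch), step)]


def gpipe_forward_schedule(num_stages: int, num_micro_batches: int):
    """Yield (tick, stage, micro_batch) triples for the pipelined forward pass.

    At tick t, stage k works on micro-batch t - k; once the pipeline fills,
    all K stages run concurrently on different micro-batches. The idle
    (K - 1) ticks at either end are the "bubble" overhead the paper analyzes.
    """
    for t in range(num_stages + num_micro_batches - 1):
        for k in range(num_stages):
            m = t - k
            if 0 <= m < num_micro_batches:
                yield t, k, m


if __name__ == "__main__":
    # 4 pipeline stages, 8 micro-batches: print which micro-batch each
    # stage touches at every tick of the forward pass.
    for tick, stage, mb in gpipe_forward_schedule(num_stages=4, num_micro_batches=8):
        print(f"tick {tick:2d}: stage {stage} -> micro-batch {mb}")
```

In the paper's setup M = 32; gradients from all micro-batches are accumulated and applied in a single synchronous update at the end of each mini-batch, so micro-batching changes throughput rather than the gradient (modulo batch-normalization statistics, which the paper computes per micro-batch).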
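
The "Experiment Setup" row quotes the Transformer hyperparameters and the logit-clipping trick. Purely as a hedged illustration, the sketch below collects those quoted values into a config object and shows the clipping step; the class name, function name, and the clip threshold (the paper does not state its value) are our assumptions.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class TransformerSetup:
    # Values quoted from the paper's experiment setup.
    vocab_size: int = 32_000       # "fixed vocabulary size of 32k"
    seq_length: int = 1024         # "sequence length 1024"
    batch_size: int = 32           # "batch size 32"
    model_dim: int = 2048          # "2048 for model dimension"
    ffn_hidden_dim: int = 8192     # "8192 for feed-forward hidden dimension"
    num_heads: int = 32            # "32 attention heads"
    num_micro_batches: int = 32    # "number of micro-batches was fixed at 32"


def clip_logits(logits: np.ndarray, max_abs: float = 20.0) -> np.ndarray:
    """Clip softmax pre-activations whose magnitude exceeds a threshold.

    The paper reports clipping logits for training stability but not the
    threshold; max_abs=20.0 is a placeholder assumption.
    """
    return np.clip(logits, -max_abs, max_abs)
```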