Distributed Deep Learning in Open Collaborations

Authors: Michael Diskin, Alexey Bukhtiyarov, Max Ryabinin, Lucile Saulnier, Quentin Lhoest, Anton Sinitsin, Dmitry Popov, Dmitry V. Pyrkin, Maxim Kashirin, Alexander Borzunov, Albert Villanova del Moral, Denis Mazur, Ilia Kobelev, Yacine Jernite, Thomas Wolf, Gennady Pekhimenko

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of our approach for SwAV and ALBERT pretraining in realistic conditions and achieve performance comparable to traditional setups at a fraction of the cost.
Researcher Affiliation | Collaboration | Yandex, Russia; Hugging Face, USA; HSE University, Russia; Moscow Institute of Physics and Technology, Russia; University of Toronto, Canada; Vector Institute, Canada
Pseudocode | No | While the paper describes algorithmic details and problem formulations (e.g., as a linear program), it does not include a formally labeled 'Pseudocode' or 'Algorithm' block with structured steps.
Open Source Code | Yes | Code and training configurations are available at github.com/yandex-research/DeDLOC
Open Datasets | Yes | We train the ResNet-50 [93] model on the ImageNet dataset [1] without labels; pretrain the ALBERT-large [7] masked language model on the WikiText-103 dataset [97]; and trained the ALBERT-large model on Wikipedia and the Bengali part of the OSCAR [100] multilingual corpus (see the loading sketch below the table).
Dataset Splits | No | The paper mentions datasets such as 'WikiText-103' and 'ImageNet' but does not specify the explicit train/validation/test splits (e.g., percentages or exact counts) needed for reproducibility.
Hardware Specification | Yes | We train with three hardware setups: SERVER, WORKSTATION and HYBRID. The SERVER setup contains 8 workers, each with a single V100 GPU and 1 Gb/s symmetric bandwidth. In turn, the WORKSTATION setup consists of 16 nodes with 1080 Ti and 200 Mb/s bandwidth per worker. We run all experiments on cloud instances with Tesla T4 GPUs.
Software Dependencies | No | The paper mentions using 'Hivemind [95]' and 'the transformers library [99]' but does not provide specific version numbers for these software dependencies, which are needed for reproducibility.
Experiment Setup | Yes | Our experiments follow the recommended training configuration [92, 94]: 2+6 random crops, early prototype freezing and a queue with 3,840 samples for each worker, LARS [78] optimizer, and 32,768 samples per batch across all workers (see the configuration sketch below the table).
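The text corpora named in the Open Datasets row are publicly available. Below is a minimal loading sketch, assuming the Hugging Face `datasets` library and its standard dataset identifiers; the paper does not prescribe a loading mechanism, and ImageNet is omitted here because it requires a separate manual download.

```python
# Minimal sketch: fetching the public text corpora from the "Open Datasets" row.
# Assumption: the Hugging Face `datasets` library with its standard Hub
# identifiers; the paper itself does not specify how the data were obtained.
from datasets import load_dataset

# WikiText-103, used for ALBERT-large masked language model pretraining
wikitext = load_dataset("wikitext", "wikitext-103-raw-v1")

# Bengali portion of the OSCAR multilingual corpus
oscar_bn = load_dataset("oscar", "unshuffled_deduplicated_bn", split="train")

print(wikitext["train"][0]["text"][:200])
print(oscar_bn[0]["text"][:200])
```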
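The Experiment Setup row can also be restated as a configuration sketch for the SwAV pretraining run. The dictionary below only mirrors the quoted hyperparameters; its key names and structure are illustrative assumptions rather than the authors' actual configuration format.

```python
# Illustrative restatement of the SwAV pretraining setup quoted above.
# Key names are assumptions for readability; values come from the quoted text.
swav_pretraining_config = {
    "model": "ResNet-50",              # trained on ImageNet without labels
    "num_crops": [2, 6],               # "2+6 random crops"
    "freeze_prototypes_early": True,   # early prototype freezing
    "queue_length_per_worker": 3_840,  # queue with 3,840 samples per worker
    "optimizer": "LARS",               # LARS optimizer [78]
    "global_batch_size": 32_768,       # samples per batch across all workers
}

# Per-worker share of the batch in the 8-worker SERVER setup, assuming an
# even split across workers (contributions may differ in practice):
per_worker_batch = swav_pretraining_config["global_batch_size"] // 8  # 4096
```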