Distributed Deep Learning In Open Collaborations
Authors: Michael Diskin, Alexey Bukhtiyarov, Max Ryabinin, Lucile Saulnier, Quentin Lhoest, Anton Sinitsin, Dmitry Popov, Dmitry V. Pyrkin, Maxim Kashirin, Alexander Borzunov, Albert Villanova del Moral, Denis Mazur, Ilia Kobelev, Yacine Jernite, Thomas Wolf, Gennady Pekhimenko
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our approach for SwAV and ALBERT pretraining in realistic conditions and achieve performance comparable to traditional setups at a fraction of the cost. |
| Researcher Affiliation | Collaboration | Yandex, Russia; Hugging Face, USA; HSE University, Russia; Moscow Institute of Physics and Technology, Russia; University of Toronto, Canada; Vector Institute, Canada |
| Pseudocode | No | While the paper describes algorithmic details and problem formulations (e.g., as a linear program), it does not include a formally labeled 'Pseudocode' or 'Algorithm' block with structured steps. |
| Open Source Code | Yes | Code and training configurations are available at github.com/yandex-research/DeDLOC |
| Open Datasets | Yes | We train the ResNet-50 [93] model on the ImageNet dataset [1] without labels; pretrain the ALBERT-large [7] masked language model on the WikiText-103 dataset [97]; and train the ALBERT-large model on Wikipedia and the Bengali part of the OSCAR [100] multilingual corpus. (A hedged data-loading sketch appears after the table.) |
| Dataset Splits | No | The paper mentions dataset usage such as the WikiText-103 and ImageNet datasets, but does not specify explicit train/validation/test splits (e.g., percentages or exact counts) needed for reproducibility. |
| Hardware Specification | Yes | We train with three hardware setups: SERVER, WORKSTATION and HYBRID. The SERVER setup contains 8 workers, each with a single V100 GPU and 1 Gb/s symmetric bandwidth. In turn, the WORKSTATION setup consists of 16 nodes with 1080 Ti and 200 Mb/s bandwidth per worker. and We run all experiments on cloud instances with Tesla T4 GPUs. |
| Software Dependencies | No | The paper mentions using 'Hivemind [95]' and 'the transformers library [99]' but does not provide specific version numbers for these software dependencies, which is necessary for reproducibility. |
| Experiment Setup | Yes | Our experiments follow the recommended training configuration [92, 94]: 2+6 random crops, early prototype freezing and a queue with 3,840 samples for each worker, LARS [78] optimizer, and 32,768 samples per batch across all workers. (See the configuration sketch below.) |
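The SwAV settings quoted in the Experiment Setup row can be collected into a single configuration object, shown below as a minimal sketch. The field names are hypothetical (they are not taken from the released DeDLOC configs), and the exact number of iterations with frozen prototypes is not stated in the quoted excerpt; only the crop counts, per-worker queue length, optimizer choice, and global batch size come from the paper.

```python
from dataclasses import dataclass

@dataclass
class SwAVPretrainingConfig:
    """Illustrative summary of the quoted SwAV settings; field names are hypothetical."""
    num_global_crops: int = 2            # "2+6 random crops": two global views per image...
    num_local_crops: int = 6             # ...plus six smaller local views
    freeze_prototypes_early: bool = True # prototypes are frozen early in training (duration not quoted)
    queue_length_per_worker: int = 3840  # feature queue maintained by each worker
    optimizer: str = "LARS"              # layer-wise adaptive rate scaling optimizer
    global_batch_size: int = 32768       # samples per step summed across all workers

print(SwAVPretrainingConfig())
```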
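For the Open Datasets row, the text corpora are publicly available through the Hugging Face `datasets` library. The calls below are a minimal loading sketch, not the paper's own data pipeline: the Hub configuration names are assumptions based on the public naming scheme, and ImageNet is omitted because it is distributed separately under its own terms.

```python
from datasets import load_dataset

# WikiText-103, used for ALBERT-large masked language model pretraining.
wikitext = load_dataset("wikitext", "wikitext-103-raw-v1")

# Bengali portion of the multilingual OSCAR corpus, used in the collaborative run;
# the configuration name is an assumption based on the public Hub naming scheme.
oscar_bn = load_dataset("oscar", "unshuffled_deduplicated_bn", split="train")

print(wikitext)
print(oscar_bn)
```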