AutoSync: Learning to Synchronize for Data-Parallel Distributed Deep Learning
Authors: Hao Zhang, Yuan Li, Zhijie Deng, Xiaodan Liang, Lawrence Carin, Eric Xing
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate AutoSync on a broad set of models and clusters, and show that there exist ample strategies in the proposed space that outperform hand-optimized systems by a significant margin. AutoSync can effectively find strategies that reduce training time by 1.2x-1.6x compared to hand-optimized ones on multiple, difficult-to-parallelize model architectures (e.g. NCF [13], BERT [7] and VGG16 [30]), within an acceptable budget. The two evaluation clusters are quoted in full under Hardware Specification below. |
| Researcher Affiliation | Collaboration | 1Petuum Inc., 2Carnegie Mellon University, 3Duke University, 4Tsinghua University |
| Pseudocode | No | The paper describes the proposed methods in narrative text and with mathematical formulas but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The data and code accompanying this paper are available at https://github.com/petuum/autodist. |
| Open Datasets | Yes | As an additional contribution, we collect a dataset containing nearly 10000 data points of (model, resource, strategy) tuples and their corresponding runtime on real clusters. We share the dataset with the community to encourage extended studies. The data and code accompanying this paper are available at https://github.com/petuum/autodist. |
| Dataset Splits | No | The paper refers to training models with 'standard settings suggested by MLPerf [22]' but does not explicitly provide the specific training, validation, and test dataset splits in terms of percentages or sample counts for the deep learning models used in the experiments. |
| Hardware Specification | Yes | We conduct experiments on two clusters (D): (1) Cluster A is an in-house cluster with 11 nodes, each equipped with a TITAN X GPU and a 40GbE Ethernet switch; (2) Cluster B is based on AWS and consists of 4x g4dn.12xlarge nodes, each with 4 NVIDIA T4 GPUs and 50GbE full bandwidth. (See the cluster summary sketched after this table.) |
| Software Dependencies | No | The paper mentions 'TensorFlow 2.0' and refers to the NCCL version, but provides a specific version number only for TensorFlow; it does not list version numbers for NCCL or other key software components. |
| Experiment Setup | No | The paper mentions conducting synchronous training 'with standard settings suggested by MLPerf [22]' and specifies '10 warm-up iterations, then another 40 iterations of training' for runtime measurement. However, it does not explicitly list concrete hyperparameter values such as learning rate, batch size, or optimizer settings for the models. (A sketch of this timing protocol follows the table.) |
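
The Hardware Specification row points to a cluster summary below. The sketch records the two reported clusters as a plain Python data structure, e.g. for sizing a comparable reproduction; the class and field names (`ClusterSpec`, `gpus_per_node`, and so on) are illustrative choices, not the resource-specification format used by the released AutoDist code.

```python
# Illustrative summary of the two evaluation clusters reported in the paper.
# Field names are hypothetical; consult https://github.com/petuum/autodist for
# the resource-specification format the released code actually expects.
from dataclasses import dataclass


@dataclass
class ClusterSpec:
    name: str
    num_nodes: int
    gpus_per_node: int
    gpu_model: str
    network: str


CLUSTER_A = ClusterSpec(
    name="Cluster A (in-house)",
    num_nodes=11,
    gpus_per_node=1,
    gpu_model="NVIDIA TITAN X",
    network="40GbE Ethernet switch",
)

CLUSTER_B = ClusterSpec(
    name="Cluster B (AWS, 4x g4dn.12xlarge)",
    num_nodes=4,
    gpus_per_node=4,
    gpu_model="NVIDIA T4",
    network="50GbE full bandwidth",
)

if __name__ == "__main__":
    for cluster in (CLUSTER_A, CLUSTER_B):
        total_gpus = cluster.num_nodes * cluster.gpus_per_node
        print(f"{cluster.name}: {total_gpus} GPUs ({cluster.gpu_model}), {cluster.network}")
```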
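
The Experiment Setup row quotes the paper's runtime-measurement protocol: 10 warm-up iterations followed by 40 timed training iterations. Below is a minimal, framework-agnostic sketch of that protocol, assuming a hypothetical `train_step` callable standing in for one synchronous training iteration; it is not taken from the released code.

```python
import time
from statistics import mean
from typing import Callable


def measure_runtime(train_step: Callable[[], None],
                    warmup_iters: int = 10,
                    timed_iters: int = 40) -> float:
    """Return the mean per-iteration wall-clock time in seconds.

    Mirrors the protocol quoted above: run `warmup_iters` untimed iterations
    (to amortize graph construction, allocator warm-up, etc.), then time the
    next `timed_iters` iterations.
    """
    for _ in range(warmup_iters):
        train_step()

    durations = []
    for _ in range(timed_iters):
        start = time.perf_counter()
        train_step()
        durations.append(time.perf_counter() - start)
    return mean(durations)


if __name__ == "__main__":
    # Hypothetical usage with a dummy step that sleeps for 5 ms.
    print(measure_runtime(lambda: time.sleep(0.005)))
```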