A Quadratic Synchronization Rule for Distributed Deep Learning

Authors: Xinran Gu, Kaifeng Lyu, Sanjeev Arora, Jingzhao Zhang, Longbo Huang

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive ImageNet experiments on ResNet and ViT show that local gradient methods with QSR consistently improve test accuracy over other synchronization strategies.
Researcher Affiliation | Collaboration | Xinran Gu (1), Kaifeng Lyu (4), Sanjeev Arora (4), Jingzhao Zhang (1, 2, 3), Longbo Huang (1). Affiliations: 1. Institute for Interdisciplinary Information Sciences, Tsinghua University; 2. Shanghai Qizhi Institute; 3. Shanghai AI Laboratory; 4. Department of Computer Science & Princeton Language and Intelligence, Princeton University.
Pseudocode | Yes | Appendix B (Pseudocode): "We present the pseudocodes for standard data parallel methods and local gradient methods below." Algorithm 1: ParallelOPT, Data Parallel Methods on K Workers; Algorithm 2: LocalOPT, Local Gradient Methods on K Workers. A hedged sketch of a local gradient method paired with QSR is given after this table.
Open Source Code | No | The paper does not contain any explicit statement about releasing code or a link to a repository for its own methodology.
Open Datasets | Yes | Extensive ImageNet experiments on ResNet and ViT show that local gradient methods with QSR consistently improve test accuracy over other synchronization strategies (ImageNet is a publicly available dataset).
Dataset Splits | No | The paper mentions "top-1 validation accuracy" and "Val. acc. (%)" in tables and discusses tuning hyperparameters, implying the use of a validation set. However, it does not provide explicit details about dataset splits (e.g., percentages or sample counts for the training, validation, and test sets).
Hardware Specification | Yes | "We evaluate the communication efficiency of QSR on a 64-GPU NVIDIA GeForce RTX 3090 cluster."
Software Dependencies | No | The paper states: "We use PyTorch Distributed with NCCL backend to support multinode distributed training and use FFCV (Leclerc et al., 2022) to accelerate data loading of ImageNet." However, specific version numbers for PyTorch, NCCL, or FFCV are not provided. A minimal setup sketch is given after this table.
Experiment Setup | Yes | Appendix C (Experimental Details): "This section lists the additional experimental details omitted in the main text." C.1, Training Details for ResNet-152: "We generally follow the recipe in Foret et al. (2021b) to train ResNet-152. Specifically, we set the momentum as 0.9 and the weight decay λ as 0.0001. ... We adopt a local batch size B_loc = 256..." A hedged optimizer configuration sketch is given after this table.
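
To make the Pseudocode row above concrete, here is a minimal, self-contained Python sketch of a local gradient method (local SGD across K workers) combined with QSR. It assumes a rule of the form H = max(H_base, floor((alpha / lr)^2)), consistent with the paper's described 1/η² scaling of the synchronization period; the default values for alpha and h_base, the toy quadratic objective, and the cosine schedule are illustrative choices, not the paper's settings.

```python
import math
import numpy as np

def qsr_period(lr: float, alpha: float = 0.01, h_base: int = 2) -> int:
    """QSR: grow the synchronization period quadratically as lr decays.
    alpha (growth coefficient) and h_base (floor on H) are illustrative."""
    return max(h_base, math.floor((alpha / lr) ** 2))

def cosine_lr(t: int, total: int, peak: float = 0.1) -> float:
    # Illustrative cosine decay schedule (stays strictly positive for t < total).
    return 0.5 * peak * (1.0 + math.cos(math.pi * t / total))

def local_sgd_with_qsr(num_workers=4, num_steps=1000, dim=10, seed=0):
    """Each worker takes local SGD steps on a noisy quadratic objective;
    every H steps, with H chosen by QSR from the current lr, worker
    weights are averaged, mimicking the all-reduce in Algorithm 2."""
    rng = np.random.default_rng(seed)
    workers = [rng.standard_normal(dim) for _ in range(num_workers)]
    since_sync = 0
    for t in range(num_steps):
        lr = cosine_lr(t, num_steps)
        for i, w in enumerate(workers):
            grad = w + 0.1 * rng.standard_normal(dim)  # noisy grad of ||w||^2 / 2
            workers[i] = w - lr * grad                 # local step, no communication
        since_sync += 1
        if since_sync >= qsr_period(lr):
            avg = sum(workers) / num_workers           # synchronize (average weights)
            workers = [avg.copy() for _ in range(num_workers)]
            since_sync = 0
    return sum(workers) / num_workers

if __name__ == "__main__":
    w = local_sgd_with_qsr()
    print("final weight norm:", np.linalg.norm(w))
```

Note how the sketch reproduces QSR's qualitative behavior: synchronization is frequent early in training when lr is large, and H grows rapidly as lr decays, saving communication late in training.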
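The Software Dependencies row quotes the paper's use of PyTorch Distributed with the NCCL backend (version numbers unspecified, matching the row's finding). As a minimal sketch of that setup, the snippet below initializes a process group and performs the weight-averaging all-reduce that a local gradient method runs at each synchronization point; the function names are mine, not the paper's.

```python
import torch
import torch.distributed as dist

def init_distributed() -> None:
    # RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT are assumed to be set
    # by the launcher (e.g., torchrun), as is typical for multi-node jobs.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

@torch.no_grad()
def synchronize_weights(model: torch.nn.Module) -> None:
    # The periodic synchronization step of a local gradient method:
    # sum each parameter across workers, then divide by the worker count.
    world_size = dist.get_world_size()
    for p in model.parameters():
        dist.all_reduce(p, op=dist.ReduceOp.SUM)
        p /= world_size
```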
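Finally, the Experiment Setup row can be mirrored in code. This sketch instantiates the hyperparameters quoted from Appendix C.1 (momentum 0.9, weight decay 0.0001, local batch size B_loc = 256); the learning rate is elided in the excerpt, so the value below is a placeholder, and FakeData stands in for the ImageNet pipeline (which the paper accelerates with FFCV).

```python
import torch
import torchvision
from torchvision import transforms

model = torchvision.models.resnet152()

# Momentum and weight decay as quoted in Appendix C.1; the peak learning
# rate is not given in the excerpt above, so 0.1 is a placeholder only.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)

# Local (per-worker) batch size B_loc = 256; FakeData stands in for the
# real ImageNet dataset so the snippet runs anywhere.
dataset = torchvision.datasets.FakeData(
    size=1024, image_size=(3, 224, 224), num_classes=1000,
    transform=transforms.ToTensor())
loader = torch.utils.data.DataLoader(dataset, batch_size=256, shuffle=True)
```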