A Quadratic Synchronization Rule for Distributed Deep Learning
Authors: Xinran Gu, Kaifeng Lyu, Sanjeev Arora, Jingzhao Zhang, Longbo Huang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive ImageNet experiments on ResNet and ViT show that local gradient methods with QSR consistently improve the test accuracy over other synchronization strategies. |
| Researcher Affiliation | Collaboration | Xinran Gu (1), Kaifeng Lyu (4), Sanjeev Arora (4), Jingzhao Zhang (1,2,3), Longbo Huang (1). (1) Institute for Interdisciplinary Information Sciences, Tsinghua University; (2) Shanghai Qizhi Institute; (3) Shanghai AI Laboratory; (4) Department of Computer Science & Princeton Language and Intelligence, Princeton University |
| Pseudocode | Yes | B PSEUDOCODE We present the pseudocode for standard data parallel methods and local gradient methods below. Algorithm 1 (Parallel OPT): Data Parallel Methods on K Workers ... Algorithm 2 (Local OPT): Local Gradient Methods on K Workers. (A hedged sketch of such a local gradient loop with a QSR-style schedule follows the table.) |
| Open Source Code | No | The paper does not contain any explicit statements about releasing code or links to repositories for their own methodology. |
| Open Datasets | Yes | Extensive ImageNet experiments on ResNet and ViT show that local gradient methods with QSR consistently improve the test accuracy over other synchronization strategies. |
| Dataset Splits | No | The paper mentions 'top-1 validation accuracy' and 'Val. acc. (%)' in tables, and discusses tuning hyperparameters, implying the use of a validation set. However, it does not provide explicit details about the dataset splits (e.g., percentages or sample counts for training, validation, and test sets). |
| Hardware Specification | Yes | We evaluate the communication efficiency of QSR on a 64-GPU NVIDIA GeForce RTX 3090 cluster. |
| Software Dependencies | No | The paper states: 'We use PyTorch Distributed with the NCCL backend to support multi-node distributed training and use FFCV (Leclerc et al., 2022) to accelerate data loading of ImageNet.' However, specific version numbers for PyTorch, NCCL, or FFCV are not provided. |
| Experiment Setup | Yes | C EXPERIMENTAL DETAILS This section lists the additional experimental details omitted in the main text. ... C.1 TRAINING DETAILS FOR RESNET-152 We generally follow the recipe in Foret et al. (2021b) to train ResNet-152. Specifically, we set the momentum as 0.9 and the weight decay λ as 0.0001. ... We adopt a local batch size B_loc = 256... (A hedged configuration sketch also follows the table.) |
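To make the pseudocode row concrete, here is a minimal single-process Python sketch of a local gradient method (Local SGD-style) paired with a QSR-style synchronization schedule. The replica simulation, the toy data and model, the decaying learning-rate schedule, and the constants `alpha = 0.01` and `h_base = 2` are illustrative assumptions, not the paper's implementation; what the sketch preserves is the quadratic scaling of the synchronization period H with the inverse learning rate that gives QSR its name.

```python
# Minimal single-process simulation of K workers running a local
# gradient method with a QSR-style synchronization schedule.
# All constants and the toy model/data are illustrative assumptions.
import copy
import torch
import torch.nn as nn

def qsr_period(lr: float, alpha: float = 0.01, h_base: int = 2) -> int:
    """QSR-style rule: grow the number of local steps H as the
    learning rate decays, H ~ (alpha / lr)^2, floored at h_base."""
    return max(h_base, int((alpha / lr) ** 2))

def average_models(replicas):
    """Stand-in for an all-reduce: average parameters across workers."""
    with torch.no_grad():
        for params in zip(*[r.parameters() for r in replicas]):
            mean = torch.stack([p.data for p in params]).mean(dim=0)
            for p in params:
                p.data.copy_(mean)

K = 4                                   # number of simulated workers
model = nn.Linear(10, 1)                # toy model
replicas = [copy.deepcopy(model) for _ in range(K)]
opts = [torch.optim.SGD(r.parameters(), lr=0.1, momentum=0.9,
                        weight_decay=1e-4) for r in replicas]

steps_since_sync = 0
for step in range(1000):
    lr = 0.1 * (1 - step / 1000)        # toy decaying LR schedule
    for r, opt in zip(replicas, opts):
        for g in opt.param_groups:
            g["lr"] = lr
        x, y = torch.randn(32, 10), torch.randn(32, 1)  # toy local batch
        opt.zero_grad()
        nn.functional.mse_loss(r(x), y).backward()
        opt.step()                      # local step, no communication
    steps_since_sync += 1
    if steps_since_sync >= qsr_period(lr):  # communicate every H steps
        average_models(replicas)
        steps_since_sync = 0
```

Note the design point this illustrates: early in training (large lr) the rule synchronizes every `h_base` steps, while late in training (small lr) workers may run thousands of local steps between communications, which is where the communication savings come from.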
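The software-dependency and experiment-setup rows can likewise be tied together in a short sketch: PyTorch Distributed initialized with the NCCL backend, plus an SGD optimizer carrying the quoted ResNet-152 hyperparameters (momentum 0.9, weight decay 0.0001, local batch size 256). Everything beyond those quoted values, including the base learning rate, the torchrun-style environment variables, and the `torchvision` model constructor, is assumed wiring rather than detail taken from the paper; FFCV data loading and the QSR loop itself are omitted.

```python
# Hedged sketch of the reported setup: PyTorch Distributed (NCCL)
# plus SGD with the hyperparameters quoted in Appendix C.1.
import os
import torch
import torch.distributed as dist
import torchvision

def main():
    # torchrun sets RANK / WORLD_SIZE / LOCAL_RANK in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torchvision.models.resnet152().cuda(local_rank)
    optimizer = torch.optim.SGD(
        model.parameters(),
        lr=0.1,                # illustrative assumption, not from the paper
        momentum=0.9,          # quoted in Appendix C.1
        weight_decay=1e-4,     # quoted in Appendix C.1
    )
    local_batch_size = 256     # B_loc = 256, quoted in Appendix C.1
    # ... build an ImageNet loader with per-worker batch size 256,
    # run local steps, and all-reduce parameters at each QSR sync.

if __name__ == "__main__":
    main()
```

A script like this would typically be launched with `torchrun --nproc_per_node=<gpus> train.py` on each node, which is what makes the `LOCAL_RANK` environment variable available.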