Decentralized SGD and Average-direction SAM are Asymptotically Equivalent

Authors: Tongtian Zhu, Fengxiang He, Kaixuan Chen, Mingli Song, Dacheng Tao

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments support our theory and the code is available at D-SGD and SAM. Our empirical results also fully support our theory (see Figure 1 and Figure 3).
Researcher Affiliation | Collaboration | 1 College of Computer Science and Technology, Zhejiang University; 2 JD Explore Academy, JD.com, Inc.; 3 Artificial Intelligence and its Applications Institute, School of Informatics, University of Edinburgh; 4 The University of Sydney.
Pseudocode | No | The paper does not contain any sections or figures explicitly labeled "Pseudocode" or "Algorithm", nor does it include structured code-like blocks outlining a procedure.
Open Source Code | Yes | "Experiments support our theory and the code is available at D-SGD and SAM." (Abstract) and "Code is available at D-SGD and SAM." (Section 5)
Open Datasets | Yes | "D-SGD with various commonly used topologies... and C-SGD are employed to train image classifiers on CIFAR-10 (Krizhevsky et al., 2009) and Tiny ImageNet (Le & Yang, 2015)..."
Dataset Splits | No | The paper mentions using CIFAR-10 and Tiny ImageNet and discusses "validation accuracy", but it does not explicitly state the train/validation/test split percentages or the splitting methodology needed for reproducibility. (A hypothetical split specification is sketched after the table.)
Hardware Specification | Yes | "The experiments are conducted on a computing facility with NVIDIA Tesla V100 16GB GPUs and Intel Xeon Gold 6140 CPU @ 2.30GHz CPUs."
Software Dependencies | No | "The code is based on PyTorch (Paszke et al., 2019)." While PyTorch is mentioned, a specific version number (e.g., 1.9, 2.0) is not provided, which is necessary for full reproducibility.
Experiment Setup | Yes | "The number of workers (one GPU as a worker) is set as 16; and the local batch size is set as 8, 64, and 512 per worker in different cases. For the case of local batch size 64, the initial learning rate is set as 0.1 for ResNet-18 and ResNet-34 and 0.01 for AlexNet... The learning rate is divided by 10 when the model has passed 2/5 and 4/5 of the total number of iterations (He et al., 2016a). We apply the learning rate warm-up (Smith, 2017) and the linear scaling law (He et al., 2016a; Goyal et al., 2017)... Batch normalization (Ioffe & Szegedy, 2015) is employed in training AlexNet." (A sketch of this learning-rate schedule, under stated assumptions, follows the table.)
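
Because the missing piece flagged in the "Dataset Splits" row is an explicit split, the following is a minimal, hypothetical sketch of what a seeded CIFAR-10 train/validation split could look like in PyTorch/torchvision. The 45,000/5,000 proportions, the seed, and the normalization constants are assumptions chosen for illustration, not values reported by the authors.

import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Standard CIFAR-10 preprocessing; the normalization statistics below are
# commonly used values, assumed here rather than taken from the paper.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465),
                         (0.2470, 0.2435, 0.2616)),
])

full_train = datasets.CIFAR10(root="./data", train=True,
                              download=True, transform=transform)
test_set = datasets.CIFAR10(root="./data", train=False,
                            download=True, transform=transform)

# Fixed generator so the assumed 45k/5k train/validation split is identical
# across runs (and across workers in a decentralized setting).
generator = torch.Generator().manual_seed(42)
train_set, val_set = random_split(full_train, [45_000, 5_000],
                                  generator=generator)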
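
To make the quoted "Experiment Setup" row concrete, here is a minimal single-worker sketch of a PyTorch learning-rate schedule that combines the linear scaling law, warm-up, and step decay at 2/5 and 4/5 of training. The total iteration count, warm-up length, momentum value, and the 64-sample reference batch size are assumptions for illustration; the paper's quoted text only names the techniques and the base learning rates.

import torch

local_batch_size = 64             # one of the quoted cases (8, 64, 512 per worker)
reference_batch_size = 64         # assumed reference point for linear scaling
base_lr = 0.1                     # quoted value for ResNet-18/34 at batch size 64
total_iters = 10_000              # assumed; not stated in the quoted excerpt
warmup_iters = total_iters // 20  # assumed warm-up length

# Linear scaling law (Goyal et al., 2017): scale the learning rate in
# proportion to the batch size relative to the reference case.
scaled_lr = base_lr * local_batch_size / reference_batch_size

model = torch.nn.Linear(10, 10)   # placeholder model for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=scaled_lr,
                            momentum=0.9)  # momentum value is an assumption

def lr_factor(it: int) -> float:
    # Linear warm-up, then divide the learning rate by 10 once 2/5 and
    # again once 4/5 of the total iterations have passed, as quoted above.
    if it < warmup_iters:
        return (it + 1) / warmup_iters
    if it < 2 * total_iters // 5:
        return 1.0
    if it < 4 * total_iters // 5:
        return 0.1
    return 0.01

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_factor)
# In a training loop, call scheduler.step() after each optimizer.step().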